WEBVTT

00:00:00.001 --> 00:00:04.120
What do you focus on once you've learned the core concepts of the Python programming language

00:00:04.120 --> 00:00:08.660
and ecosystem? Obviously, knowing a few fundamental packages in your space is critical.

00:00:08.660 --> 00:00:13.820
So if you're a web developer, you should probably know Flask or Pyramid and SQLAlchemy really well.

00:00:13.820 --> 00:00:19.620
If you're a data scientist, import pandas, numpy, matplotlib need to be something you type often

00:00:19.620 --> 00:00:24.740
and intuitively. But then what? Well, I have a few topics for you. This week, you'll meet Mark

00:00:24.740 --> 00:00:29.800
Summerfield, a prolific author of many Python books. We spend our time digging into the ideas

00:00:29.800 --> 00:00:34.520
behind his book, Python in Practice: Create Better Programs Using Concurrency, Libraries,

00:00:34.520 --> 00:00:38.660
and Patterns. What I really like about these topics is that they have a long shelf life.

00:00:38.660 --> 00:00:42.140
You'll find them relevant over time, even as frameworks come and go.

00:00:42.140 --> 00:00:47.960
This is Talk Python To Me, episode 58, recorded May 4th, 2016.

00:00:59.600 --> 00:01:17.220
Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:01:17.220 --> 00:01:21.840
ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter,

00:01:21.900 --> 00:01:26.760
where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm

00:01:26.760 --> 00:01:33.000
and follow the show on Twitter via @talkpython. This episode is brought to you by Hired and SnapCI.

00:01:33.000 --> 00:01:39.780
Thank them for supporting the show on Twitter via @hired_hq and @snap_ci.

00:01:41.240 --> 00:01:46.040
Hey, everyone. A couple things to share with you. First, we'll be giving away an electronic copy of Mark's book this week.

00:01:46.040 --> 00:01:51.460
As always, just be registered as a friend of the show on talkpython.fm. I'll pick a winner later in the week.

00:01:51.460 --> 00:01:57.100
Next, I had the honor to spend an hour with Cecil Phillip and Richie Rump on their laid-back,

00:01:57.100 --> 00:01:59.900
technical but casual podcast, Away from the Keyboard.

00:01:59.900 --> 00:02:03.680
I really enjoyed the conversation, and I think you'll like their podcast, too.

00:02:03.680 --> 00:02:06.520
Give them a listen at awayfromthekeyboard.com.

00:02:07.180 --> 00:02:10.720
Finally, those of you taking my Python Jumpstart by Building 10 Apps course,

00:02:10.720 --> 00:02:12.420
I have a few improvements for you.

00:02:12.420 --> 00:02:16.280
I added transcripts to the player, the website, and the GitHub repository,

00:02:16.280 --> 00:02:21.240
as well as added some activity tracking across devices so you know which lectures you've watched.

00:02:21.240 --> 00:02:22.820
I hope you find these additions useful.

00:02:22.820 --> 00:02:24.420
Now, let's chat with Mark.

00:02:24.420 --> 00:02:26.120
Mark, welcome to the show.

00:02:26.120 --> 00:02:28.380
Oh, thank you very much. I'm glad to be on it.

00:02:28.380 --> 00:02:31.500
Yeah, I'm super excited to talk about all these great books you've written,

00:02:31.500 --> 00:02:36.500
and one of them in particular really caught my attention, called Python in Practice:

00:02:36.800 --> 00:02:40.340
Create Better Programs Using Concurrency, Libraries, and Patterns.

00:02:40.340 --> 00:02:45.120
And that just really speaks to me on some of the most important parts of

00:02:45.120 --> 00:02:48.960
sort of design pattern and improving your overall skill,

00:02:48.960 --> 00:02:51.880
not just focused on libraries, like Flask or something.

00:02:51.880 --> 00:02:53.020
Yeah, sure.

00:02:53.020 --> 00:02:56.640
One of the motivations for writing that particular book was

00:02:56.640 --> 00:03:01.460
I wanted to write something for people who were already comfortable writing Python,

00:03:02.140 --> 00:03:06.760
but showing more of the high-level things you could do with Python.

00:03:06.760 --> 00:03:12.600
You know, if you wanted, for example, to do really low-level networking with TCP/IP,

00:03:12.600 --> 00:03:14.220
you can do that in Python.

00:03:14.220 --> 00:03:15.420
It's got the libraries.

00:03:15.420 --> 00:03:16.980
All the facilities are there.

00:03:17.400 --> 00:03:20.900
But if what you're more interested in is application programming,

00:03:20.900 --> 00:03:23.220
you might not want to go so low-level.

00:03:23.220 --> 00:03:26.740
And Python, either built-in or in third-party libraries,

00:03:26.740 --> 00:03:30.420
has wonderful facilities for doing high-level stuff,

00:03:30.420 --> 00:03:34.360
whether it's networking, concurrency, and GUIs, and things like that.

00:03:34.560 --> 00:03:38.100
So I wanted to look at some of the facilities that are available,

00:03:38.100 --> 00:03:40.140
both built-in and third-party,

00:03:40.140 --> 00:03:44.160
that allow you to do some fantastic things with Python

00:03:44.160 --> 00:03:47.980
without having to get right down into some nitty-gritty details.

00:03:47.980 --> 00:03:49.360
Yeah, that's beautiful.

00:03:49.360 --> 00:03:52.220
I find that when people are new to Python,

00:03:52.220 --> 00:03:56.840
and this includes myself when I'm working in some area that I haven't worked in a lot,

00:03:56.840 --> 00:04:02.580
I'll not realize there's some really simple thing that I can use.

00:04:02.700 --> 00:04:04.400
And I think it's really great.

00:04:04.400 --> 00:04:06.620
There's a lot of those little tips and tricks in your book.

00:04:06.620 --> 00:04:08.580
But before we get into the details of your book,

00:04:08.580 --> 00:04:09.620
let's start at the beginning.

00:04:09.620 --> 00:04:10.000
Sure.

00:04:10.000 --> 00:04:11.680
How did you get into programming in Python?

00:04:11.680 --> 00:04:17.020
I started programming on paper in the late 70s.

00:04:17.020 --> 00:04:18.820
I started reading computer magazines.

00:04:18.820 --> 00:04:21.740
I taught myself BASIC purely off the magazines.

00:04:21.740 --> 00:04:25.940
And I wrote my code on paper, and I executed it on paper.

00:04:25.940 --> 00:04:29.000
Then eventually, I bought a home computer.

00:04:29.000 --> 00:04:31.240
I don't know if your listeners will remember what they are,

00:04:31.340 --> 00:04:32.940
but they were things before PCs.

00:04:32.940 --> 00:04:35.940
Very limited, but quite a lot of fun.

00:04:35.940 --> 00:04:42.180
And eventually, I went on to do a computer science degree, which I absolutely loved.

00:04:42.180 --> 00:04:45.460
And that gave me a lot of the sort of theoretical background.

00:04:45.460 --> 00:04:48.340
And then I just went into software development.

00:04:48.760 --> 00:04:54.560
And I'd been doing that for quite a few years when I bumped into someone who suggested trying Python.

00:04:54.560 --> 00:04:59.800
And I borrowed a book from a colleague, a fellow developer on Python.

00:04:59.800 --> 00:05:00.920
And I hated the book.

00:05:00.920 --> 00:05:03.740
And that put me off Python for about a year.

00:05:03.740 --> 00:05:07.860
And that was one of the motivations for writing my first Python book,

00:05:07.860 --> 00:05:10.480
was to write one that would actually encourage people to use it.

00:05:10.540 --> 00:05:18.720
But once I started using it, within a year, all of my utility programs and tools that I use for my daily work,

00:05:18.720 --> 00:05:22.060
they were in Python, because I just loved it.

00:05:22.060 --> 00:05:25.440
So that was around 1999.

00:05:26.140 --> 00:05:30.700
And now, everything I do, the first choice language is always Python.

00:05:30.700 --> 00:05:36.840
Yeah, the Python ecosystem and, frankly, the language is fairly different today from the late 90s.

00:05:36.840 --> 00:05:38.260
Oh, massively different.

00:05:38.260 --> 00:05:41.940
I mean, I actually didn't like the indentation at first.

00:05:41.940 --> 00:05:45.960
That took me like 48 hours before it really clicked.

00:05:45.960 --> 00:05:47.360
Wow, no braces.

00:05:47.360 --> 00:05:48.660
I just don't have to bother.

00:05:48.660 --> 00:05:50.940
And that was really nice.

00:05:50.940 --> 00:05:54.460
And also, of course, it forces your code to be quite neat in the first place.

00:05:54.460 --> 00:05:58.720
It doesn't, of course, make you use good variable names and things like that.

00:05:58.720 --> 00:06:00.120
You have to learn that separately.

00:06:00.880 --> 00:06:02.540
But that applies to any language.

00:06:02.540 --> 00:06:06.520
But I really liked Python pretty well from the get-go.

00:06:06.520 --> 00:06:07.480
Yeah, so did I.

00:06:07.480 --> 00:06:11.080
I think the indentation does catch a lot of people off guard.

00:06:11.080 --> 00:06:14.200
And to me, it's kind of like good science fiction.

00:06:14.200 --> 00:06:18.520
You have to sort of take this moment, the suspension of disbelief.

00:06:18.520 --> 00:06:25.120
Like, just imagine for a minute this white space concept is a good idea and work with it for a week.

00:06:25.120 --> 00:06:28.380
And then it just dawns on you, like, wow, this really is a good idea.

00:06:28.500 --> 00:06:35.900
Like, I went back to working in some C-based languages right after I sort of learned Python and started working in it.

00:06:35.900 --> 00:06:38.920
And the white space was a shock to me at first.

00:06:38.920 --> 00:06:43.120
But what was even a bigger shock was these C-based languages that I loved.

00:06:43.360 --> 00:06:47.680
I all of a sudden hated all the parentheses, the curly braces, the semicolons.

00:06:47.680 --> 00:06:50.500
And that was such a surprise to me that I felt that way.

00:06:50.500 --> 00:06:52.000
But it was within a week.

00:06:52.000 --> 00:06:53.860
I was just completely over the semicolon.

00:06:53.860 --> 00:06:54.460
Sure.

00:06:54.460 --> 00:06:58.360
And of course, there's no dangling else problem that you can get in C or C++.

00:06:59.200 --> 00:07:04.300
You know, if you've got an else with the correct indentation, you know what's going to be executed.

00:07:04.300 --> 00:07:07.960
You're not going to get caught because you hadn't put in braces, you know?

00:07:07.960 --> 00:07:08.600
Yeah, absolutely.

00:07:08.600 --> 00:07:09.180
Yeah.

00:07:09.180 --> 00:07:10.620
It really works for me.

00:07:10.620 --> 00:07:12.120
And Python, the language.

00:07:12.120 --> 00:07:13.120
Okay.

00:07:13.120 --> 00:07:14.020
It's Turing complete.

00:07:14.020 --> 00:07:14.960
And so is Perl.

00:07:14.960 --> 00:07:17.440
And so is C++ and Java and all of these languages.

00:07:17.440 --> 00:07:21.200
So you can do anything in one of them that you can do in the other.

00:07:21.200 --> 00:07:25.180
So why choose one particular language rather than another?

00:07:25.180 --> 00:07:30.200
And I think part of that is, well, what's the libraries and ecosystem like?

00:07:30.200 --> 00:07:33.480
And part of it is what fits the way you think best.

00:07:33.480 --> 00:07:36.460
And in my case, it happened to be Python that works.

00:07:36.460 --> 00:07:41.320
But, you know, I wouldn't argue against someone who preferred some other language if that suited them.

00:07:41.320 --> 00:07:41.960
Yeah, absolutely.

00:07:41.960 --> 00:07:44.300
But for me, Python is a great language.

00:07:44.300 --> 00:07:46.040
And particularly Python 3.

00:07:46.640 --> 00:07:49.120
I started using that from the first alpha.

00:07:49.120 --> 00:07:54.240
I'd ported everything from Python 2 to Python 3 right at that stage.

00:07:54.240 --> 00:07:55.540
And I think it's excellent.

00:07:55.540 --> 00:07:55.900
Yeah.

00:07:55.900 --> 00:07:59.940
You know, maybe that's a good segue to sort of taking a survey of the books that you've written.

00:07:59.940 --> 00:08:01.920
Because you've written many books.

00:08:01.920 --> 00:08:05.460
Is something, how many, around eight, seven?

00:08:05.460 --> 00:08:07.340
Depends whether you count second editions.

00:08:07.340 --> 00:08:09.060
Without second editions, it's seven.

00:08:09.060 --> 00:08:10.780
And with them, it's nine.

00:08:10.780 --> 00:08:11.960
Okay, excellent.

00:08:11.960 --> 00:08:12.760
Yeah.

00:08:12.760 --> 00:08:15.740
And one of them you wrote is a pretty sizable

00:08:15.740 --> 00:08:17.000
Python 3 book, right?

00:08:17.000 --> 00:08:17.560
Yes.

00:08:17.560 --> 00:08:19.760
That's Programming in Python 3.

00:08:19.760 --> 00:08:20.320
Yeah.

00:08:20.320 --> 00:08:24.900
That book is really aimed at people who can program in something or other.

00:08:25.340 --> 00:08:27.880
And it's to port them over to Python 3.

00:08:27.880 --> 00:08:29.700
But it should also work for people.

00:08:29.700 --> 00:08:32.480
And the something or other could be Python 2.

00:08:32.480 --> 00:08:34.300
Right, right.

00:08:34.300 --> 00:08:35.620
So that's who it's aimed for.

00:08:35.620 --> 00:08:36.800
It's not aimed at beginners.

00:08:36.800 --> 00:08:40.560
The subtitle was poorly chosen by me.

00:08:40.560 --> 00:08:43.360
A Complete Introduction to the Python Language.

00:08:43.520 --> 00:08:49.840
Sometimes people think that introduction means it's introductory, which wasn't the intention.

00:08:49.840 --> 00:08:56.680
It was just, it's introducing everything that Python 3's got that you're going to use in normal, maintainable programming.

00:08:57.440 --> 00:09:03.280
The only things I don't tend to cover in my books are things that I think are dangerous and obscure.

00:09:03.280 --> 00:09:09.260
So, for example, in Python, you can disassemble Python bytecode, rewrite it, and put it back.

00:09:09.260 --> 00:09:10.260
And that's brilliant.

00:09:10.260 --> 00:09:15.380
But I wouldn't ever cover that in my books because I wanted to cover stuff that people can maintain.

00:09:15.380 --> 00:09:18.080
Understand that it runs.

00:09:18.080 --> 00:09:18.480
That's right.

00:09:18.700 --> 00:09:19.100
Absolutely.

00:09:19.100 --> 00:09:19.800
Absolutely.

00:09:19.800 --> 00:09:27.180
Maintainability and understandability are really important to me because, in my experience, you live with code for quite a long time.

00:09:27.180 --> 00:09:29.140
You know, literally years.

00:09:29.140 --> 00:09:37.400
So you don't want to torture yourself when you have to go back and fix something or do a modification to something.

00:09:37.400 --> 00:09:40.200
And it's been a few years since you've seen that code.

00:09:40.200 --> 00:09:40.840
Yeah.

00:09:40.840 --> 00:09:41.600
Oh, absolutely.

00:09:41.600 --> 00:09:45.020
So maybe tell us some of the other books that you wrote.

00:09:45.020 --> 00:09:45.720
Okay.

00:09:45.720 --> 00:09:47.320
You've got some interesting topics there.

00:09:47.320 --> 00:09:56.860
I think the first one I wrote concerning Python is Rapid GUI Programming with Python and Qt, which is about PyQt programming with PyQt 4.

00:09:56.860 --> 00:10:01.860
Although the book is – I mean, I've updated the examples for PySide as well.

00:10:01.860 --> 00:10:05.660
I really like the Qt GUI toolkit.

00:10:05.660 --> 00:10:08.260
I like cross-platform GUI programming.

00:10:08.260 --> 00:10:15.620
And that book, the first part of it is actually a very brief introduction to Python programming itself.

00:10:15.620 --> 00:10:17.400
And I was quite pleased.

00:10:17.400 --> 00:10:18.820
I had very good feedback on that.

00:10:18.820 --> 00:10:24.380
Many people said, well, I already knew Python, but I read the introduction because, well, I bought the book.

00:10:24.380 --> 00:10:26.160
And I still learned things from it.

00:10:26.160 --> 00:10:27.260
So I was glad about that.

00:10:27.260 --> 00:10:30.180
And Qt itself, I think, is good.

00:10:30.400 --> 00:10:35.960
I know that it's very fashionable, you know, mobile programming and web programming and things like that.

00:10:35.960 --> 00:10:36.760
And they're great.

00:10:36.760 --> 00:10:41.580
But I personally love desktop programming and desktop GUI applications.

00:10:42.020 --> 00:10:46.580
So it was an expression of something that I was really interested in and really enjoyed doing.

00:10:46.580 --> 00:10:47.320
Yeah, that's great.

00:10:47.320 --> 00:10:55.000
I agree with you that desktop apps are really – they don't get enough love, you know, because mobile is a super flashy thing.

00:10:55.240 --> 00:10:57.240
I actually spend most of my time writing web apps.

00:10:57.240 --> 00:11:01.180
But I do very much appreciate a good desktop app.

00:11:01.180 --> 00:11:07.740
And I think Qt is one of those frameworks that is cross-platform but doesn't feel cross-platform as a user.

00:11:07.740 --> 00:11:12.480
You're not like, oh, yeah, this button and this UI completely looks foreign.

00:11:12.700 --> 00:11:14.400
But it technically is a UI, right?

00:11:14.400 --> 00:11:15.120
Absolutely.

00:11:15.120 --> 00:11:15.820
Yeah.

00:11:15.820 --> 00:11:23.100
I mean, it really does – they really do well with native look and feel, even though they're not using native widgets, unlike, say, wxPython, which does.

00:11:23.100 --> 00:11:25.040
They emulate it all.

00:11:25.040 --> 00:11:26.780
But they do it very well.

00:11:26.780 --> 00:11:29.880
And some of their stuff on Windows is faster than native.

00:11:29.880 --> 00:11:30.380
Wow.

00:11:30.380 --> 00:11:32.060
The way that it's implemented.

00:11:32.060 --> 00:11:39.660
Because they're not – when they create a window, they're not creating all those widgets, which you would do using a normal Windows toolkit.

00:11:40.200 --> 00:11:43.160
They just basically create an image.

00:11:43.160 --> 00:11:44.140
Right, right.

00:11:44.140 --> 00:11:47.540
So it's much, much cheaper and much faster in terms of resources.

00:11:47.540 --> 00:11:49.440
So, yeah, I love that toolkit.

00:11:49.440 --> 00:12:00.180
And that's how I'm earning my living now because I'm earning my living writing on the back of commercial Python applications for desktop users.

00:12:00.180 --> 00:12:04.820
So they're written in Python, Python 3, PySide, Qt.

00:12:04.820 --> 00:12:05.600
Oh, wonderful.

00:12:05.600 --> 00:12:11.060
And do you ship those with something like cx_Freeze to sort of package them up or was it py2app?

00:12:11.060 --> 00:12:12.580
That's exactly right.

00:12:12.580 --> 00:12:14.360
No, I use cx_Freeze.

00:12:14.360 --> 00:12:18.920
And so, yeah, they're quite a big download because the Qt libraries are all in the bundle.

00:12:18.920 --> 00:12:26.540
And people can download them, try them, and, you know, hopefully buy them.

00:12:26.540 --> 00:12:28.540
Yeah, wonderful.

00:12:28.800 --> 00:12:31.860
And that seems to have worked for the last couple of years quite nicely.

00:12:31.860 --> 00:12:32.720
Yeah, okay.

00:12:32.720 --> 00:12:33.420
Fantastic.

00:12:33.420 --> 00:12:34.780
Yeah, that's a really cool story.

00:12:34.780 --> 00:12:38.260
It is possible to earn a living with Python not doing web programming.

00:12:38.260 --> 00:12:42.320
I mean, obviously, you can earn good money doing web programming as well.

00:12:42.320 --> 00:12:43.440
Right, of course.

00:12:43.440 --> 00:12:44.260
Or data science.

00:12:44.260 --> 00:12:45.800
Those are the two most common ones.

00:12:46.000 --> 00:12:46.820
Oh, absolutely.

00:12:46.820 --> 00:12:47.340
Yeah.

00:12:47.340 --> 00:12:48.380
Yeah, absolutely.

00:12:48.380 --> 00:12:52.020
So you also have some books on Go and C++.

00:12:52.020 --> 00:12:53.660
Oh, I've got a Go book.

00:12:53.660 --> 00:12:57.060
I did that because I was just really interested in Go.

00:12:57.060 --> 00:13:02.740
I think Go and Rust are both sort of new languages that I am interested in.

00:13:02.740 --> 00:13:05.000
And I like the simplicity of Go.

00:13:05.700 --> 00:13:08.520
And I'm also interested in concurrency.

00:13:08.520 --> 00:13:12.420
And so I started to learn the language.

00:13:12.420 --> 00:13:17.380
And I thought, yeah, I can write something better than what's available on this language.

00:13:17.380 --> 00:13:22.720
But one of the authors of Go has come out with a book on Go now.

00:13:22.720 --> 00:13:25.300
So I should think that will kill mine.

00:13:25.300 --> 00:13:27.720
You never know.

00:13:27.720 --> 00:13:28.440
You never know.

00:13:28.440 --> 00:13:30.620
I mean, well, I still think mine's a good book, actually.

00:13:30.620 --> 00:13:31.640
But there we are.

00:13:32.740 --> 00:13:36.200
In terms of C++, I have written a few.

00:13:36.200 --> 00:13:38.200
But they're all C++ with Qt.

00:13:38.200 --> 00:13:46.740
So I co-wrote with Jasmin Blanchette, C++ GUI Programming with Qt 3 and then with Qt 4.

00:13:46.740 --> 00:13:49.780
And then I did a solo one, Advanced Qt Programming.

00:13:49.780 --> 00:13:50.300
Okay.

00:13:50.300 --> 00:13:52.460
But again, they're all C++.

00:13:52.460 --> 00:13:58.200
I still use them for PySide programming, you know, to remind myself how to do things.

00:13:58.200 --> 00:14:02.360
And I actually use the C++ docs rather than the PySide docs.

00:14:02.480 --> 00:14:05.520
So, you know, I can translate easily enough.

00:14:05.520 --> 00:14:07.340
Yeah, they're a little more definitive, maybe.

00:14:07.340 --> 00:14:07.880
Yeah.

00:14:07.880 --> 00:14:10.200
I am frustrated with C++.

00:14:10.200 --> 00:14:13.360
I mean, C++11 was like a huge step forward.

00:14:13.360 --> 00:14:16.740
But they just didn't deprecate enough as far as I'm concerned.

00:14:16.740 --> 00:14:18.060
So it's getting too big.

00:14:18.060 --> 00:14:23.180
And I think it's quite hard to write that language in a maintainable way now.

00:14:23.180 --> 00:14:26.160
Yeah, that language is just ever-growing.

00:14:26.160 --> 00:14:28.160
And it wasn't all that simple in the beginning.

00:14:28.160 --> 00:14:29.040
No.

00:14:29.600 --> 00:14:31.260
I have another question about Qt.

00:14:31.260 --> 00:14:33.120
I definitely want to get to the topics of your books.

00:14:33.120 --> 00:14:33.720
Sure.

00:14:33.720 --> 00:14:34.820
Given your background there.

00:14:34.820 --> 00:14:42.160
I just saw, when was this, back in mid-April, there was an announcement saying we're bringing

00:14:42.160 --> 00:14:44.340
PySide back to the Qt project.

00:14:44.340 --> 00:14:45.700
That's right.

00:14:45.700 --> 00:14:47.160
I'm really delighted.

00:14:47.160 --> 00:14:48.220
Fantastic.

00:14:48.220 --> 00:14:51.420
It's going to be PySide for Qt 5.

00:14:51.640 --> 00:14:54.960
Now, you can use PyQt with Qt 5.

00:14:54.960 --> 00:14:57.720
But PySide has been Qt 4 only so far.

00:14:57.720 --> 00:15:00.980
But they're actually putting money behind it and investing in it.

00:15:00.980 --> 00:15:04.180
So PySide 2 will be for Qt 5.

00:15:04.180 --> 00:15:06.700
And I'm really looking forward to that.

00:15:06.700 --> 00:15:07.100
Yeah.

00:15:07.100 --> 00:15:07.600
So am I.

00:15:07.600 --> 00:15:10.360
Do you know the time frame on when that kind of stuff will be out?

00:15:10.360 --> 00:15:11.880
They're doing the development in the open.

00:15:11.880 --> 00:15:14.620
I think there's like a GitHub or some equivalent of that.

00:15:14.800 --> 00:15:16.740
So you can look at it.

00:15:16.740 --> 00:15:22.260
But I would guess it's going to be, I think we'd be lucky to see by the end of the year

00:15:22.260 --> 00:15:23.320
something that's usable.

00:15:23.320 --> 00:15:24.420
Right.

00:15:24.420 --> 00:15:26.280
Because it's not a simple job.

00:15:26.280 --> 00:15:26.560
Yeah.

00:15:26.560 --> 00:15:28.640
There's a pretty big break from Qt 4 to Qt 5.

00:15:28.640 --> 00:15:30.880
Not so much the Qt side of it.

00:15:30.880 --> 00:15:32.460
I don't think that's the hard side.

00:15:32.460 --> 00:15:37.020
As you know, programmers, we love reinventing things.

00:15:37.020 --> 00:15:42.000
And when they did PySide, they invented a new way of doing bindings.

00:15:42.380 --> 00:15:48.260
And I think that hasn't proven to be quite as maintainable and flexible as they'd hoped.

00:15:48.260 --> 00:15:51.920
So I think that's where they're going to have to do quite a lot of work, getting that to

00:15:51.920 --> 00:15:54.320
work with Qt 5 and the new C++.

00:15:54.320 --> 00:15:54.980
Right.

00:15:54.980 --> 00:15:55.380
Okay.

00:15:55.380 --> 00:15:55.780
Yeah.

00:15:55.780 --> 00:15:59.520
So it's more with the PySide version than it is the Qt itself.

00:15:59.520 --> 00:16:00.020
Got it.

00:16:00.020 --> 00:16:01.140
I think so.

00:16:01.140 --> 00:16:02.080
I think so.

00:16:02.080 --> 00:16:02.480
Yeah.

00:16:02.480 --> 00:16:02.880
All right.

00:16:02.880 --> 00:16:03.160
Excellent.

00:16:03.160 --> 00:16:09.340
So I know there's a lot of interest in GUI apps from a Python perspective.

00:16:09.340 --> 00:16:12.280
And maybe another time we can dig into Qt even more.

00:16:12.400 --> 00:16:15.240
But let's talk a little bit about creating better Python apps.

00:16:15.240 --> 00:16:15.760
Okay.

00:16:15.760 --> 00:16:18.300
What was the motivation for writing this book?

00:16:18.300 --> 00:16:23.040
I mean, you said you aimed it at people who were in the middle, but you gave it four themes.

00:16:23.040 --> 00:16:28.300
You said, I'm going to cover sort of these general themes of code elegance, improving speed

00:16:28.300 --> 00:16:30.720
with concurrency, networking and graphics.

00:16:30.720 --> 00:16:32.900
How did you come to that collection?

00:16:32.900 --> 00:16:35.780
Well, graphics, because I just love GUI programming.

00:16:35.780 --> 00:16:38.740
So that had to go in because it's something I love.

00:16:38.740 --> 00:16:44.460
I also, with networking, I've done a fair bit of network programming, but I'm not a low-level

00:16:44.460 --> 00:16:45.040
person.

00:16:45.040 --> 00:16:48.620
I like my networking to be as easy as possible.

00:16:49.380 --> 00:16:54.900
And I wanted people to be aware that you can do networking really easily with Python without

00:16:54.900 --> 00:17:00.420
having to go down, you know, to low-level stuff and do it in a reliable and pleasant way.

00:17:00.980 --> 00:17:03.580
And I cover two approaches.

00:17:03.580 --> 00:17:04.980
One is XML-RPC.

00:17:04.980 --> 00:17:08.400
And I cover that because it's built, it's in the standard library.

00:17:08.400 --> 00:17:11.060
And it works really easily.

00:17:11.060 --> 00:17:13.560
So it's really nice.

00:17:13.760 --> 00:17:17.320
And the other one I cover is a third-party one called RPyC.

00:17:17.320 --> 00:17:23.180
There's another one called Pyro, which is also widely used.

00:17:23.180 --> 00:17:25.120
And I could have gone with either of those.

00:17:25.120 --> 00:17:30.960
And the advantage of RPyC or Pyro is that they can be Python-specific.

00:17:30.960 --> 00:17:35.460
So you can get better performance, whereas XML-RPC is general.

00:17:35.460 --> 00:17:41.000
So its performance isn't quite as good, but it has the advantage that you can write a

00:17:41.000 --> 00:17:44.460
client and/or server using XML-RPC.

00:17:44.460 --> 00:17:49.860
And it'll talk to anything else that uses the XML-RPC protocol.

00:17:49.860 --> 00:17:52.820
So that's very nice for interoperability.

00:17:52.820 --> 00:17:54.080
And they're high-level.

00:17:54.080 --> 00:18:00.240
So all of the detail and, you know, timeouts and all of the issues that can arise in networking

00:18:00.240 --> 00:18:02.460
is just neatly controlled and wrapped up.

00:18:02.460 --> 00:18:04.940
And, OK, you get exceptions and things like that.

00:18:04.940 --> 00:18:07.040
You know, all normal Python stuff.

00:18:07.040 --> 00:18:08.900
So you don't have to worry about the details.
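
NOTE Editor's sketch, not from the episode or the book: a minimal example of the standard-library XML-RPC approach described above. The add function, the loopback address, and the choice of an OS-assigned port are assumptions made purely for illustration.
```python
# A tiny XML-RPC server and client using only the standard library.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer
def add(x, y):
    """Hypothetical remote procedure, used only for this illustration."""
    return x + y
# Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
host, port = server.server_address
# Serve in a background thread so the client below can run in this process.
threading.Thread(target=server.serve_forever, daemon=True).start()
# The client calls the remote function as if it were a local method.
with ServerProxy(f"http://{host}:{port}/") as proxy:
    result = proxy.add(2, 3)
server.shutdown()
server.server_close()
print(result)
```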

00:18:08.900 --> 00:18:10.040
Yeah, it makes a lot of sense.

00:18:10.040 --> 00:18:14.500
So I want people to be aware of that, that these kind of facilities exist.

00:18:14.500 --> 00:18:20.300
You can basically write a wide range of different types of networking apps in Python, right?

00:18:20.300 --> 00:18:23.120
You can go all the way down to the raw sockets with bytes.

00:18:23.120 --> 00:18:26.320
I just talked with Mahmoud Hashemi from PayPal.

00:18:26.320 --> 00:18:30.920
And those guys are writing services that take over a billion requests a day.

00:18:30.920 --> 00:18:31.480
Wow.

00:18:31.620 --> 00:18:36.040
They're doing network programming in Python, but down below the HTTP level.

00:18:36.040 --> 00:18:36.520
Wow.

00:18:36.760 --> 00:18:38.000
And these custom APIs.

00:18:38.000 --> 00:18:40.640
And then you can, of course, go up higher, right?

00:18:40.640 --> 00:18:45.580
Like XML-RPC or maybe REST service with requests, things like that, right?

00:18:45.580 --> 00:18:46.600
Absolutely.

00:18:47.020 --> 00:18:49.380
And it's the high level stuff that I was more interested in.

00:18:49.380 --> 00:18:53.660
And I think that's because at heart, I'm an applications programmer.

00:18:54.160 --> 00:19:00.660
And that means I know about the subjects of my application, but I don't necessarily have the

00:19:00.660 --> 00:19:05.140
expertise in particular areas that the application needs.

00:19:05.140 --> 00:19:10.380
And so for that, I want high level libraries that give me the functionality that are created

00:19:10.380 --> 00:19:11.820
by experts in those fields.

00:19:11.820 --> 00:19:13.400
So I get the best of both worlds.

00:19:13.400 --> 00:19:19.280
I get the functionality I need by, you know, excellent people who've developed it without

00:19:19.280 --> 00:19:21.420
actually having to learn all that stuff myself.

00:19:21.420 --> 00:19:22.340
Yeah, absolutely.

00:19:22.340 --> 00:19:27.680
I think the right way to start is start simple and then, you know, then go do crazy network

00:19:27.680 --> 00:19:29.620
stuff if you need to improve the performance.

00:19:29.620 --> 00:19:32.520
But generally, you don't have a performance problem.

00:19:32.520 --> 00:19:33.840
No, no.

00:19:33.840 --> 00:19:36.920
And that actually brings us nicely to concurrency.

00:19:36.920 --> 00:19:40.840
Python, you know, people say, oh, Python is slow.

00:19:40.840 --> 00:19:42.260
Python can't do concurrency.

00:19:42.260 --> 00:19:47.320
And I really wanted to address those issues because how slow is Python really?

00:19:47.320 --> 00:19:52.720
Well, I developed a program in C++ that was very CPU intensive.

00:19:52.720 --> 00:19:57.980
And I rewrote that program in Python and it was 50% slower.

00:19:57.980 --> 00:20:04.780
And I think that's not bad going from C++ to an interpreted language, a bytecode interpreted

00:20:04.780 --> 00:20:05.320
language.

00:20:05.320 --> 00:20:08.120
But of course, I then made it concurrent.

00:20:08.120 --> 00:20:12.120
And you could make it concurrent in C++, but it's much easier

00:20:12.120 --> 00:20:13.160
doing that in Python.

00:20:13.160 --> 00:20:19.120
And so on a dual core machine, suddenly it was as fast as C++.

00:20:19.120 --> 00:20:20.420
Give it more cores.

00:20:20.420 --> 00:20:22.120
And it was faster than the C++.

00:20:22.120 --> 00:20:29.860
So even though baseline, yeah, it's 50% slower on real hardware using concurrency, it was much

00:20:29.860 --> 00:20:30.420
faster.

00:20:30.680 --> 00:20:31.680
And that's really what the user is going to care about.

00:20:33.420 --> 00:20:35.860
You know, on my hardware, is this thing running fast?

00:20:35.860 --> 00:20:36.520
Right.

00:20:36.520 --> 00:20:38.660
And of course, it's much more maintainable.

00:20:38.660 --> 00:20:42.980
I mean, doing concurrency in Python is so much easier than in most other languages.

00:20:43.380 --> 00:20:43.500
Yeah.

00:20:43.500 --> 00:20:50.160
And I think specifically around concurrency, it's easy to get yourself into a situation where

00:20:50.160 --> 00:20:55.420
you've been very clever and you've thought really hard about the algorithms and the way

00:20:55.420 --> 00:20:55.900
it works.

00:20:56.220 --> 00:21:00.740
And you've kind of written something just at the limit of your understanding.

00:21:00.740 --> 00:21:04.020
Like you totally understood what you did, but it's at the very edge.

00:21:04.020 --> 00:21:09.040
But of course, understanding multithreaded code, debugging it is harder than writing it.

00:21:09.040 --> 00:21:13.460
And so maybe it's like, you know, you've sort of gone a little too far.

00:21:13.460 --> 00:21:17.360
You're like, okay, I could write this, but I don't really understand how to fix it when it

00:21:17.360 --> 00:21:17.860
goes wrong.

00:21:17.860 --> 00:21:18.260
Yeah.

00:21:18.260 --> 00:21:20.740
And that is a real problem with concurrency.

00:21:20.740 --> 00:21:27.740
In some languages, you're stuck because the concurrency facilities they offer are pretty basic.

00:21:27.740 --> 00:21:29.580
So they don't make it easy.

00:21:29.580 --> 00:21:34.840
But Python offers higher-level approaches to concurrency as well as low-level ones.

00:21:34.840 --> 00:21:42.100
For example, you've got the concurrent.futures module, which makes it much easier to create

00:21:42.100 --> 00:21:47.800
either separate threads or separate processes where Python takes care of lots of the low level

00:21:47.800 --> 00:21:48.440
details.
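
NOTE
Editor's sketch, not code from the episode or the book: one way the
concurrent.futures module mentioned here can hand work to a pool while the
library manages the workers. The names cpu_heavy and run_all are hypothetical.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed
# Hypothetical stand-in for an expensive computation.
def cpu_heavy(n):
    return sum(i * i for i in range(n))
# executor_cls may be ThreadPoolExecutor or ProcessPoolExecutor; nothing else changes.
def run_all(sizes, executor_cls=ThreadPoolExecutor):
    results = {}
    with executor_cls(max_workers=2) as pool:
        # Map each future back to the input that produced it.
        futures = {pool.submit(cpu_heavy, n): n for n in sizes}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```
For CPU-bound work one would likely pass ProcessPoolExecutor instead, calling
run_all from under an `if __name__ == "__main__":` guard.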

00:22:17.740 --> 00:22:20.400
Within the first week, and there are no obligations ever.

00:22:20.400 --> 00:22:21.980
Sounds awesome, doesn't it?

00:22:21.980 --> 00:22:23.640
Well, did I mention the signing bonus?

00:22:23.640 --> 00:22:27.040
Everyone who accepts a job from Hired gets a $1,000 signing bonus.

00:22:27.040 --> 00:22:29.800
And as Talk Python listeners, it gets way sweeter.

00:22:29.800 --> 00:22:35.520
Use the link Hired.com slash Talk Python To Me and Hired will double the signing bonus to $2,000.

00:22:35.520 --> 00:22:37.660
Opportunity's knocking.

00:22:37.660 --> 00:22:41.100
Visit Hired.com slash Talk Python To Me and answer the call.

00:22:45.500 --> 00:22:50.880
I think you really put this together quite nicely in terms of sort of breaking out the different

00:22:50.880 --> 00:22:51.920
types of concurrency.

00:22:52.740 --> 00:22:56.920
And it helps you understand if you sort of think, okay, well, what are the types of concurrency?

00:22:56.920 --> 00:22:58.760
What type of problem am I solving?

00:22:58.760 --> 00:23:03.900
Then you have a pretty good recommendation for if it's this type of problem, solve it this

00:23:03.900 --> 00:23:04.100
way.

00:23:04.100 --> 00:23:05.680
And so you said there were three types.

00:23:05.820 --> 00:23:10.040
You called them threaded concurrency, process-based concurrency, and then concurrent waiting.

00:23:10.040 --> 00:23:11.100
Yeah, yeah.

00:23:11.100 --> 00:23:17.940
Basically, if you're doing CPU-intensive work, then using threading in Python is not going to

00:23:17.940 --> 00:23:21.020
help because of the global interpreter lock.

00:23:21.020 --> 00:23:26.820
So if it's CPU-intensive and you need concurrency, then you need to use a different method.

00:23:26.820 --> 00:23:32.160
And Python offers, for example, multiprocessing, where you can split your work over multiple

00:23:32.160 --> 00:23:34.340
processes rather than multiple threads.

00:23:34.340 --> 00:23:40.980
And each of those has its own separate interpreter lock so that they don't interfere with each

00:23:40.980 --> 00:23:41.240
other.

00:23:41.240 --> 00:23:41.820
Right.

00:23:41.820 --> 00:23:49.220
But I think the key to getting real performance from concurrency is to avoid sharing insofar as

00:23:49.220 --> 00:23:49.760
you can.

00:23:50.540 --> 00:23:56.140
And that means either you don't need to share in the first place, or if you've got data that

00:23:56.140 --> 00:24:01.460
needs to be looked at by your multiple threads or multiple processes, then it may be cheaper

00:24:01.460 --> 00:24:07.560
to copy that data rather than have them share some kind of locking to look at it.

00:24:07.560 --> 00:24:07.900
Right.

00:24:07.900 --> 00:24:13.500
And that's a mistake I think a lot of people make is they see the way that their program is

00:24:13.500 --> 00:24:14.100
working now.

00:24:14.100 --> 00:24:15.440
They've got some shared...

00:24:15.440 --> 00:24:18.960
They've got a pointer they're passing to two methods or whatever, two parts of the

00:24:18.960 --> 00:24:19.280
algorithm.

00:24:19.280 --> 00:24:22.820
And they're saying, well, this part's going to work on this part of the memory and this

00:24:22.820 --> 00:24:23.700
one's going to work over here.

00:24:23.700 --> 00:24:28.140
And so they think, well, when I parallelize this, this is shared memory access.

00:24:28.140 --> 00:24:33.140
And of course, you have to take a lock or somehow serialize access to that data.

00:24:33.720 --> 00:24:39.480
And it's easy to forget that, you know, if that's like a meg or even maybe 50 megs of

00:24:39.480 --> 00:24:45.380
data, it might be so much easier both for you and advantageous for performance to just say

00:24:45.380 --> 00:24:45.780
copy.

00:24:45.780 --> 00:24:49.640
Make a copy and just pass it over and then correlate the differences later.

00:24:49.640 --> 00:24:50.620
Absolutely.

00:24:50.620 --> 00:24:56.920
The other possibility is to share the data without copying, which is fine if you never write to

00:24:56.920 --> 00:24:57.120
it.

00:24:57.560 --> 00:25:02.580
So if you've got data where you're just reading in information like a log or some data

00:25:02.580 --> 00:25:08.460
stream, and you're never changing the information you're reading, you might be producing new output,

00:25:08.460 --> 00:25:09.260
but that's separate.

00:25:09.260 --> 00:25:14.160
So the stuff you're reading, you can read that from a shared data structure as long as

00:25:14.160 --> 00:25:14.580
you only read.

00:25:14.580 --> 00:25:16.160
And that's not going to be a problem.

00:25:16.160 --> 00:25:21.100
It's only when you're going to start writing that sharing becomes an issue.

00:25:21.100 --> 00:25:24.560
And then you've got problems if you don't lock.

00:25:24.920 --> 00:25:27.360
But the best way is still don't lock.

00:25:27.360 --> 00:25:33.700
The best way if you're writing data is write or save the data in separate chunks and gather

00:25:33.700 --> 00:25:34.620
it together at the end.

00:25:34.620 --> 00:25:38.340
That will often be less error prone and faster.
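
NOTE
Editor's sketch, not from the book: the "separate chunks, gather at the end"
idea, assuming a hypothetical word-counting job. Each worker builds its own
private dict, so no lock is needed; merging happens once in the caller.
```python
from concurrent.futures import ThreadPoolExecutor
# Hypothetical worker: it touches only its own chunk and returns a private result.
def count_words(lines):
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
# Split the input, let each worker count its chunk, then merge in one place.
def count_all(lines, chunk_size=2, executor_cls=ThreadPoolExecutor):
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = {}
    with executor_cls() as pool:
        for partial in pool.map(count_words, chunks):
            for word, n in partial.items():
                total[word] = total.get(word, 0) + n
    return total
```
The merge loop runs only in the calling thread, which is why the workers never
contend for shared state.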

00:25:38.340 --> 00:25:39.520
Yeah, absolutely.

00:25:39.520 --> 00:25:41.500
But of course, sometimes there isn't a choice.

00:25:41.500 --> 00:25:43.340
Sometimes you do need to lock.

00:25:43.340 --> 00:25:51.200
And then that's when it becomes quite difficult to reason about because you've got to be clear

00:25:51.200 --> 00:25:55.400
when you need to lock and when you need to unlock and all the rest of it.

00:25:55.400 --> 00:25:57.820
And that's where the difficulty comes in.

00:25:57.820 --> 00:26:03.760
But if you can avoid having to lock, then you can get good performance without problems.

00:26:03.760 --> 00:26:05.160
Absolutely.

00:26:06.460 --> 00:26:13.760
And there's some of the data structures in the newer versions of Python that sort of solve

00:26:13.760 --> 00:26:14.960
that problem for you.

00:26:14.960 --> 00:26:16.480
And so we'll talk about those a little bit.

00:26:16.480 --> 00:26:22.220
But when you said copy data, one way to say copy a data structure, like obviously you can't

00:26:22.220 --> 00:26:27.160
just pass the pointer over or get another variable and point to the same thing, right?

00:26:27.160 --> 00:26:30.780
Because it's a pass by reference, not pass by value type of semantics.

00:26:30.780 --> 00:26:34.900
So there's actually a copy module in the standard library, right?

00:26:34.900 --> 00:26:35.440
Yeah.

00:26:35.440 --> 00:26:38.760
And basically, what I would contend is this.

00:26:38.760 --> 00:26:46.620
If you use copy.deepcopy in a non-concurrent program, then there's almost certainly something

00:26:46.620 --> 00:26:47.720
wrong with your logic.

00:26:47.720 --> 00:26:54.380
But if you're doing concurrent programming, then it may well be the right solution for you.

00:26:54.380 --> 00:26:59.600
Because deep copying can be expensive if you're using like nested data structures, like dictionaries

00:26:59.600 --> 00:27:02.000
with dictionaries and things that are quite large, for example.

00:27:02.000 --> 00:27:05.280
But nonetheless, it may be the right tradeoff.
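
NOTE
Editor's sketch of the difference being discussed, using the standard
library's copy module. The config dictionary is a made-up example.
```python
import copy
config = {"sizes": [1, 2, 3], "options": {"mode": "fast"}}
alias = config                  # another name for the same object, no copy made
shallow = copy.copy(config)     # new outer dict, but the nested objects are shared
deep = copy.deepcopy(config)    # fully independent copy of the nested structure
# Mutate the original after copying.
config["sizes"].append(4)
config["options"]["mode"] = "slow"
print(alias["sizes"])           # [1, 2, 3, 4]
print(shallow["sizes"])         # [1, 2, 3, 4] because the inner list is shared
print(deep["sizes"])            # [1, 2, 3]
```
Only the deep copy is safe to hand to a worker that must not see later writes,
and its cost grows with the size of the nested structure.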

00:27:05.280 --> 00:27:10.620
Of course, the only way you're going to know for sure is to profile and actually time things.

00:27:10.620 --> 00:27:13.280
Because that's the other sort of big issue, isn't it?

00:27:13.280 --> 00:27:17.220
That, you know, we have an intuitive feeling this will be fast or that will be slow.

00:27:17.220 --> 00:27:22.480
But unless you back it up with numbers, you could be optimizing something that makes no

00:27:22.480 --> 00:27:23.500
difference whatsoever.

00:27:24.280 --> 00:27:29.800
Yeah, you had some really interesting points there that I thought were both interesting and

00:27:29.800 --> 00:27:30.280
good advice.

00:27:30.280 --> 00:27:37.580
One was, if you're going to write a concurrent program, write the non-concurrent serial single

00:27:37.580 --> 00:27:38.480
threaded version.

00:27:38.480 --> 00:27:39.160
Absolutely.

00:27:39.880 --> 00:27:44.060
first, if you can, and then use that as the baseline for your future work, right?

00:27:44.060 --> 00:27:49.320
I mean, one of my commercial programs, it does its work using concurrency.

00:27:49.320 --> 00:27:50.960
It's written in Python and it's concurrent.

00:27:50.960 --> 00:27:54.880
But I have two modules that do the processing.

00:27:55.040 --> 00:27:58.120
And one uses concurrency and one doesn't.

00:27:58.120 --> 00:28:03.160
And the tests, I have to make sure they produce exactly the same results.

00:28:03.160 --> 00:28:05.340
And of course, one is much slower than the other.
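
NOTE
Editor's sketch of keeping a serial module next to a concurrent one and
checking that they agree. This is an assumption about the shape of such a
test, not Mark's actual code; process and results_match are hypothetical.
```python
from concurrent.futures import ThreadPoolExecutor
# Hypothetical unit of work shared by both versions.
def process(item):
    return item * item + 1
# Serial baseline: the simplest version to debug and to time.
def run_serial(items):
    return [process(item) for item in items]
# Concurrent version; pass ProcessPoolExecutor for CPU-bound work.
def run_concurrent(items, executor_cls=ThreadPoolExecutor):
    with executor_cls() as pool:
        return list(pool.map(process, items))
# Regression check: both paths must produce exactly the same results.
def results_match(items):
    return run_serial(items) == run_concurrent(items)
```
Executor.map returns results in input order, so the two lists can be compared
directly.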

00:28:05.340 --> 00:28:10.340
But I found that incredibly useful in the early days, particularly for debugging.

00:28:10.340 --> 00:28:15.700
And I still use the non-concurrent one if there's some tricky area that I want to focus

00:28:15.700 --> 00:28:17.660
on without having to worry about concurrency.

00:28:17.660 --> 00:28:22.760
So I found it's paid off in terms of saving my time as a programmer.

00:28:22.760 --> 00:28:24.580
And that's the other kind of time, isn't it?

00:28:24.780 --> 00:28:28.620
It's not just the processing or runtime of your software.

00:28:28.620 --> 00:28:34.200
It's the time you spend not just creating this stuff, but maintaining it.

00:28:34.200 --> 00:28:39.160
And concurrency can cost you a lot of maintenance.

00:28:39.160 --> 00:28:40.080
Oh, yes.

00:28:40.080 --> 00:28:41.480
Unless you're very careful.

00:28:41.480 --> 00:28:43.380
Yeah, I'm a strong advocate.

00:28:43.380 --> 00:28:45.980
Get the non-concurrent version working first.

00:28:45.980 --> 00:28:50.800
And of course, it may turn out that that's actually fast enough anyway.

00:28:50.800 --> 00:28:51.360
Absolutely.

00:28:51.360 --> 00:28:53.900
And then you've saved yourself a whole lot of trouble.

00:28:54.040 --> 00:28:57.160
It's sort of the whole premature optimization issues.

00:28:57.160 --> 00:28:57.620
Yeah.

00:28:57.620 --> 00:28:57.860
Yeah.

00:28:57.860 --> 00:29:03.460
It's easy to get seduced into doing things concurrently because it's very fashionable and you can brag

00:29:03.460 --> 00:29:04.120
about it.

00:29:04.540 --> 00:29:07.460
But quite honestly, it's got to be the right solution.

00:29:07.460 --> 00:29:11.420
And you're not going to know that until you've done a non-concurrent one first, I think.

00:29:11.420 --> 00:29:12.700
Yeah, that's a good point.

00:29:12.700 --> 00:29:15.920
And sort of related to that is the performance story.

00:29:16.460 --> 00:29:22.040
So you have some examples in your book where you write the concurrent version and then you have

00:29:22.040 --> 00:29:24.920
the concurrent version, but you write it in several ways.

00:29:24.920 --> 00:29:27.780
And you also have the serial version.

00:29:27.780 --> 00:29:28.560
That's right.

00:29:28.560 --> 00:29:34.500
Just to compare to see, to show what difference different strategies make depending on circumstances.

00:29:34.500 --> 00:29:35.100
Right.

00:29:35.100 --> 00:29:39.280
And for the CPU based one, you were doing like image analysis and processing.

00:29:39.620 --> 00:29:41.720
Yeah, because that's expensive in terms of CPU.

00:29:41.720 --> 00:29:44.580
And of course, if you use threading, it kills performance.

00:29:44.580 --> 00:29:45.200
Yeah.

00:29:45.200 --> 00:29:51.780
Ironically, in CPython, it's actually several times slower if you do it in parallel.

00:29:52.240 --> 00:29:58.200
Yeah, because it's only actually running on a single core at a time and you've got context

00:29:58.200 --> 00:29:59.160
switching on top of it.

00:29:59.160 --> 00:30:03.040
Whereas if you use multiprocessing, zoom, it can run free.

00:30:03.040 --> 00:30:07.320
You know, it'll max out your cores and it will go as fast as your machine's capable of.

00:30:07.320 --> 00:30:13.660
The performance story around threading is super hard to see the whole picture because we obviously

00:30:13.660 --> 00:30:18.860
know when you take a lock here, that slows down both the threads and the context switching

00:30:18.860 --> 00:30:19.300
is slow.

00:30:19.300 --> 00:30:22.340
But then you also have the sort of memory usage.

00:30:22.340 --> 00:30:25.780
You have the L1, L2 caches and registers.

00:30:25.780 --> 00:30:31.180
And so like when you switch from one thread to the other, it could pull in data that trashes

00:30:31.180 --> 00:30:34.960
your cache and you've got to go get it from main memory, which is like 100 times slower.

00:30:34.960 --> 00:30:37.020
And it's just, it's very subtle.

00:30:37.020 --> 00:30:44.200
And yeah, if you're networking, then generally using threads is fine because the network latency

00:30:44.200 --> 00:30:45.540
is so dominant.

00:30:46.020 --> 00:30:51.940
I'm not talking about in terms of if you're like Google and doing like massive servers,

00:30:51.940 --> 00:30:58.420
but for a lot of, if you like, more ordinary applications, then threading is fine for that.

00:30:58.420 --> 00:31:00.820
But of course there is the new asyncio library.

00:31:01.400 --> 00:31:03.820
The GIL is basically one of the problems.

00:31:03.820 --> 00:31:07.580
It means you don't get any of the concurrency that you're aiming for computationally, but

00:31:07.580 --> 00:31:08.720
you still get all the overhead.

00:31:08.720 --> 00:31:11.060
Well, you don't get it at the Python level.

00:31:11.060 --> 00:31:12.560
You will get it at the C level.

00:31:12.560 --> 00:31:19.260
So if you have something that's running with Python threads and actually the work is being

00:31:19.260 --> 00:31:26.400
done by say a C library, if that C library is written well, it'll release the GIL, do its

00:31:26.400 --> 00:31:29.580
work and then reacquire it when it needs to pass data back.

00:31:29.580 --> 00:31:33.720
So it's, it's, it's not a simple story, no matter how you look at it.

00:31:33.720 --> 00:31:34.060
Yeah.

00:31:34.060 --> 00:31:34.360
Okay.

00:31:34.460 --> 00:31:38.760
So you still have, you've got a performance test, but you've got to have that serial version

00:31:38.760 --> 00:31:43.420
to give you a benchmark so that you know whether you're getting better or worse.

00:31:43.420 --> 00:31:43.680
Right.

00:31:43.680 --> 00:31:44.100
Absolutely.

00:31:44.100 --> 00:31:49.080
And also just doing that serial one, it will give you insights into the problem you're solving

00:31:49.080 --> 00:31:49.500
anyway.

00:31:49.500 --> 00:31:53.960
And it's better to make mistakes with the serial one than with the concurrent one because you've

00:31:53.960 --> 00:31:55.440
got less to think about then.

00:31:55.440 --> 00:31:56.360
Yeah, that's for sure.

00:31:56.360 --> 00:32:01.920
If you really do have a problem and it is slower, the trick is to use subprocesses.

00:32:01.920 --> 00:32:09.020
Yeah, use the multiprocessing module, which prior to 3.2, I found not terribly reliable,

00:32:09.020 --> 00:32:15.840
but they did loads of improvements in 3.2 and certainly in 3.3 and 3.4.

00:32:15.840 --> 00:32:20.820
It's absolutely rock solid, both Windows and Linux, which are the ones, the platforms I use.

00:32:20.820 --> 00:32:22.320
And it's brilliant.

00:32:22.320 --> 00:32:24.580
It's absolutely an excellent library.

00:32:24.580 --> 00:32:25.000
Yeah.

00:32:25.000 --> 00:32:28.720
And I just want to point out to people, when you hear, say, use subprocesses, that doesn't

00:32:28.720 --> 00:32:31.900
mean just go kick off a bunch of processes and manage it yourself, right?

00:32:31.900 --> 00:32:35.460
There's sort of a concurrent library for managing them, right?

00:32:35.460 --> 00:32:37.480
Multiprocessing library.

00:32:37.480 --> 00:32:38.880
Oh, the multi- yeah.

00:32:38.880 --> 00:32:40.920
I mean, there's nothing to stop you.

00:32:40.920 --> 00:32:42.320
I mean, there is the subprocess module.

00:32:42.320 --> 00:32:45.960
You can do it all manually, but there's no reason to do that.

00:32:45.960 --> 00:32:48.160
I mean, you can create a process pool.

00:32:48.160 --> 00:32:50.760
In one of my applications, I do that.

00:32:50.760 --> 00:32:53.360
And there's an asynchronous function.

00:32:53.480 --> 00:32:58.660
You can basically just give it a Python function and some arguments and say, okay, go do this

00:32:58.660 --> 00:33:00.560
somewhere else on some other processor.

00:33:00.560 --> 00:33:02.900
And it'll just do that work.

00:33:02.900 --> 00:33:09.100
If that's like expensive work, that's great because it doesn't stop your main process at

00:33:09.100 --> 00:33:09.400
all.

00:33:09.400 --> 00:33:12.620
And when it's done the work, it lets you know.

00:33:12.940 --> 00:33:14.360
And you can pick up from there.

00:33:14.360 --> 00:33:15.120
Right.

00:33:15.120 --> 00:33:21.580
So there's the concurrent.futures module, which is a very high-level module,

00:33:21.580 --> 00:33:27.780
which makes it really easy to just execute stuff with either threads or processes.

00:33:27.780 --> 00:33:32.600
Or you can go use the multiprocessing module itself with its pools and stuff.

00:33:32.680 --> 00:33:35.880
So you can find the level that suits what you need.

00:33:35.880 --> 00:33:36.400
Yeah.

00:33:36.400 --> 00:33:42.400
It feels to me like if you're doing Python 3.2 or above, you should really consider maybe

00:33:42.400 --> 00:33:47.540
the concurrent.futures module first, because it's so easy to say, let's

00:33:47.540 --> 00:33:48.620
do this computationally.

00:33:48.620 --> 00:33:49.560
Let's do this.

00:33:49.560 --> 00:33:49.860
Yeah.

00:33:49.860 --> 00:33:51.480
Let's do this as subprocesses.

00:33:51.480 --> 00:33:54.960
Let's switch it to have like a pool of subprocesses.

00:33:54.960 --> 00:33:56.180
All of those things.

00:33:56.180 --> 00:33:56.400
Right.

00:33:56.400 --> 00:33:56.860
Yeah.

00:33:56.860 --> 00:34:01.440
The other thing about multiprocessing is by default, it doesn't share memory, which is

00:34:01.440 --> 00:34:02.560
the opposite of threading.

00:34:02.560 --> 00:34:08.640
Which means you're not going to clobber yourself by writing to something that's shared.

00:34:08.640 --> 00:34:13.380
Of course, it means if you do want to share, you have to actually go to extra effort and

00:34:13.380 --> 00:34:15.700
say, OK, I'm setting up this thing to be shared.

00:34:30.920 --> 00:34:33.000
Continuous delivery isn't just a buzzword.

00:34:33.000 --> 00:34:36.640
It's a shift in productivity that will help your whole team become more efficient.

00:34:36.640 --> 00:34:42.160
With SnapCI's continuous delivery tool, you can test, debug, and deploy your code quickly

00:34:42.160 --> 00:34:42.820
and reliably.

00:34:42.820 --> 00:34:48.080
Get your product in the hands of your users faster and deploy from just about anywhere at

00:34:48.080 --> 00:34:48.700
any time.

00:34:48.700 --> 00:34:53.160
And did you know that ThoughtWorks literally wrote the book on continuous integration and

00:34:53.160 --> 00:34:53.920
continuous delivery?

00:34:54.740 --> 00:34:59.080
Connect Snap to your GitHub repo and they'll build and run your first pipeline automagically.

00:34:59.080 --> 00:35:05.680
Thank SnapCI for sponsoring this episode by trying them for free at snap.ci slash talkpython.

00:35:13.460 --> 00:35:17.200
Yeah, there's some built-in data structures for sharing across process, right?

00:35:17.200 --> 00:35:17.920
There are.

00:35:17.920 --> 00:35:20.820
I mean, I only use them personally for flags.

00:35:20.820 --> 00:35:27.260
You know, I tend to gather my data separately if I've got results data and then join it all

00:35:27.260 --> 00:35:28.540
up together at the end.

00:35:28.540 --> 00:35:33.880
So I just use flags basically to say, like, this bit's at this stage or a flag to say, look,

00:35:33.880 --> 00:35:36.900
just stop now because the user's cancelled.

00:35:36.900 --> 00:35:37.420
Yeah.

00:35:37.420 --> 00:35:39.380
To give me a clean termination.
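
NOTE
Editor's sketch of a shared cancellation flag, assuming multiprocessing.Value
holding one integer. The worker and demo functions are hypothetical and are not
taken from the book.
```python
import multiprocessing
# Hypothetical worker: it polls the shared integer flag between units of work.
def worker(stop_flag, items):
    done = []
    for item in items:
        if stop_flag.value:     # non-zero means "the user cancelled"
            break
        done.append(item * 2)
    return done
# Not called here: shows how the flag could be wired to a real child process.
def demo():
    stop_flag = multiprocessing.Value("i", 0)   # shared C int, starts at 0
    proc = multiprocessing.Process(target=worker, args=(stop_flag, range(1000)))
    proc.start()
    stop_flag.value = 1         # ask the child to stop cleanly
    proc.join()
```
demo() would be called from under an `if __name__ == "__main__":` guard. The
result data itself is not shared here; only the flag is.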

00:35:39.960 --> 00:35:42.740
But, you know, I mean, I cover all that sort of stuff in the book.

00:35:42.740 --> 00:35:46.240
But it's a really great module, multiprocessing.

00:35:46.240 --> 00:35:52.440
But concurrent.futures gives you that high-level approach, which makes it as simple as it can

00:35:52.440 --> 00:35:54.180
be for this kind of stuff.

00:35:54.180 --> 00:36:01.320
I mean, I'd still advocate not using concurrency unless you need it, you know, because it does

00:36:01.320 --> 00:36:06.040
make your program more complicated and harder to reason with, you know, or reason about.

00:36:06.040 --> 00:36:06.480
Yeah.

00:36:06.480 --> 00:36:12.560
And, you know, it is easy to switch between concurrent.futures using subprocesses or using

00:36:12.560 --> 00:36:13.020
threads.

00:36:13.020 --> 00:36:17.640
But that doesn't mean the code that you write can be just flipped from one to the other because

00:36:17.640 --> 00:36:21.420
of the serialization issues and all, you know, the locking shared data.

00:36:21.420 --> 00:36:24.680
So that's maybe a really subtle thing you could run into.

00:36:24.880 --> 00:36:30.400
Well, if you're doing threading, the memory is shared by default.

00:36:30.400 --> 00:36:35.620
So any thread can stomp on anything, which can be a problem.

00:36:35.620 --> 00:36:41.100
But on the other hand, if you're using multiprocessing, any data that you're passing around has to

00:36:41.100 --> 00:36:47.020
be picklable, for example, which doesn't apply as a limit in threading because you're just

00:36:47.020 --> 00:36:49.440
accessing the same data in the same memory space.

00:36:49.440 --> 00:36:51.900
So there are differences and there are tradeoffs.

00:36:51.900 --> 00:36:59.640
And the API for multiprocessing started as mimicking the threading API, but it's actually

00:36:59.640 --> 00:37:01.180
grown considerably since then.

00:37:01.180 --> 00:37:04.740
So it's worth digging in and learning.

00:37:04.740 --> 00:37:10.120
But I start with concurrent.futures because that is the easiest conceptually, and in practical

00:37:10.120 --> 00:37:10.480
code,

00:37:10.480 --> 00:37:12.580
it requires the least code to get stuff done.

00:37:12.580 --> 00:37:13.360
Yeah, absolutely.

00:37:13.960 --> 00:37:21.100
So both the multiprocessing and the threading are pretty good for when you're doing basic

00:37:21.100 --> 00:37:22.700
IO bound work, right?

00:37:22.700 --> 00:37:27.700
Because the key thing to know about that is a thread when it waits on a network call in

00:37:27.700 --> 00:37:29.600
CPython will release the GIL, right?

00:37:29.600 --> 00:37:30.180
Yeah.

00:37:30.180 --> 00:37:37.480
But of course, there is the asyncio module, which is designed for that kind of work.

00:37:37.480 --> 00:37:42.840
I'm not a user of that module because most of my processing is CPU bound.

00:37:43.140 --> 00:37:43.440
Right.

00:37:43.440 --> 00:37:45.720
But that is a third way, if you like.

00:37:45.720 --> 00:37:46.000
Yeah.

00:37:46.000 --> 00:37:51.560
So in Python 3.4, they added the asyncio and the concept of event loops.

00:37:51.560 --> 00:37:53.700
And I also have not used that a lot.

00:37:53.700 --> 00:37:57.600
But my understanding is that's a little like the Node.js style of programming.

00:37:57.600 --> 00:38:02.140
I don't know because I avoid JavaScript as much as I can.

00:38:02.700 --> 00:38:09.520
But basically waiting on the IO bits and releasing it to process other bits of code, other methods

00:38:09.520 --> 00:38:11.560
while you're waiting on IO, right?

00:38:11.560 --> 00:38:15.480
So it's going to let you know rather than you having to poll or be blocked.

00:38:15.480 --> 00:38:15.840
Right.

00:38:15.840 --> 00:38:16.720
Very callback driven.

00:38:16.720 --> 00:38:17.180
Yeah.

00:38:17.420 --> 00:38:17.640
Yeah.

00:38:17.640 --> 00:38:19.660
Which is a perfectly good approach.

00:38:19.660 --> 00:38:20.140
Yeah.

00:38:20.140 --> 00:38:24.460
And then Python 3.5 added the async and await keywords.
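
NOTE
Editor's sketch of the async and await keywords with asyncio. The fetch
coroutine is hypothetical and uses asyncio.sleep in place of real network I/O.
```python
import asyncio
# Hypothetical stand-in for a network call; awaiting sleep yields to the event loop.
async def fetch(name, delay):
    await asyncio.sleep(delay)
    return name.upper()
# gather runs both coroutines concurrently on one thread and keeps argument order.
async def main():
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))
```
On Python 3.7 or later this runs with asyncio.run(main()); at the time of
recording the equivalent was loop.run_until_complete(main()) on an event loop.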

00:38:24.460 --> 00:38:25.280
Yeah.

00:38:25.280 --> 00:38:26.380
Which I haven't used.

00:38:26.380 --> 00:38:28.420
I'm still using 3.4.

00:38:28.420 --> 00:38:33.660
Partly because I had some compatibility issues with cx_Freeze at the time.

00:38:34.000 --> 00:38:36.260
And partly because of the installer.

00:38:36.260 --> 00:38:43.600
For my commercial software, I released both 32 and 64 bit versions on Windows.

00:38:43.600 --> 00:38:52.060
And up to 3.4, it's really easy to install both of those side by side.

00:38:52.060 --> 00:38:53.080
There's no problem.

00:38:53.080 --> 00:38:59.240
But with a 3.5 installer, what I found was that some third party libraries couldn't find one

00:38:59.240 --> 00:38:59.800
or the other.

00:38:59.800 --> 00:39:03.700
So I'm a bit stuck with 3.5 on Windows at the moment.

00:39:03.700 --> 00:39:11.060
Well, and the installer for Python 3.5 got a major reworking by Steve Dower, who was actually

00:39:11.060 --> 00:39:12.000
just on the show.

00:39:12.000 --> 00:39:13.040
What number was that?

00:39:13.040 --> 00:39:15.500
That was 53.

00:39:15.500 --> 00:39:17.200
So just, you know, a few weeks ago.

00:39:17.200 --> 00:39:20.500
And the installer is much nicer than the old one.

00:39:20.500 --> 00:39:23.500
It is, but it doesn't do what I want.

00:39:23.500 --> 00:39:24.680
But it has, you know, right?

00:39:24.680 --> 00:39:28.420
If you need this other thing it's not doing, then obviously you can't use it, right?

00:39:28.420 --> 00:39:28.800
Yeah.

00:39:28.800 --> 00:39:33.560
I need to be able to install 32 and 64 bit Python side by side.

00:39:34.020 --> 00:39:35.840
And I can do that up to 3.4.

00:39:35.840 --> 00:39:37.920
I'm not saying it's not possible.

00:39:37.920 --> 00:39:40.560
I mean, I have done it with 3.5.

00:39:40.560 --> 00:39:43.760
But what I haven't managed to do is get my third party stuff.

00:39:43.760 --> 00:39:48.340
PyWin32 and APSW, which I'll mention at the end.

00:39:48.340 --> 00:39:51.900
I couldn't get them working properly with it when I had both.

00:39:51.900 --> 00:39:55.420
They work fine when I've just got one Python, but not when I had both.

00:39:55.420 --> 00:39:57.280
But hopefully that problem will go away.

00:39:57.280 --> 00:40:01.780
Because, I mean, sometime I'm going to, like, stop doing 32-bit versions of my apps.

00:40:01.780 --> 00:40:06.540
I really want to look into the async and await stuff more.

00:40:06.540 --> 00:40:10.140
Because that programming model is so beautiful.

00:40:10.140 --> 00:40:16.040
It's just I haven't been writing any code that requires that type of work.

00:40:16.520 --> 00:40:19.860
I like that model because it's very similar to the GUI event loop model.

00:40:19.860 --> 00:40:24.440
I mean, the GUI event loop basically sits there and says, I'll let you know if something happens.

00:40:24.440 --> 00:40:28.260
And you say, okay, well, if this thing happens, call this.

00:40:28.260 --> 00:40:28.660
Yeah.

00:40:28.660 --> 00:40:31.560
GUIs are inherently event-driven, right?

00:40:31.560 --> 00:40:32.360
Absolutely.

00:40:32.360 --> 00:40:34.860
They've got their message pump and everything.

00:40:34.860 --> 00:40:43.220
So you actually, one of the, maybe the last section in the concurrency bit of your book, you talk about special considerations for GUIs.

00:40:43.440 --> 00:40:43.720
Yeah.

00:40:43.720 --> 00:40:47.080
I mean, I did this using Tkinter simply because that's in the box.

00:40:47.080 --> 00:40:49.020
You know, comes with Python out of the box.

00:40:49.020 --> 00:40:51.920
Although, personally, I use PySide and Qt.

00:40:51.920 --> 00:40:53.280
But it would work.

00:40:53.280 --> 00:40:55.280
The method works with both.

00:40:55.280 --> 00:41:00.920
And I'm sure it would work with wxPython or with PyGObject, any GUI system.

00:41:00.920 --> 00:41:08.040
And what I discovered was, how do you make, well, the question that arose for me was, okay, I've got a GUI application.

00:41:08.040 --> 00:41:11.220
And it's got to do some CPU intensive work.

00:41:11.220 --> 00:41:13.180
But I don't want to freeze the GUI.

00:41:13.180 --> 00:41:15.980
Because what if the user wants to cancel the operation?

00:41:15.980 --> 00:41:17.620
Or what if they just want to quit the application?

00:41:17.620 --> 00:41:21.500
I don't want it frozen for like, you know, minutes on end when they can't do anything.

00:41:21.580 --> 00:41:27.920
One of the quickest ways you can make a user believe that your application is crappy is to have it just lock up.

00:41:27.920 --> 00:41:32.940
And on Windows, like, get that sort of white, opaque overlay saying not responding.

00:41:32.940 --> 00:41:35.580
Or on OS X, it says force quit.

00:41:35.580 --> 00:41:37.600
You're like, hmm, I'm a little suspicious now.

00:41:38.260 --> 00:41:41.980
Yeah, and sometimes those messages come too early because sometimes, you know.

00:41:41.980 --> 00:41:44.740
But anyway, so that was the problem that I had to address.

00:41:44.740 --> 00:41:51.560
And what I found was, if I use threading, if I have a worker thread and a GUI thread, the GUI still freezes.

00:41:51.560 --> 00:41:56.440
So what I needed was some way of not having the GUI freeze.

00:41:56.440 --> 00:42:04.500
And the model that I came up with was I have two threads, one for the GUI, and what I call, rather sarcastically, the manager thread.

00:42:05.100 --> 00:42:08.600
And the thing about the manager thread is the GUI gives it work.

00:42:08.600 --> 00:42:11.920
Whenever there's work to be done that's like CPU intensive, it gives it to the manager.

00:42:11.920 --> 00:42:14.860
But like a good manager, the manager never does any work.

00:42:14.860 --> 00:42:21.260
And that means that the GUI thread always gets all the CPU of its core, so it's never blocked.

00:42:21.260 --> 00:42:24.060
And the manager's given all the work and never does any work.

00:42:24.060 --> 00:42:30.220
And that solved the problem because what the manager does, it uses multiprocessing to hand it off to other processes.

00:42:30.220 --> 00:42:33.620
And if you've got multiple cores, that's no problem.

00:42:34.340 --> 00:42:37.160
I did try it on single core machines, and it was still no problem.

00:42:37.160 --> 00:42:42.640
Right, because you still have the preemptive multithreading that gives you enough time slice that your user feels like your app's working.

00:42:42.640 --> 00:42:45.380
So basically, you've got two threads.

00:42:45.380 --> 00:42:48.880
The GUI thread gets all the CPU for its core.

00:42:48.880 --> 00:42:55.380
And whenever it has work, it gives it to the manager who immediately hands it on to a process in a process pool.

00:42:56.340 --> 00:43:00.120
And that process is separate and goes off and does it and lets you know.

00:43:00.120 --> 00:43:03.320
And, of course, is cancelable if the user wants that.

00:43:03.320 --> 00:43:04.980
Right. Okay. Very, very nice.

00:43:04.980 --> 00:43:07.660
Because it basically, it shares one int.

00:43:07.660 --> 00:43:12.820
And the int is either going to say, like, you're good to go or, like, they don't want you anymore.

00:43:12.820 --> 00:43:14.060
Stop work.
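The model described above can be sketched roughly as follows. This is a minimal sketch, not the code from the book: the names `crunch`, `manager_loop`, `run_demo`, `RUN` and `CANCEL` are made up for illustration, and it starts one process per job rather than using a process pool, to keep it short. The shared int is a `multiprocessing.Value` that each worker checks between chunks of work.

```python
import multiprocessing
import queue
import threading

RUN, CANCEL = 0, 1  # hypothetical values for the one shared int


def crunch(numbers, cancel_flag):
    """CPU-bound work that checks the shared int between items."""
    total = 0
    for n in numbers:
        if cancel_flag.value == CANCEL:
            return None  # the user does not want this result anymore
        total += sum(i * i for i in range(n))
    return total


def _worker(numbers, cancel_flag, results):
    # Runs in a separate process and reports back through a queue.
    results.put(crunch(numbers, cancel_flag))


def manager_loop(jobs, cancel_flag, results):
    """Manager thread: never computes anything, only hands jobs to processes."""
    processes = []
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            break
        process = multiprocessing.Process(
            target=_worker, args=(job, cancel_flag, results))
        process.start()
        processes.append(process)
    for process in processes:
        process.join()


def run_demo():
    cancel_flag = multiprocessing.Value("i", RUN)
    results = multiprocessing.Queue()
    jobs = queue.Queue()
    manager = threading.Thread(
        target=manager_loop, args=(jobs, cancel_flag, results))
    manager.start()
    jobs.put([10_000, 20_000])  # a GUI event handler would do this
    jobs.put(None)
    answer = results.get()  # a real GUI would poll instead of blocking
    manager.join()
    return answer

# Call run_demo() from under an `if __name__ == "__main__":` guard, which
# multiprocessing needs on platforms that start fresh interpreters.
```

To cancel, the GUI side would set `cancel_flag.value = CANCEL`, and each worker would stop at its next check.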

00:43:14.220 --> 00:43:16.300
That model I cover in the book.

00:43:16.300 --> 00:43:21.300
And as I say, it'll work with any, although I show it with Tkinter, it's not Tkinter specific.

00:43:21.300 --> 00:43:22.300
Yeah, absolutely.

00:43:22.300 --> 00:43:25.960
I think that's a pretty good summation of the concurrency story.

00:43:26.200 --> 00:43:37.100
The other part of performance that you talked about that I actually don't know very much about and I haven't talked a lot about on my show is using Cython to speed up your code.

00:43:37.100 --> 00:43:40.920
Can you tell everyone what Cython is and give a quick summary there?

00:43:40.920 --> 00:43:41.580
Okay.

00:43:41.580 --> 00:43:46.420
Cython is basically, it's a kind of compiler.

00:43:47.180 --> 00:43:58.660
So, if you have an application written in pure, ordinary Python and you run it through Cython, it will create a C version of your code.

00:43:58.660 --> 00:44:08.780
And my experience is that will basically run twice as fast, just without touching it, without doing anything, just because it's now C.

00:44:08.780 --> 00:44:10.860
But you can then go further.

00:44:10.860 --> 00:44:15.980
You can actually give it hints and say, well, you can give it type hints.

00:44:15.980 --> 00:44:18.900
So, basically, you can say, well, this is an int or this is a string.

00:44:18.900 --> 00:44:22.200
And if you give it hints, it can optimize it better.

00:44:22.200 --> 00:44:25.940
It's also got optimizations for NumPy.

00:44:25.940 --> 00:44:28.560
So, for people who are interested in that kind of processing.

00:44:28.560 --> 00:44:31.780
So, it can produce very fast code for that.
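As a rough illustration of the two steps mentioned here, the function below is ordinary Python that Cython could compile unchanged. The comment underneath shows what a typed variant might look like in a `.pyx` file. The function name is made up for this sketch, and no speedup figures are implied.

```python
def sum_of_squares(n):
    """Sum i*i for i in range(n); plain Python, no Cython needed to run it."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# A typed Cython variant (Cython syntax, so it belongs in a .pyx file and
# is not valid in a plain .py file) could look roughly like this:
#
#     def sum_of_squares(int n):
#         cdef long long total = 0
#         cdef int i
#         for i in range(n):
#             total += i * i
#         return total
```

With the C types declared, Cython can turn the loop into plain C arithmetic instead of operating on Python objects.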

00:44:31.780 --> 00:44:32.760
Yeah, that's interesting.

00:44:32.760 --> 00:44:40.220
And it's, I think, worth pointing out that it's not the same concept as type hints in Python 3.5, which is just more of an IDE tool, right?

00:44:40.220 --> 00:44:41.120
That's right.

00:44:41.420 --> 00:44:49.900
The typing module for 3.5, in a sense, has no functionality at runtime.

00:44:49.900 --> 00:44:56.820
It is purely used for static code analysis to say, you know, whether you're being consistent with your types.

00:44:56.920 --> 00:45:07.440
In other words, you're saying, I'm claiming that this is a list of strings and it will statically analyze your code and say, well, okay, you've only used it as if it were a list of strings.

00:45:07.440 --> 00:45:08.140
So, that's good.
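A small sketch of that point, with a made-up function name: the annotation claims a list of strings, but nothing checks it when the code runs. A static checker such as mypy would be expected to flag the second call in the usage note, yet Python itself runs it without complaint.

```python
from typing import List


def longest(words: List[str]) -> str:
    """Return the longest item; the annotations are not enforced at runtime."""
    return max(words, key=len)
```

Calling `longest(["a", "abc"])` matches the annotation. Calling `longest([[1, 2], [3]])` contradicts it but still runs, because the hints are only metadata stored in `longest.__annotations__`.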

00:45:09.300 --> 00:45:16.160
But, of course, a compiler could use that type hinting information to produce more optimized code.

00:45:16.160 --> 00:45:18.100
And I expect that's where things will go.

00:45:18.100 --> 00:45:18.960
Yeah, absolutely.

00:45:19.160 --> 00:45:23.080
I'm hoping that Cython will actually adopt that syntax.

00:45:23.080 --> 00:45:27.940
And I mean, other compilers like Nuitka and so on may adopt that too.

00:45:27.940 --> 00:45:34.380
Now that typing is a standard module, one hopes that the third-party compilers will adopt it.

00:45:34.380 --> 00:45:35.820
Yeah, at least there's an option, right?

00:45:35.820 --> 00:45:36.120
Yeah.

00:45:36.120 --> 00:45:37.760
And it would mean consistent code then.

00:45:37.760 --> 00:45:45.500
It would mean you could write your code using typing and know that whichever one of the compilers you chose would give you some kind of speed up.

00:45:45.500 --> 00:45:46.120
Yeah, yeah.

00:45:46.120 --> 00:45:46.460
Beautiful.

00:45:46.900 --> 00:45:53.220
All right, so we don't have a lot of time left in the show, but I wanted to give you a chance to just talk about some of the projects on your website.

00:45:53.220 --> 00:45:56.380
One of the ones that you were working on is something called Diff PDF.

00:45:56.380 --> 00:45:58.740
Another one was the Gravitate Game.

00:45:58.740 --> 00:45:59.920
I thought those were kind of interesting.

00:45:59.920 --> 00:46:00.940
Oh, yeah.

00:46:00.940 --> 00:46:02.640
The game was just a bit of fun.

00:46:02.640 --> 00:46:04.140
I did it in one of my books.

00:46:04.140 --> 00:46:07.720
I think it's actually in Python in Practice because I'd never put a game in a book.

00:46:07.720 --> 00:46:08.960
And I thought, well, why not?

00:46:08.960 --> 00:46:12.340
I wrote it in Tkinter, but I have got Qt versions.

00:46:12.600 --> 00:46:19.220
And on the website, I've got a JavaScript version that I did using the canvas, you know, the HTML5 canvas.

00:46:19.660 --> 00:46:26.840
It's basically the SameGame or TileFall, but with the things gravitating to the middle rather than falling to the bottom and left.

00:46:26.840 --> 00:46:27.500
That's it.

00:46:27.500 --> 00:46:30.960
And yeah, you can do fun games with Python, no problem.

00:46:30.960 --> 00:46:35.280
And of course, there is the Pygame library as well for people who are more sort of heavily into games.

00:46:35.280 --> 00:46:35.760
Yeah.

00:46:35.760 --> 00:46:39.020
Diff PDF is paying my salary.

00:46:39.420 --> 00:46:44.660
Basically, it compares PDFs and you might think, well, that's easy.

00:46:44.660 --> 00:46:47.200
You just compare the pixels and it can do that.

00:46:47.200 --> 00:46:48.820
And lots of other tools can do that.

00:46:48.820 --> 00:46:58.400
But what turns out to be quite tricky is comparing the text as text because PDFs are really a graphical file format.

00:46:58.400 --> 00:47:01.940
So a PDF file doesn't actually know what a sentence is or even a word.

00:47:02.360 --> 00:47:05.200
So it can break up text in quite weird ways.

00:47:05.200 --> 00:47:08.360
And Diff PDF gives you a sort of rational comparison.
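To make the idea of comparing the text as text concrete, here is a minimal sketch using the standard library's `difflib`. It is not how Diff PDF works internally, which the conversation does not describe. It assumes the text has already been extracted as a list of fragments and that no word is split across two fragments. The function names are made up.

```python
import difflib


def to_words(fragments):
    """Flatten arbitrarily broken-up text fragments into a list of words."""
    return " ".join(fragments).split()


def changed_words(old_fragments, new_fragments):
    """Return (tag, old_words, new_words) for every non-matching run."""
    old, new = to_words(old_fragments), to_words(new_fragments)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Because the comparison is word by word, two pages whose text is chunked differently but reads the same produce no differences.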

00:47:08.360 --> 00:47:09.160
Yeah, cool.

00:47:09.160 --> 00:47:15.080
So you can ask questions like, is the essential content changed, not just like something bolded or whatever, right?

00:47:15.080 --> 00:47:15.600
Yeah.

00:47:16.080 --> 00:47:18.080
And I thought it'd be used by publishers.

00:47:18.080 --> 00:47:20.820
I wanted to use it originally to compare.

00:47:20.820 --> 00:47:30.740
If you do a second printing of a book, not a second edition, a second printing, the publisher will let you make minor corrections as long as it doesn't change the pagination.

00:47:30.740 --> 00:47:40.820
And having to like check that I hadn't messed up by looking at 300 or 400 or 500 pages was pretty tiring.

00:47:40.820 --> 00:47:43.840
So that was an incentive for creating this tool.

00:47:44.080 --> 00:47:47.700
You're like, within the time it would take me to do this, I could write an app and solve this.

00:47:47.700 --> 00:47:47.980
Exactly.

00:47:47.980 --> 00:47:53.340
But it turns out it's used by finance companies, insurance companies and banks.

00:47:53.340 --> 00:47:54.860
Ah, the lawyer types, yeah.

00:47:54.860 --> 00:47:56.700
Well, yeah, but why?

00:47:56.700 --> 00:47:58.520
And I don't know because they won't tell me.

00:47:58.520 --> 00:47:59.460
But they use it.

00:47:59.460 --> 00:48:03.300
And as long as they buy, I don't care.

00:48:03.300 --> 00:48:04.140
I mean, that's great.

00:48:04.140 --> 00:48:04.580
That's cool.

00:48:04.580 --> 00:48:05.520
And is that written in Python?

00:48:05.520 --> 00:48:06.940
It is.

00:48:06.940 --> 00:48:10.020
It was originally written in C++, but now it's written in Python.

00:48:10.020 --> 00:48:12.800
It uses concurrency, the model of concurrency I described.

00:48:12.800 --> 00:48:16.480
And it's a Windows-specific product.

00:48:16.480 --> 00:48:23.180
It uses a third-party PDF library that I bought, a royalty-free library.

00:48:23.180 --> 00:48:26.480
And yeah, that's been paying the way.

00:48:26.800 --> 00:48:32.760
But I've come up with another program, one that I originally wanted to write more than 20 years ago.

00:48:32.760 --> 00:48:37.260
But I didn't have the skill then, and the tools weren't available anyway.

00:48:37.260 --> 00:48:38.460
And that's XindeX.

00:48:38.460 --> 00:48:39.620
And it's for book indexes.

00:48:39.620 --> 00:48:45.660
And there are a couple of – well, there are some existing products out there for book indexes.

00:48:46.200 --> 00:48:51.920
But this one uses Python, and it uses SQLite, which I adore as a database.

00:48:51.920 --> 00:48:53.460
I really like it.

00:48:53.460 --> 00:48:54.460
Yeah, I'm a fan of it too.

00:48:54.460 --> 00:48:54.740
Yeah.

00:48:54.740 --> 00:48:58.300
So I get all the reliability and also the convenience.

00:48:58.300 --> 00:49:03.660
I mean, SQLite has full-text search, which I think people from Google added to it.

00:49:03.660 --> 00:49:05.480
And that's just absolutely superb.

00:49:05.480 --> 00:49:05.980
Oh, yeah.

00:49:05.980 --> 00:49:06.540
That's excellent.

00:49:06.540 --> 00:49:11.060
So that application only went on sale at the end of last month.

00:49:11.060 --> 00:49:11.560
Wow.

00:49:11.560 --> 00:49:12.640
Congratulations on that.

00:49:12.640 --> 00:49:13.080
That's excellent.

00:49:13.080 --> 00:49:13.400
Thank you.

00:49:13.400 --> 00:49:15.000
So I'm really pleased about that.

00:49:15.140 --> 00:49:19.700
And I'm waiting to see – like people can use it for 40 days free trial now.

00:49:19.700 --> 00:49:23.340
So I'll see in a couple of months if people actually buy it.

00:49:23.340 --> 00:49:25.260
Well, good luck on that.

00:49:25.260 --> 00:49:25.760
That's great.

00:49:25.760 --> 00:49:26.240
Yeah.

00:49:26.240 --> 00:49:30.900
So before we end the show, let me ask you a couple of questions that I always ask everyone.

00:49:30.900 --> 00:49:31.440
Yeah.

00:49:31.440 --> 00:49:35.940
So there's close to 80,000 PyPI packages out there.

00:49:35.940 --> 00:49:42.360
And everybody uses some that are super interesting but that maybe don't make the rounds, that not everyone knows about.

00:49:42.360 --> 00:49:44.280
So what are ones that you really like?

00:49:44.580 --> 00:49:46.140
Well, obviously, I use PySide.

00:49:46.140 --> 00:49:51.380
I use the Roman one as well because I use Roman numerals like in the indexing thing.

00:49:51.380 --> 00:49:51.980
Nice.

00:49:51.980 --> 00:49:52.940
Yeah, I know.

00:49:52.940 --> 00:49:57.100
But the – and obviously, I use cx_Freeze.

00:49:57.100 --> 00:50:01.100
And I use PyWin32, which is very useful for Windows.

00:50:01.100 --> 00:50:06.040
And I use – and another Windows one I use is WMI, which is Windows –

00:50:06.040 --> 00:50:07.860
Windows management infrastructure, I think.

00:50:07.860 --> 00:50:09.700
Thank you very much because I'd forgotten.

00:50:09.700 --> 00:50:17.120
But the one that I want to sort of boost, if you like, is APSW, another Python SQLite wrapper.

00:50:18.000 --> 00:50:23.200
And as you know, the Python standard library has a sqlite3 module, which is perfectly good.

00:50:23.200 --> 00:50:24.140
Nothing wrong with that.

00:50:24.140 --> 00:50:28.200
But APSW is absolutely excellent.

00:50:28.880 --> 00:50:39.120
It provides you as a Python programmer with all the access to SQLite that you would get if you were a C programmer, but with all the pleasure of programming in Python.

00:50:39.120 --> 00:50:39.600
Oh, wonderful.

00:50:40.060 --> 00:50:45.720
So you can create your own custom functions in Python that you can feed into it.

00:50:45.720 --> 00:50:47.320
So you can create your own collations.

00:50:47.320 --> 00:50:50.520
You can even create your own virtual tables.

00:50:50.520 --> 00:50:53.020
Everything that you can do in C, you can do in Python.
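For a feel of what custom functions and collations look like, here is a minimal sketch. It deliberately uses the standard library's `sqlite3` module rather than APSW, whose API differs and which offers more, such as virtual tables. The table, the data, and the names `roman_to_int` and `ignore_case` are invented for the example.

```python
import sqlite3

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(text):
    """Convert a well-formed Roman numeral such as 'xiv' to an int."""
    total = 0
    previous = 0
    for char in reversed(text.lower()):
        value = ROMAN_VALUES[char]
        # A smaller value written before a larger one is subtracted.
        total += -value if value < previous else value
        previous = value
    return total


def ignore_case(a, b):
    """Collation callback: negative, zero, or positive, like C's strcmp."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)


def build_connection():
    """Create an in-memory database with the custom function and collation."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("roman_to_int", 1, roman_to_int)
    conn.create_collation("ignore_case", ignore_case)
    conn.execute("CREATE TABLE entries (term TEXT, page TEXT)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?)",
        [("cherry", "xiv"), ("Banana", "ix"), ("apple", "iii")],
    )
    return conn
```

With that connection, SQL such as `SELECT page FROM entries ORDER BY roman_to_int(page)` or `SELECT term FROM entries ORDER BY term COLLATE ignore_case` calls back into the Python functions.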

00:50:53.020 --> 00:50:56.940
And it's just a fantastic library.

00:50:56.940 --> 00:51:03.260
It doesn't follow precisely the DB-API 2 standard.

00:51:03.260 --> 00:51:09.660
It does where it can, but it favors – if there's a choice and like SQLite offers more, it offers you the more.

00:51:09.660 --> 00:51:13.780
Because it's designed to give you everything that SQLite has to offer.

00:51:13.780 --> 00:51:21.360
I mean, if you're wanting to prototype on SQLite for transferring to another database, then use the built-in sqlite3.

00:51:21.360 --> 00:51:33.760
But if you want to use SQLite in its own right, for example, as a file format or for some other purpose where you're only going to be using SQLite, then APSW is the best module I've ever seen for doing that.

00:51:33.760 --> 00:51:34.700
Yeah, that's wonderful.

00:51:35.220 --> 00:51:45.400
While we're on database packages, I'll also throw out Records by Kenneth Reitz, which he calls SQL for Humans, which is like the simplest possible sort of alternative to DB-API that I can find.

00:51:45.400 --> 00:51:47.500
It's like the opposite end.

00:51:47.500 --> 00:51:51.100
It's like super simple, not like, you know, we're going to give you access to everything.

00:51:51.100 --> 00:51:51.720
Okay.

00:51:51.720 --> 00:51:55.980
I mean, I like writing raw SQL, so APSW suits me for that as well.

00:51:56.380 --> 00:52:01.760
I mean, I know there are things like SQLAlchemy and, you know, that give you a higher level, but I love APSW.

00:52:01.760 --> 00:52:02.600
Yeah, wonderful.

00:52:02.600 --> 00:52:03.040
Wonderful.

00:52:03.040 --> 00:52:03.840
Okay.

00:52:03.840 --> 00:52:04.420
Last question.

00:52:04.420 --> 00:52:07.180
When you write some Python code, what editor do you use?

00:52:07.180 --> 00:52:08.320
I use gvim.

00:52:08.320 --> 00:52:08.880
gvim.

00:52:08.880 --> 00:52:09.160
Okay.

00:52:09.160 --> 00:52:09.780
Okay.

00:52:09.780 --> 00:52:10.840
So graphical vim.

00:52:10.840 --> 00:52:12.840
Now, I'm not saying that I'd recommend that.

00:52:12.840 --> 00:52:21.520
I think that it could drive, you know, someone insane trying to learn it because it is a strange editor, but I've been using it for more than 20 years now.

00:52:21.520 --> 00:52:25.060
I just, I would find it hard to use another one for daily work.

00:52:25.060 --> 00:52:25.660
All right.

00:52:25.660 --> 00:52:25.960
Excellent.

00:52:25.960 --> 00:52:26.940
Well, thanks for the recommendation.

00:52:26.940 --> 00:52:27.540
Okay.

00:52:27.540 --> 00:52:31.620
And thank you very much for having me on your show.

00:52:31.620 --> 00:52:32.200
Yeah, Mark.

00:52:32.200 --> 00:52:33.520
It's been a great conversation.

00:52:33.840 --> 00:52:42.540
And I'm really happy to shine a light on this whole concurrency story and the GUI story as well in Python because they don't get as much coverage as I think they should.

00:52:42.540 --> 00:52:42.780
No.

00:52:42.780 --> 00:52:45.800
And Python is good at both of those things.

00:52:45.800 --> 00:52:48.600
You know, I don't believe people who say otherwise.

00:52:48.600 --> 00:52:54.180
Python gives you a lot of concurrency options, so you've got more choice and you need to choose with more care.

00:52:54.180 --> 00:52:57.400
But if you choose well, then it will give you great performance.

00:52:57.400 --> 00:52:58.440
Yeah, absolutely.

00:52:58.440 --> 00:52:59.060
All right.

00:52:59.060 --> 00:52:59.880
Well, thanks for being on the show.

00:52:59.880 --> 00:53:00.600
It was great to talk to you.

00:53:00.600 --> 00:53:01.800
Thank you very much.

00:53:03.140 --> 00:53:06.060
This has been another episode of Talk Python To Me.

00:53:06.060 --> 00:53:11.060
Today's guest was Mark Summerfield, and this episode has been sponsored by Hired and SnapCI.

00:53:11.060 --> 00:53:13.000
Thank you both for supporting the show.

00:53:13.000 --> 00:53:15.760
Hired wants to help you find your next big thing.

00:53:15.760 --> 00:53:24.240
Visit Hired.com slash Talk Python To Me to get five or more offers with salary and equity presented right up front and a special listener signing bonus of $2,000.

00:53:24.240 --> 00:53:28.320
SnapCI is modern, continuous integration and delivery.

00:53:28.320 --> 00:53:32.580
Build, test, and deploy your code directly from GitHub, all in your browser with debugging,

00:53:32.840 --> 00:53:34.220
Docker and parallels included.

00:53:34.220 --> 00:53:37.260
Try them for free at snap.ci slash Talk Python.

00:53:37.920 --> 00:53:40.080
Are you or a colleague trying to learn Python?

00:53:40.080 --> 00:53:44.820
Have you tried books and videos that left you bored by just covering topics point by point?

00:53:44.820 --> 00:53:52.820
Well, check out my online course, Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.

00:53:53.480 --> 00:53:59.240
You can find the links from today's show at talkpython.fm/episodes/show/58.

00:53:59.240 --> 00:54:01.240
Be sure to subscribe to the show.

00:54:01.240 --> 00:54:03.440
Open your favorite podcatcher and search for Python.

00:54:03.440 --> 00:54:04.680
We should be right at the top.

00:54:04.680 --> 00:54:13.960
You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

00:54:14.100 --> 00:54:18.680
Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

00:54:18.680 --> 00:54:22.720
You can hear the entire song at talkpython.fm/music.

00:54:22.720 --> 00:54:24.480
This is your host, Michael Kennedy.

00:54:24.480 --> 00:54:25.760
Thanks so much for listening.

00:54:25.760 --> 00:54:26.940
I really appreciate it.

00:54:26.940 --> 00:54:29.100
Smix, let's get out of here.

00:54:29.860 --> 00:54:31.260
Staying with my voice.

00:54:31.260 --> 00:54:33.060
There's no norm that I can feel within.

00:54:33.060 --> 00:54:34.260
Haven't been sleeping.

00:54:34.260 --> 00:54:35.880
I've been using lots of rest.

00:54:35.880 --> 00:54:38.740
I'll pass the mic back to who rocked it best.

00:54:38.740 --> 00:54:51.060
I'll pass the mic back to who rocked it best.

