#211: Classic CS problems in Python Transcript
00:00 Many of you studied computer science at a university to get into programming and your careers.
00:04 I bet most of you came through some self-study or some sort of backdoor into the industry.
00:09 I count myself among that crowd.
00:11 This is one of the true bright spots of our industry that we can earn our way in
00:16 without necessarily going and getting formal college degrees.
00:19 But sometimes that academic formalism would come in handy.
00:23 That's where David Kopech's book is a great resource.
00:25 It's an approachable and quick introduction to computer science, and that's our topic on this episode.
00:30 This is Talk Python to Me, episode 211, recorded March 3rd, 2019.
00:36 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:55 This is your host, Michael Kennedy.
00:58 Follow me on Twitter where I'm @mkennedy.
01:00 Keep up with the show and listen to past episodes at talkpython.fm.
01:03 And follow the show on Twitter via at Talk Python.
01:06 This episode is brought to you by Microsoft.
01:08 Be sure to check out what they're offering during their segments.
01:11 It really helps support the show.
01:14 Hey, everyone.
01:15 Before we get to our interview with David, I want to share some super exciting news.
01:18 You've probably heard about our 100 Days of Code with Python course.
01:22 Well, Bob, Julian, and I are back with another 100 Days of Code course.
01:27 This time, it's all about the web.
01:29 We've created a course called 100 Days of Web in Python.
01:32 And it's a similar style course, but it covers so many of the web technologies that you might be interested in.
01:39 Want to learn about async?
01:40 That's in there.
01:41 Want to learn about Django?
01:43 It's in there.
01:44 How about API Star, Flask, Pyramid, SQLAlchemy, database migrations, on and on and on.
01:51 Service functions, all that stuff, it's in there.
01:53 So if you're interested in the web and you want to really get started, you owe it to yourself to check out the 100 Days of Web in Python.
01:59 Just visit training.talkpython.fm and you'll find it right in the course section.
02:04 Now, let's talk to David.
02:05 Dave, welcome to Talk Python.
02:07 Hi, Michael.
02:08 It's a pleasure to be here.
02:09 It's great to have you here.
02:10 You've written a really cool book and you're talking about some great problems in computer science that I think will be beneficial to almost everyone listening, regardless of their experience level.
02:20 So I'm super excited to talk to you about that.
02:23 Of course, before we do, though, let's start with your story.
02:25 How did you get into programming in Python?
02:26 Well, my dad was a computer science professor.
02:29 So I started programming when I was a little kid.
02:32 He taught me basic.
02:33 I first got into Python when I was in graduate school.
02:36 So I was doing my master's degree at Dartmouth and all the classes were in Python.
02:41 So I thought, wow, I better become an expert in Python.
02:43 And I quickly got up to speed and really fell in love with the language.
02:47 I loved it for its succinctness more than anything, how you could write a program in Python and it would really look similar to the pseudocode that you maybe originally saw in an algorithm.
02:57 Yeah, I do think that's a really great aspect of it.
02:59 What languages did you learn before?
03:01 You said basic in the early ages, but you probably learned something in between.
03:04 Basic in the early ages, then I was really into Java when I was a teenager.
03:08 And then I got into Objective-C when I got interested in developing for the Mac.
03:12 I'd been all around the block, if you will, and then in graduate school, got into Python.
03:16 Yeah, interesting.
03:17 Objective-C is a different language as well.
03:21 I'm just thinking, you know, coming from Java to Python, right?
03:23 Like it's, there's a lot of symbol differences.
03:25 Did that strike you as strange or did you find it refreshing?
03:28 I found it refreshing.
03:30 I found how succinct it is, again, really refreshing.
03:32 I mean, Java has this notorious reputation for being verbose.
03:36 So does Objective-C.
03:37 A lot of people get scared off by Objective-C's very verbose syntax.
03:41 So Python was a breath of fresh air.
03:43 I feel the same way.
03:44 It's pretty awesome.
03:45 Okay.
03:45 So what do you do today?
03:47 You went to grad school and you started computer science.
03:49 You're passing the baton to the younger generation.
03:52 Well, yeah, I worked as a software developer for a few years and now, yeah, I'm an assistant
03:56 professor of computer science.
03:57 So kind of following in my dad's footsteps a little bit.
04:00 Yeah, that sounds like a really fun job.
04:02 You get all these new people who are excited about programming and work with them for a couple
04:06 of years and bring them along, right?
04:07 It's really exciting to see people having those aha moments when they first really grasp a
04:11 concept.
04:12 And so, yeah, it's a job that's really gratifying and a lot of fun.
04:15 Yeah, that's cool.
04:16 What's your favorite course to teach?
04:17 I like teaching iOS development.
04:19 Maybe your Python listeners won't like me saying that, but that's one of my favorite
04:22 classes to teach.
04:23 Yeah.
04:23 Well, it's just such an exciting platform, right?
04:26 There's just so many inputs, right?
04:29 Like there's GPS and altitude and even with regular programming, a lot of times you don't
04:35 get that device feel.
04:37 But with iOS and I guess Android as well, you do, right?
04:39 Yeah, absolutely.
04:40 Another fun class is emerging languages.
04:42 We kind of explore languages that are still on the periphery of industry and just becoming
04:46 more popular.
04:47 And that's really exciting because it keeps me fresh.
04:49 Yeah, absolutely.
04:50 What are some of the languages that are emerging?
04:52 So this semester we're doing Go, Swift and Clojure.
04:56 And so some people might be like, oh, those are those have already been around for a little
05:00 bit.
05:00 They're all around 10, five to 10 years old.
05:03 And, you know, if you think about a computer science curriculum, it moves a lot slower than
05:08 industry moves.
05:09 So these are languages that aren't usually taught in college or university level.
05:13 Yeah.
05:13 It's pretty cool that you're covering those.
05:14 All right.
05:15 So let's talk about the main topic.
05:18 These classic computer science problems.
05:20 You wrote a couple of books around classic computer science problems.
05:24 We'll touch on the books and more on the main ideas from them.
05:27 But let's just start with, you know, what makes a computer science problem classic?
05:33 Well, for the purposes of the book, we thought about a classic problem as one that somebody
05:39 who has a bachelor's degree in computer science would probably be familiar with.
05:43 So what have been taught in the college classrooms for the past, you know, 30 or 40 years, as long
05:49 as computer science programs have pretty much been around.
05:52 So if you go to somebody who has a CS degree, they probably heard about one of these problems.
05:58 Okay.
05:58 So that's our basis.
05:59 But then we had to also expand that into a whole book.
06:02 So we also just went over problem solving techniques that we think anyone who works in
06:08 software development, whether they have a CS degree or not, should be familiar with.
06:12 And then we're using problems to illustrate those classic problem solving techniques.
06:16 Yeah, I think that's a really great approach.
06:18 We recently did a beginners and experts episode on the show.
06:23 Time shifting hasn't released yet.
06:25 So yeah, there's no way you could have heard it, but it's been recorded and will come out
06:28 before this one.
06:28 And we talked about, you know, what is the real, one of the core differences between somebody
06:35 who's been programming for a long time and somebody who's new, right?
06:38 Like it takes, you know, a couple of months maybe to jump in and learn a language or learn
06:43 some important package like Django or SQLAlchemy or something like that.
06:47 But it seemed like one of the really big challenges a lot of the beginners laid out is you see a
06:53 problem, you see a whole bunch of APIs and like, how do you even break it down into something
06:58 you can start to solve, right?
07:00 These like partitioning problems into something you can think about.
07:03 Do you think that's a big problem you see with your students?
07:06 Absolutely.
07:06 I mean, if you've, you're never even familiar with the problem solving techniques, how are you
07:10 going to know that you should be applying them?
07:12 And so that's kind of the purpose of the book.
07:13 It's very broad in its scope.
07:15 That's been one of the criticisms of it, frankly, is that it doesn't go deep enough into any of
07:19 the particular problem solving techniques, but it's not supposed to.
07:22 It's supposed to be a broad introduction, give you a little bit of a taste of each of them.
07:26 And then maybe you know enough about them to know when they should be applied, but you're
07:30 not necessarily an expert in any of them just by reading the book.
07:33 Yeah, that makes a lot of sense.
07:34 One thing to think about is it feels to me like these classic problems, regardless whether they're
07:40 in your book or I think back to when I saw some of these, I don't have a CS degree.
07:45 I have a math degree, but I have what's like roughly a minor in CS.
07:51 My university didn't have CS or minors at all, really, but same idea.
07:56 And so when I saw these, and it feels like the ones you covered as well a lot here, I think
08:02 it's pretty typical, is it's very much focused on algorithms, data structures, and so on.
08:07 It's not like, here's how we're going to render a web page or generate a data-backed rest and
08:14 point or things like that, right?
08:16 It's not about GUIs, and it's more focused on what is the core problem, what are the right
08:22 algorithms and data structures and ways of thinking about it to solve it, right?
08:26 That's absolutely true.
08:27 On the other hand, at the end of every chapter, we include a small section called real-world
08:32 applications.
08:33 And it tries to bring them back into the day-to-day software development world.
08:37 So how are people in apps that you're probably familiar with actually using these techniques
08:41 to solve problems?
08:42 I think a lot of us as software developers, we get into a mode of kind of API plumbing, where
08:48 we end up kind of just hooking up other people's libraries, hooking up APIs together, and we get
08:54 a little bit away from the problem-solving parts of software development.
08:57 And for people who felt like they've been in kind of that API plumbing mode for too long,
09:02 maybe they'll find this book a little refreshing.
09:04 Yeah, I think they will as well.
09:06 So who do you think should study this kind of stuff, right?
09:09 I mean, you talked about the API plumbing, and sometimes people will make the distinction
09:14 between coders and software developers.
09:16 I'm not really sure I like that.
09:18 But should everybody really understand these algorithms deeply?
09:22 Who should really pay attention to this, do you think?
09:24 I think anyone who's a software developer should be familiar with these problem-solving techniques.
09:28 This book is actually geared for people who don't have a CS degree.
09:32 It's to introduce those concepts that you missed out on.
09:35 By being a programmer and not having that CS education, giving you a basis to understand
09:40 what everyone else is talking about when they talk about something like a neural network or
09:43 genetic algorithm.
09:44 What is that stuff?
09:46 And when should you actually be using a technique like that?
09:48 So I really think it's a broadly applicable set of topics that everyone should at least be
09:53 familiar with.
09:54 Again, it doesn't mean you need to be an expert.
09:56 It doesn't mean you need to go do four years of study.
09:58 But when you run into a problem in your software development day job, or even if you're just
10:03 doing software development as a hobby, you should know what tools are available to you to solve
10:07 it.
10:07 Right.
10:07 Even simple stuff like, oh, that should be a dictionary or that should be a set versus a
10:11 list.
10:11 Yeah, absolutely.
10:12 Yeah, for sure.
10:14 Okay.
10:15 Well, one of the things I really liked about your presentation of all these classic computer
10:20 science problems was one, it was in Python, but it was in modern Python.
10:25 And I mean, really modern Python, not like, oh, it was three, five or something, right?
10:29 You, you really made proper use of a lot of the cool new features.
10:33 Yeah, that was a goal.
10:35 You know, we put together a team when we started the Python version of the book that and we said,
10:39 we need to have a team that's really familiar with the latest techniques, because we want
10:43 this book to hopefully be evergreen for a few years.
10:46 We want people to feel like if they buy it two years from now, it's using fairly modern
10:50 Python.
10:50 So we decided from the beginning, we're going to do Python 3.7.
10:54 And, you know, really cutting edge stuff.
10:55 And we're actually going to use features from Python 3.7.
10:58 Things like data classes, things like recent additions to type hints.
11:04 And I will say that especially the decision to use type hints was a very controversial one
11:10 amongst the reviewers of the book.
11:13 So Manning has a pretty extensive review process.
11:15 Every one of their books goes through three sessions of external expert reviewers looking
11:21 at the material.
11:22 And there were a couple of reviewers every time who said, you know what, I feel like the type
11:26 hints are not standard Python.
11:28 They're really a fad or they're really just something that takes away from the succinctness
11:33 of the language, which, like I mentioned before, is something I love about the language, how
11:36 succinct it is.
11:37 And I kind of agree.
11:38 You know, obviously your code does look a little more bulky with all the type hints inside of
11:42 it.
11:42 On the other hand, we wanted to do where we thought the puck might be going.
11:47 And type hints are becoming more popular.
11:50 And there definitely is some value that they add, and arguably in terms of readability, knowing
11:55 right away what a function is going to return is a nice thing.
11:58 without having to read through it.
11:59 And we took a bit of a gamble on it.
12:03 And I hope that for people who are interested in type hints, this book might be a gateway into
12:07 getting even more interested in them.
12:09 I think so.
12:09 You know, I'm personally a big fan of type hints in Python.
12:14 I think they can go overboard.
12:16 I think they can make the code look too bulky.
12:18 But there's something really powerful about just sprinkling a few of them here and there
12:25 and then having your editor, your IDE, whatever, just come to life knowing exactly what you're
12:30 working with.
12:31 And even catching errors and mistakes.
12:33 That's pretty powerful.
12:34 Yeah, absolutely.
12:35 I mean, the ability to use something like mypy and know ahead of time where some of your
12:39 errors lie before you get to runtime is really great.
12:43 When it comes to self-documenting your code, there's also a big advantage with type hints,
12:47 which is you right away know what these parameters are supposed to be.
12:52 You right away know what this function is supposed to return.
12:55 It can make your comments a little less bulky.
12:57 So you're kind of trading off a little bit of bulkiness in the code for maybe a little
13:01 bit of less need to be so explicit in your comments.
13:05 Right.
13:05 Yeah.
13:05 If you say something's a Boolean, you don't need to describe in the comments, hey, this
13:09 is a Boolean.
13:09 It's like, well, colon Boole, we're good.
13:11 Exactly.
13:12 Yeah.
13:12 Yeah.
13:12 It's pretty interesting.
13:13 So reading through your examples, though, you know, when I think of like, why do I care
13:19 about type hints?
13:19 I care about them mostly for the reason, like personally, I care about them for the reason
13:24 that I laid out, which is I love how it makes the editors like PyCharm and VS Code really
13:30 help you a lot.
13:32 Right.
13:33 It helps you deeply understand the types you've been passed around and give you autocomplete
13:37 and stuff like that.
13:38 Or error, you know, sort of PEP 8 type error checking.
13:40 It's like 8 type error checking.
13:42 There's mypy, which you mentioned, which was really instrumental in helping Dropbox convert
13:47 like a million lines of Python 2 to Python 3, which I thought is an interesting use.
13:51 But the third one that I realized looking through your stuff is, it's really nice when you have just static text, not in an editor, right?
14:00 It's just sitting there on the page.
14:01 I don't have a command click go to definition to see what's happening on this function.
14:06 But it, you know, if you put the type hints right there, you're like, oh, well, this is
14:10 returning a list of ints.
14:11 So I don't need to go and like try to find out where that was in the writing.
14:16 I can just see that's a list of ints and go with that.
14:18 That's pretty cool.
14:19 Yeah, absolutely.
14:20 I mean, that's supposed to be the goal is that it's more readable.
14:22 Now, when you first see them, and if you're not used to them, I think it is actually less
14:27 readable.
14:27 And it takes some time to get used to.
14:29 So I think it's a fair criticism of those reviewers that for some people, the type hints
14:35 do actually make the code at first glance less readable, especially if you're not familiar
14:39 with them.
14:39 We did include a short appendix, appendix C to the book that kind of gives you a crash
14:43 course in type hints.
14:44 But it's really just a matter of spending time with them to really start to gain the benefits
14:51 of the readability, I think.
14:52 Yeah, it does take a little bit of getting used to.
14:55 You know, two things I actually learned, even though I'm a big advocate and fan, I use the
14:59 type annotations a lot.
15:00 One of the things that I saw that you were using, a couple of things that were pretty cool.
15:06 One, you were using from future import annotations, Dunder future annotation.
15:11 That's kind of a new feature coming for type hints, right?
15:14 Want to tell people about that?
15:15 Yeah, absolutely.
15:16 So if you have, for lack of a better word, recursive type hints, so a type hint that refers to the
15:21 type inside of it, up to version 3.7 and this new annotations feature added to under Dunder
15:28 future, you would have to actually put it in as a string in quotes.
15:34 And if you now import annotations, you can actually just put the type as its actual type
15:40 name without putting it in quotes, which is how the language will work in the future.
15:43 But for now, we have to have that in future.
15:46 Yeah, there's always this limitation where like, it's super common to have a class that interacts
15:51 with itself, right?
15:53 Exactly.
15:54 You know, like I've got a class, maybe it has a static method, and it's going to take another
15:58 one.
15:58 Like just for comparison, like, is this instance of me less than another instance of me?
16:03 If you want to put that in type annotations, that was really tricky to understand what to
16:06 do without this new future version of stuff.
16:10 Because the class technically wasn't finished being defined yet.
16:12 So you couldn't say the name of it within like the functions or the properties or whatever.
16:16 Yeah, absolutely.
16:17 Yeah, interesting.
16:18 Another one that I thought was interesting is the ability to define a generic type.
16:23 I literally had never seen that or tried that.
16:26 But you have a lot of data structures that you're defining to be generic, like a generic
16:30 of T, like bracket, bracket.
16:32 Yeah, absolutely.
16:32 That's interesting.
16:33 Yeah, we're trying to make all the code in the book as generic as possible.
16:36 And let me use that term more broadly than just the type in specific version of it.
16:40 We want to write algorithms that can work for many different problems.
16:44 We don't want to code them so specifically that they're only solutions to the one problem at
16:49 hand.
16:49 We want them to be techniques that you could then just drop in the same method, the same
16:53 class, and use it in a totally different program.
16:55 So in most languages, we have something called generic programming.
17:00 So in Java, you have generics.
17:02 In C++, you have templates.
17:04 In type hints in Python, you have the generic type, which is similar to those other two methods
17:09 in those other languages.
17:10 It's a way of saying, you know, I don't know exactly what type I'm going to be using in this
17:15 problem.
17:16 So here's a type that can fit in for any type.
17:20 And that is what a generic type is.
17:23 It's a, I don't know yet what the type is, but at runtime, we're going to figure out what
17:27 the type is.
17:28 Yeah, it's interesting.
17:29 It's like the list of int or list of string type of feature that you can get from if you
17:36 import it from typing.
17:37 But you can more broadly, you can make it for your own types.
17:39 That's cool.
17:39 Yes, you said data classes, f strings, all the good stuff, right?
17:43 I just want to give a shout out to the cool sticker I saw at PyCascades recently.
17:47 They had a sticker for new versions of Python and said, F quote, yes, exclamation mark quote.
17:54 So F yes.
17:55 It was beautiful.
17:56 Yeah.
17:56 I mean, there's so many different ways to do string interpolation in Python and f-strings
18:01 are just so beautiful compared to the older ways.
18:03 They are for sure.
18:05 This portion of Talk Python is sponsored by Microsoft and Visual Studio Code.
18:10 Visual Studio Code is a free, open source, and lightweight code editor that runs on Mac,
18:15 Linux, and Windows with rich Python support.
18:17 Download Visual Studio Code and install the Python extension to get coding with support for
18:22 tools you love like Jupyter, Black Formatting, Pilot, pytest, and more.
18:25 And just announced this month, you can now work with remote Python code bases using the new Visual
18:31 Studio Code remote extensions.
18:33 Use the full power of Visual Studio Code when coding in containers, in Windows subsystem for
18:38 Linux, and over SSH connections.
18:40 Yep, that's right.
18:41 Auto completions, debugging, the terminal, source control, your favorite extensions.
18:45 Everything works just right in the remote environment.
18:48 Get started with Visual Studio Code now at talkpython.fm/Microsoft.
18:54 One thing I wanted to ask you about is you've done this book in Python, understanding these classic computer science programs, and you've done the same problems,
19:03 not the exact same book, but basically the same problems, right?
19:06 In Swift, because you had a classic science problems in Swift as well.
19:10 I'd love to get your thought on comparing those two experiences.
19:13 Absolutely.
19:13 So most people would not be surprised that the Python version was more succinct,
19:19 and a lot of people would not be surprised that the Python standard library
19:23 is a lot richer than the Swift standard library.
19:26 And so there were actually parts of the code.
19:29 First of all, when we were doing the Python book, we wanted the book to be, of course,
19:33 totally Pythonic.
19:34 So we put together a team of reviewers and also the technical editor that were really
19:39 experienced Python programmers who were making sure that we were not just porting over the
19:44 Swift book and it was still looking like Swift in Python.
19:46 We wrote most of the problems again from scratch.
19:49 But there were situations where there was whole methods from the Swift book that we could totally
19:56 eliminate in the Python book because those techniques already exist in the standard library.
20:01 And so while in the book, we only basically use the Python standard library, we don't use any
20:06 external libraries.
20:07 That still meant that we were able to eliminate whole sections of code that just don't exist in
20:14 the Swift standard library.
20:15 As far as the languages go, I mean, Swift has been inspired by Python in some of its syntax.
20:20 So there are actually parts of it that look surprisingly similar.
20:24 But as a whole, like I mentioned, the Python code is certainly more succinct.
20:28 I'd say the Python code is probably more readable as a whole.
20:32 And working with the two languages, Swift is a moving target, right?
20:35 Swift, every version of Swift, there's sometimes some breaking syntax changes and it can be kind
20:41 of annoying.
20:41 It's starting to stabilize the last couple of years.
20:43 Python, of course, is an extremely mature language with extremely mature tooling.
20:48 And so that was, of course, a pleasure as a book writer.
20:51 You don't want to be writing code that is going to be obsolete a year from now.
20:55 So that was, you know, much more fun on working, not having to think about that when writing
21:01 the Python book.
21:02 Yeah.
21:02 You know, I don't think it would be a major difference just remembering, reading through
21:06 all the solutions you have.
21:08 But one of the big differences with Swift is it's one of the most statically typed, safest
21:14 of all the languages that I've worked with, at least.
21:17 Because, you know, not only does it have a type system, it has things like these are read-only
21:23 reference variables and all sorts of stuff like that, right?
21:26 Like the constant or like nullable.
21:29 You have to explicitly say a type is nullable or it can never be the equivalent of none and
21:34 things like that.
21:34 Did that make a big difference or not so much?
21:37 I think it does for the reader.
21:39 I think because there's a lot of cognitive overhead that goes with all of those safety
21:43 features.
21:44 So having to always think about, should this be an optional?
21:46 Oh, if this isn't optional, do I have to unwrap it here?
21:50 That's the type of stuff you have to constantly think about in Swift.
21:52 And in Python, your mind is a little more free.
21:55 So it actually makes the code more readable and easier to think about when you don't have
22:00 to think about all those safety features.
22:02 I think Swift code takes longer to write than Python code.
22:05 It might be safer at runtime, but it also takes, I think, less time to read Python code than Swift
22:11 code as well, because you're not reading through all of those hoops you have to jump through
22:15 for safety.
22:15 Yeah, that's my sense from my limited experience as well.
22:18 All right.
22:18 Let's dig into some of these classic problems.
22:21 And I would say you start off all of your problems with this one as well.
22:25 But I would say one of the most mind-bending ideas of problem solving when you first come
22:32 into properly study programming rather than just like, oh, here's a loop and here's a variable.
22:38 I'll make it work is recursion.
22:40 Absolutely.
22:41 It's just like, wait, how does this work?
22:44 There's no loops, but we're going across a set or something crazy like that, right?
22:48 Yeah.
22:48 Recursion is something that does bend your mind the first time you see it.
22:52 And it requires a new way of thinking.
22:55 The crazy thing is there's whole programming languages that are based around recursion.
22:58 I mean, if you look at some of the functional programming languages, they don't even have
23:01 loops.
23:01 Anytime you want to repeat yourself, you have to do recursion.
23:05 Recursion is not a technique you want to apply every time you program.
23:08 Recursion is a technique you want to apply when it makes more sense to you.
23:12 Because any problem that you can solve recursively, you can also solve iteratively.
23:17 So you can also solve using loops.
23:19 So they really are two different ways of just repeating yourself.
23:23 But it's a question of which one makes more sense to you in your mind when you're abstracting away
23:28 the problem and thinking about how you want to solve it.
23:30 So there are situations where even though a loop might actually be more performant, which it often
23:36 is in a language like Python, the recursive way of approaching the problem is more similar to how you
23:44 might have written it down on paper, let's say.
23:45 Right.
23:46 Or maybe it perfectly matches the data structure you're trying to interact with, right?
23:50 If I have like a tree data structure or something hierarchical, that's often really easy to work
23:55 with in a recursion way.
23:56 Because, you know, the recursion step is take one of the sub elements and work on it as if it
24:01 was the whole thing until you're out of sub elements, right?
24:03 Absolutely.
24:04 And we sometimes call structures like that even recursive data structures.
24:07 Absolutely.
24:08 Yeah, for sure.
24:09 And this one, the example was a Fibonacci sequence, of which I'm also a fan for these little demos
24:15 that's complicated enough, but not super complicated.
24:17 It's nice to work with.
24:18 So you said, all right, well, let's try to solve the Fibonacci sequence problem with recursion
24:22 because, well, the first two numbers are either 0 and 1 or 1 and 1, depending on who you ask.
24:28 And then it's you add those two together.
24:30 So you have, you know, F of N is just F of N minus 1 plus F of N minus 2.
24:36 And it's super easy to do recursion.
24:38 But it turns out that recursion kind of goes a little bit crazy in its naive way.
24:43 For example, Fibonacci at 20 using recursion does 21,891 calls.
24:49 That's a lot.
24:50 So you said, well, the reason is we keep asking the same question again and again.
24:54 What is a Fibonacci at 10?
24:55 And if you want 11, you ask, you know, that the same subsequence a whole bunch of times, right?
25:00 Right.
25:00 So then you get into memoization, which is a funky word, but a very cool technique.
25:05 Yeah.
25:05 Very good for performance.
25:06 Absolutely.
25:07 I mean, yeah, the Fibonacci sequence is for listeners who aren't familiar with it is each of the numbers is the sum of the previous two numbers.
25:14 So it goes 0, 1, 1, then 1 plus 1 is 2, then 2 plus 1 is 3, then 3 plus 2 is 5, etc.
25:21 Right.
25:21 And so how can if you actually write that recursively, it looks incredibly similar to how we just defined it.
25:28 We just said it's the sum of the previous two numbers.
25:30 So recursively, you can actually write, take the Fibonacci number of the previous one, n minus 1, and add it to the Fibonacci number of two ago.
25:39 So Fibonacci of n minus 2.
25:41 It's actually like two lines of code, maybe three lines of code in Python.
25:45 That's, like you mentioned, not very performant because you do all of those recursive calls many, many, many times when you do a large number.
25:53 And we show in the book how you kind of go through a tree of them.
25:57 So we go through six different ways of solving the problem, each of them getting a little bit more performance or just doing it in a slightly different way.
26:05 One way to solve the problem is with memoization.
26:08 In memoization, you actually store results that you've already computed so that if you need to compute them again, you can just look them up.
26:16 That's basically the idea.
26:18 It's a fancy word for a really simple idea, which is basically just let's store results that we've already come up with.
26:24 And then we can just look them up instead of having to calculate them again.
26:27 So in Python, it's super easy to do that.
26:29 You can do that using a dictionary.
26:31 You can just store in a dictionary.
26:32 Here's a previous result for a certain set of parameters.
26:35 In the case of Fibonacci sequence, it's just a single parameter.
26:38 Which Fibonacci number are we trying to look up?
26:40 We can just store that in a dictionary.
26:42 There's also a built-in facility in Python called LRU cache.
26:47 And this is from the functools package.
26:49 And you can use that to automatically store the result of a function so that the next time the function is called, we just auto look in that cache for what the last result is.
27:00 So that's a little tip from the Python standard library that can just make a huge performance difference for all kinds of different mathematical calculations because you're not going to have to do them again.
27:11 They're just already going to be stored for you.
27:13 Or anything that can be cached, basically.
27:15 Right.
27:16 So you put the LRU cache decorator on your function.
27:19 And that function could be computing Fibonacci numbers.
27:22 Or it could be going to a database and making complicated queries.
27:26 But you're like, well, if this is however old or maybe this rarely, rarely changes between runs of the app, you only need to ask the database once.
27:34 And then you hold on to the answer, right?
27:36 Yeah, absolutely.
27:37 And that's one of the things I love is when we're able to teach something that's actually an important computer science topic, memoization.
27:43 But we're also teaching you something about the Python standard library that easily in your day-to-day software development can really save you a ton of performance cost.
27:51 Yeah.
27:51 I agree.
27:53 Super, super nice.
27:54 Final takeaway.
27:55 Well, I guess I talked about there being 21,000 function calls for Fibonacci at 20.
27:59 If you use memoization, it's 39.
28:01 Right.
28:01 Even so, directly implementing it with generators and just a straight loop is still much faster.
28:07 Right.
28:08 Then it's just 20.
28:08 So it's literally just every single one.
28:11 Let's say we're doing Fibonacci number of 20.
28:13 We just have to do basically going around the loop 20 times, one for each number in the sequence.
28:18 So that's a situation where, yeah, iterative, which is loops, is actually more performant than recursive.
28:24 And that you'll often find is the case.
28:27 Even though recursion is a really cool technique, usually it's actually faster to do things with loops in Python.
28:32 It's more about optimizing for understanding versus performance, which maybe understanding is better at first, unless it really matters.
28:40 Exactly.
28:40 Yeah.
28:41 So, you know, one thing I wanted to ask you about, like, when I learned about the yield concepts and these generator ideas and whatnot, like, that's pretty mind-blowing.
28:51 And would that almost be its own sort of classic CS problem is these lazy deferred execution types of things?
28:58 You know, maybe it should be.
29:00 I mean, we should have covered it as such in the book.
29:02 We don't go into it in a huge amount of detail.
29:04 We kind of assume the reader is familiar with generators and maybe we shouldn't have.
29:08 And, you know, that's something I should have prefaced our whole conversation with is we wrote the book for intermediate to advanced Python programmers.
29:15 So it's really not appropriate for people who are just learning Python for the first time.
29:20 We already assume that you've been doing Python development for probably a couple of years before you pick up this book.
29:26 So we don't go into explaining every detail of generators.
29:31 And, you know, now that I think about it, maybe we should have because even some intermediate and advanced programmers are not always familiar with them.
29:37 I think it's okay.
29:37 But I do think this idea of solving problems with generators, it's just so powerful and just unlocks a lot of potential for interesting pipelines and other types of execution with, you know, like, usually simpler code.
29:51 Yeah, absolutely.
29:52 It's a non-generated version.
29:53 So, yeah, pretty interesting.
29:54 So another area that you focused on was data representation.
29:58 And in a pretty simple form, not like, is this a pointer rich data structure or is sparse or things like that.
30:05 But, you know, you focused on trying to represent DNA, right?
30:09 I think is what it is.
30:11 You had codons and stuff like that.
30:12 Actually, it was just the amino acids at first.
30:15 So you're just like, well, how do we store this?
30:18 We could store it as a string.
30:20 And, you know, the AC, T, G, each one could be a string.
30:24 And that would be pretty large in terms of representing data.
30:27 You could store it as integers, but even integers are not very efficient in Python.
30:31 So then you started working with bits and bytes, right?
30:34 Yeah.
30:34 So, you know, oftentimes we don't even think about how some, what we think are really simple bits of data are going to be stored.
30:41 So your first inkling when you're thinking, how should I store the letters A, C, T, and G to represent DNA might be, why don't I just store them as strings, right?
30:51 Because those are letters.
30:52 So that makes sense.
30:53 Letters can be parts of strings.
30:56 But actually, you know, if you only have four different things that you want to store, you can do that with just two bits of information per thing.
31:04 Because two bits can represent up to four different numbers, right?
31:08 And so each of those four different numbers could represent one of those four letters.
31:13 And so if for a string, we're going to end up storing approximately eight bits for every character.
31:19 So eight bits for A, eight bits for T, et cetera.
31:23 And if we do it as a number and we can actually just store two bits for A, two bits for T, that's actually a 75% space savings.
31:31 Just from thinking about the fact that, oh, I actually don't need that many different possibilities that a string provides for this type of data.
31:40 So it sounds like common sense.
31:42 It is common sense.
31:43 But we don't even think about that sort of thing a lot of the time.
31:46 And so we're wasting a lot of data.
31:47 It's not obvious at all.
31:48 Yeah.
31:48 In something low level like C, you talk explicitly about the data types very, very carefully.
31:55 This is a four byte integer.
31:56 That's a two byte integer.
31:57 But even integers in Python are like 28 bytes each for like the number zero.
32:03 Well, yeah.
32:04 But at least the number zero to 256 are, you know, flywheel pre-allocated.
32:09 So they're shared.
32:10 But, you know, normal numbers take up a lot of space because they're really embedded within these PyObjects.
32:15 Right.
32:15 Things that are tracked by Python with reference counting and all that.
32:18 And we're not saying that you should always be doing this, right?
32:21 Maybe it actually is a lot more convenient a lot of the time to just use strings.
32:24 And you absolutely should do that.
32:26 But if you're not even aware that there was a better way or there was a more at least space efficient way,
32:32 maybe there's a situation where you're not storing kilobytes of data, but you're storing gigabytes or terabytes of data.
32:39 And you could have actually been saving 75% of the space.
32:41 Well, wow.
32:42 Maybe it's relevant then.
32:43 So, again, it's just something we want to make people aware of.
32:46 We're not saying you should always be finding the absolute most efficient way to store everything.
32:50 Right.
32:50 Or you have a RESTful API that's exchanging JSON.
32:54 And you're like, you know, if we wrote this in straight TCP packets and we just did this bitwise storage, we could drop, you know, the traffic by 95% or something.
33:06 Sure. I mean, but that sounds really painful.
33:08 And that doesn't sound like something I'd rather just exchange the JSON, I think, most of the time.
33:13 Yeah. I'm thinking of like places where, you know, PayPal has an API that's called like several billion times a day.
33:21 And it needs really, really low.
33:22 Like, it's got to be pretty far out there.
33:25 Yeah, absolutely.
33:26 It's got to be an extreme case before you want to start.
33:28 Yeah. Going down that road.
33:30 Yeah, absolutely.
33:30 All right.
33:31 So another one that I thought was an interesting one was the Towers of Hanoi, because it's super painful to solve by hand.
33:39 You know, you sit down and it looks real simple.
33:41 You've got these three towers that each have these little disks.
33:43 Disks are bigger as you go down.
33:46 And the job is to move them from one end to the other end.
33:50 But the rule is the big ones can't go on top of the little ones.
33:52 So how do you, you know, what series of operations?
33:55 It's kind of like the Rubik's Cube type of thing, right?
33:58 But the solution turns out to be beautifully solved with recursion, right?
34:03 Right. Yeah.
34:03 I mean, so it sounds like a really simple problem.
34:06 So we have three different, let's call them towers.
34:09 And we have three disks.
34:11 And we want to move all the disks from one tower to another tower.
34:14 How can we do that?
34:15 Well, if you actually try to work it out, it's several steps.
34:19 And they're not completely obvious when you just look at it.
34:22 Although it does help to be able to see a model of it.
34:24 So it's a little hard to describe on a podcast.
34:27 But what's interesting is that the solution actually just comes down to figuring out two things.
34:32 One is how do I move a single disk?
34:34 And the second one is how do I move all the rest of the disks?
34:38 And that actually lends itself to a really succinct recursive solution.
34:44 Just a few lines of code.
34:45 And then it can solve it actually for any number of disks.
34:48 Not just three, but let's say we had 10 disks.
34:50 Well, we can actually solve that by just knowing how to solve the first disk.
34:54 And then having some step that we do for moving all the rest of the disks.
34:59 So it's kind of mind-blowing when you actually see how succinct it is to solve it for any number of disks recursively.
35:07 And I don't think we can explain it wonderfully on the podcast.
35:12 But I guess people will just have to take our word for it.
35:15 It is really, really interesting.
35:17 Like the last move is obvious.
35:19 You just move the disk over.
35:21 It's kind of mind-blowing.
35:22 And I do think it's a perfect application of this idea of how recursion really nicely fits these things.
35:29 Because I think it would be really tough to implement that with loops and just, you know, mutable data structures and whatnot.
35:35 Yeah, absolutely.
35:36 So one that I thought was really nice tying together algorithms and data structures in a way that's kind of mind-blowing, to be honest, was maze traversal.
35:46 Yeah, you know, there's so many different ways to traverse a maze.
35:48 And this is actually a really good practice for people who are going into coding interviews.
35:54 Because there's a lot of these techniques for solving a maze that often come up in coding interviews.
35:59 Things like depth-first search, breadth-first search.
36:03 And then at the end of the chapter, we go into A star, which is actually the most efficient algorithm for finding a path through a maze.
36:09 But it's remarkable how all of these techniques, when you write them out in code, they're very, very similar.
36:16 And all you have to actually change between depth-first search and breadth-first search, for example, is the data structure.
36:21 So in both of those, you're keeping track of where am I going to look next?
36:26 We call that the frontier.
36:27 So where are the next places on the grid?
36:31 Let's say the maze is a grid.
36:33 Where are the next slots on the grid that I'm going to look at next?
36:37 In depth-first search, you use a stack.
36:41 So you just keep going through and you keep adding the next places you're going to look onto the stack.
36:46 In breadth-first search, you use a queue.
36:49 And so all the rest of the code stays exactly the same.
36:54 You don't change a single other thing in your code.
36:55 You just change, okay, for the frontier, the next places I'm going to look at, I'm going to use a queue instead of a stack.
37:01 And then the way that we traverse the maze totally changes.
37:04 In depth-first search, we kind of take a stab at it.
37:07 We go as far as we can through.
37:09 And if we hit a wall, we backtrack to the last location.
37:12 That's probably the way real people would, right?
37:14 Like if you were put in, as a human, put into a maze, you would just go until you hit a wall.
37:18 You would probably not apply some breadth-first algorithm, little notebook or whatever, right?
37:23 Right.
37:23 Yeah.
37:24 As human beings, that's how we usually think.
37:25 I think we kind of think in a depth-first way.
37:27 But a breadth-first search is super systematic.
37:30 It says, you know what?
37:31 Let's first look at the nodes that are one away from where we started.
37:34 Then let's look at the nodes that are two away from where we started.
37:37 Then let's look at the nodes that are three away from where we started.
37:39 And just by changing from a stack to a queue, and for those that don't know, a stack is a data structure where the last item we put in is the first item we take out.
37:48 And a queue is a data structure where the first item we put in is the first item we take out.
37:53 Just making that change of data structure, so literally changing one line of code, totally changes how we traverse the maze.
37:59 And so maze traversal is an awesome way of kind of illustrating the difference between depth-first search and breadth-first search.
38:05 Yeah, and then so you have the queue, the depth-first one?
38:09 No, in the breadth-first one.
38:10 And then you come to A-star, which uses a new data structure, but is also pretty similar in the implementation called a priority queue.
38:19 Right.
38:19 I don't believe I've seen a priority queue.
38:21 Yeah, so this is a place where the Python standard library is pretty cool because the Python standard library has built-in facilities for handling heaps.
38:29 And a heap is a data structure that's at the basis of a priority queue.
38:33 What a priority queue does is instead of a stack where we say last thing in, take that thing out first, and a queue where we say first thing in, take that thing out first, with a priority queue we say, let's assign every item we put into the data structure a quote-unquote priority.
38:48 And when we take things out of the data structure, let's take them out by their priority.
38:54 So the first thing we take out might be the thing with the highest priority.
38:58 So we have a way of kind of tagging each element and saying, here's your priority.
39:02 And then when we pull them out, we're going to efficiently pull them out by their priority.
39:06 So in A-star search, what we want to have is some kind of heuristic for traversing the maze.
39:11 Instead of just dumbly saying, oh, what are the next things next to where I am right now?
39:16 Let's instead say, looking at my next possibilities, which one do I think will get me closest to the goal?
39:23 And the way we usually do that on a maze is actually just using a straight line.
39:29 So sometimes called Euclidean distance.
39:31 Just draw a line from the next places I'm looking at to the goal.
39:34 And the shortest line, that's probably the one that's getting me closest to where I want to go.
39:39 And so we can then put all of those next nodes into the priority queue ordered by how long is that line?
39:47 And how long that line is, is what the priority is.
39:50 And we just take the ones out that have the shortest lines next.
39:52 And that gets us to an algorithm that will actually get us very efficiently from the start to the goal.
39:59 This portion of Talk Python to Me is brought to you by Microsoft.
40:02 For ultimate developer productivity in the cloud, use Azure extensions for Visual Studio Code.
40:07 You can deploy and debug your serverless Python apps directly from your editor.
40:11 On Azure, you can run your Python apps as serverless code on Linux web apps and functions,
40:16 or on top of managed Kubernetes, and easily connect data from database services, including Postgres and MySQL.
40:23 You can also use Azure DevOps to create cross-platform builds of your Python packages with hosted macOS, Linux, and Windows build machines.
40:31 And publish them to Azure Artifacts for your own hosted PyPI feed.
40:35 Azure DevOps is free for open source projects, and many are using it already.
40:39 Get started for free at talkpython.fm/Microsoft.
40:44 You also mentioned this idea of the Manhattan distant metric, where you can't actually walk diagonal on a maze, where it's a grid, or something like that.
40:54 Right?
40:54 And you also can't do that in cities where they're laid out on grids.
40:57 Right?
40:57 There is no diagonal travel.
40:58 It's over one, back one.
41:00 So, like, the diagonal is effectively as long as going, you know, around the border on these things.
41:05 So, you can also even get a little more accurate with that.
41:07 Right?
41:07 Right.
41:07 So, yeah, the straight line is always the shortest way to get from point A to point B, but straight lines don't always work.
41:14 Right?
41:14 So, if we're on a grid where we can only go horizontally and vertically from one spot to another, we actually can use a different heuristic known as the Manhattan distance, which is named after the borough of Manhattan of New York City, where all the streets are on a grid pattern.
41:28 Right?
41:28 And so, when you want to walk, you have to walk a certain number of blocks to the right or left, and a certain number of blocks up or down.
41:35 And if you actually do that calculation on a maze that can only go horizontally and vertically, it's going to give you a better estimation of how long it'll take to get from some location to another location than the straight line will, because it's closer to how you can actually move.
41:54 Yeah.
41:54 So, yeah, what I thought was really interesting about the coming together of all of these three, well, the three ways of solving this maze problem, were you have basically the same algorithm, and it's really about the data structure that makes it all work.
42:09 And that was pretty cool.
42:10 Yeah.
42:10 A lot of people have said that's one of their biggest aha moments throughout the book, is where we can go and just change the data structure, and the whole solution, how it actually works, changes, even though the rest of the code doesn't.
42:21 Yeah, it's super cool.
42:22 Another area that was pretty interesting was the whole set of constraint problems.
42:26 So, you have a couple in here that we could touch on, the Australian map coloring, the sin plus more equals money, maybe some more realistic problems after that.
42:35 Yeah.
42:36 What's cool about those problems is how we can think about them all together as really one problem-solving technique.
42:41 So, let me tell you a little bit about the problem.
42:43 So, the Australian map coloring problem is one where we take all the regions of Australia, and we want to fill each of them in with a color, and we only want to use three colors, let's say red, green, and blue.
42:53 So, is it possible to find a color that we can put on each region without any of them next to each other having the same color?
43:02 And the answer is, of course, yes.
43:03 We see this on maps all the time, right?
43:05 So, how can we solve this with a computer?
43:08 Well, one way of thinking about the problem is as a constraint satisfaction problem.
43:13 And what a constraint satisfaction problem is, is it has three parts to it.
43:17 It has variables.
43:17 In the case of this problem, it would be the regions.
43:20 So, the regions of Australia are our variables.
43:22 Then a constraint satisfaction problem always has domains.
43:25 Those are what can we fill in for each of the variables.
43:28 In this case, that would be red, green, or blue.
43:30 That's the domain of each of those variables.
43:32 Each of them has to be one of those values for our solution.
43:35 And then the last part of constraint satisfaction problem is the constraints.
43:39 In the case of this problem, the constraint is that any two regions next to each other can't have the same color.
43:45 So, we actually have a bunch of constraints.
43:47 We have one for every single set of regions that are next to each other.
43:50 So, a constraint satisfaction problem solver can solve this problem, but it can also solve any other kind of problem that we can define using variables, domains, and constraints.
44:02 And so, we can write a constraint satisfaction problem solver just once,
44:07 and then we can use it for other kinds of problems besides just the Australian map coloring problem,
44:11 such as the one that you mentioned, SEM plus more equals money.
44:15 That's a crypt arithmetic problem.
44:16 It says...
44:17 Those seem so different.
44:18 Yeah.
44:18 And yet, the same problem technique or solution will hit them both, right?
44:24 So, yeah, tell us about that one.
44:25 It's really interesting.
44:26 Yeah, it's totally different.
44:27 SEM plus more equals money is what we call a crypt arithmetic problem.
44:30 If you line up the letters S-E-N-D and the letters M-O-R-E, and you said each of these letters is going to represent a digit.
44:38 So, maybe E is four and, you know, maybe N is five or something like that.
44:43 And then underneath them, as if it was an arithmetic problem, we put the letters M-O-N-E-Y, money, the word money.
44:51 And we also say that each of those letters has to be a corresponding digit.
44:54 What digits can we fill in for each of those letters that will lead to SEM plus more equaling money?
45:00 Well, if you think about it, it actually is a constraint satisfaction problem again.
45:05 Here's why.
45:05 Each of the letters is our variable.
45:08 So, S might be a variable.
45:09 E might be a variable.
45:10 N might be a variable.
45:11 Each of the possible values of those letters is our domain.
45:15 So, the digits, zero through nine, are the domain of each of those letters.
45:19 And here's our constraint.
45:20 Our constraint is that for whatever digits I fill in for each of those letters,
45:24 the combination of SEM plus more must equal money.
45:28 So, again, it's actually a constraint satisfaction problem.
45:30 But when you first look at it, you might be like, wow, this seems totally different from the map coloring problem.
45:34 But actually, they're the same kind of problem.
45:36 Yeah, that's pretty amazing.
45:38 And once you lay it out like that, it's obvious.
45:40 But I would never look at those two things and go, yeah, that's the same problem.
45:43 Right, absolutely.
45:44 So, that's one of the things I like about studying these classic problems,
45:48 is they show you this larger symmetry or commonality across different, what seem like different problems.
45:56 And like, oh, it's a recursion problem.
45:58 It's a, you know, one of these constraint problems.
46:01 Or it's a graph problem or something like that.
46:03 Yeah, it's amazing how many different ways we can apply the same problem solving techniques.
46:07 So, when you're talking about these techniques, you know, like these constraint algorithms are not
46:12 particularly new.
46:13 But you did talk about one being, coming along that is actually new, something called constraint
46:19 propagation.
46:19 What's the story there?
46:21 Well, I wouldn't say it's new, but it's a more advanced way of solving these constraint
46:25 satisfaction problems.
46:26 So, in the chapter, we teach you a way called backtracking search.
46:30 And it's actually a type of depth first search.
46:32 So, it kind of builds on, this is chapter three of the book, it kind of builds on chapter
46:35 two's explanation of depth first search.
46:37 And in backtracking search, it's every time you hit a wall.
46:41 So, every time we found a way of filling in domains for these variables that solves the
46:46 constraints, but then it doesn't solve one of them.
46:49 We go backwards.
46:50 And we try to find a different fill-in that doesn't lead us into that wall.
46:55 In constraint propagation, we instead ahead of time try to figure out, okay, I think this
47:01 is going to work well.
47:02 And we might still do a backtracking search at the end to make sure it really does work
47:06 well.
47:06 But instead of doing it after we've already hit a wall, we try to figure out what's going
47:11 to work before we hit the wall at the very beginning.
47:13 We don't actually get into the details of how that works in the book.
47:16 It's kind of more of an advanced technique.
47:18 But it's a little more efficient.
47:20 And so, if you were going to solve this on a really big problem, so a problem that, let's
47:25 say, had hundreds of variables and hundreds of domains and constraints, then maybe you would
47:30 want to use something more advanced than the technique we teach in the book.
47:33 What are some of the real world places where these constraint problems show up?
47:37 I would say the most common is probably scheduling.
47:39 So, if you think about a bunch of people want to meet and they have all of them at different
47:44 availabilities, we could think about the individual people that want to meet as the
47:48 variables, what their availabilities are as their domains, and who needs to be at that
47:53 meeting as the constraints.
47:55 And there's actually real life schedulers that use constraint satisfaction techniques to do
48:02 scheduling.
48:02 I can't tell you for sure if that's what Google Calendar uses, for example, but it wouldn't
48:06 surprise me.
48:07 Yeah, of course.
48:07 So, very powerful.
48:08 And I think the idea of the common technique across non-obvious things is really great with
48:13 that set of problems.
48:14 Another one are graph problems.
48:17 Not parabola, but vertices and edges types of graph theory problems, right?
48:24 Yeah.
48:24 And I think this is an area that people who have a math degree or a CS degree are totally
48:30 familiar with.
48:30 And sometimes people who are self-taught programmers don't have any experience with.
48:35 And it's, again, it's something that is applicable to actually a wide range of different problems.
48:40 And you might not even realize it when you're first seeing it.
48:43 Because what is a graph?
48:45 Well, a graph is technically a set of vertices that are connected to each other by edges.
48:50 And it's really an abstract math concept.
48:52 But in the real world, we actually see them all the time.
48:55 So anytime you see a network, that's basically a graph.
48:59 For example, the subway in New York City, that's a graph.
49:02 Every one of the stations is a vertice.
49:04 And edges are the tunnels that connect each of those stations.
49:08 If you want to lay out electric wire throughout a neighborhood, that's a graph.
49:12 Each of the houses you need to hook up, that's a vertice.
49:14 And the line you're going to run from one house to another or to the poles, that's an edge.
49:19 So we can actually use this concept of just having vertices and edges to represent a huge number
49:25 of different problems.
49:26 And what we teach in the chapter is a bunch of different techniques for solving common problems
49:32 you'd fall into when you want to use these graphs, such as how do I find the shortest path
49:37 from one vertice to another?
49:39 Or to put it in the context of what we just mentioned, how do I find the shortest path from one subway station to another subway station?
49:46 Or how do I use the minimum amount of edges of certain lengths to connect all the different
49:54 vertices in the context of the electric wire?
49:56 How do I use as little electric wire as possible to electrify this whole neighborhood?
50:02 And that's actually the problem that originally inspired some of these algorithms in Czech Republic.
50:07 They were trying to electrify large parts of Czech Republic.
50:10 And some mathematicians in Czech Republic in the 1920s and 1930s came up with the algorithms
50:15 for finding the least amount of electric wire to electrify the country.
50:19 That's a pretty cool bit of history.
50:20 I didn't know that.
50:21 I do like you in there when you're presenting the problem.
50:24 You talk about the hyperloop and some of the interesting things you might consider.
50:29 So you have a bunch of connections between the major cities.
50:32 And you imagine those are hyperloop, Elon Musk, high speed tube things, connecting them.
50:38 And you said, look, here's the map of the US and here's the cities and maybe the tracks.
50:43 But throw away the map.
50:44 Just think of these as connections and the points.
50:47 And then how do you get from one point to the other?
50:50 You also said some interesting things like, you know, probably it's longer to make a connection
50:54 at a city than it is to travel between two cities, given that those things are supposed
50:58 to travel, you know, supersonic effectively.
51:01 Right.
51:02 So that completely changes the way you think about solving the problem, right?
51:05 Right.
51:06 I mean, the hyperloop is supposed to be, for people that don't know, this super fast way
51:09 of going 700 miles per hour in some tubes underground, right?
51:13 If you're going 700 miles per hour, what do you care about more?
51:16 How long the routes are?
51:17 Do you care like that it's a thousand miles from one to another?
51:20 Or do you care how many stops you're going to have to make along the way?
51:24 Because it actually might be every time you make a stop, there's a little layover.
51:28 And that will actually end up taking more time, even if the other route is a longer distance.
51:33 So we use the hyperloop as a motivating problem to go through a bunch of these different graph
51:38 algorithms.
51:39 And one thing we find is that the distance that it takes might not matter as much as how
51:45 many stops we make along the way for something like the hyperloop.
51:49 And so there's a different algorithm you'd use for how many stops you want to make versus
51:53 the route that might have the shortest distance.
51:56 Right.
51:56 Just to talk about things like minimum spanning trees, where like maybe what is the minimum
52:01 amount of money or track or tube you can lay to actually just connect these things,
52:05 regardless of what sufficient, right?
52:08 Right.
52:08 So if we think about every major city in the United States and we want to connect them
52:12 all, right?
52:12 How do you actually draw connections between all of them that would use the minimum amount
52:18 of money to build, let's say, because it's going to be the minimum amount of track that you
52:23 would have to lay?
52:24 You know, as a human being, we can solve that pretty easily if there's like five cities,
52:29 right?
52:29 We may be even able to solve that if there's 10 cities.
52:32 When you start thinking about 15 or 20 or 30, you need an efficient way for a computer to
52:37 do that.
52:38 And finding the minimum spanning tree is what we call those algorithms.
52:41 Yeah.
52:42 It's pretty cool.
52:42 Another one, another category of problems that you talked about, and they seem to have a lot
52:47 to do with games like board games and stuff, were the adversarial search problems.
52:52 So these apply to zero sum, two player, perfect information games, unlike say a poker or something
52:59 where you don't know what the other player has or can play, right?
53:01 Right.
53:02 So a very common thing that people do when they're learning programming is programming tic-tac-toe.
53:07 It's just, for some reason, it's just something people love to do.
53:10 Super easy to program tic-tac-toe when you're just saying, okay, player one is going to put X
53:15 somewhere on the board and player two is going to put O somewhere on the board.
53:18 That's easy.
53:19 But what if you want the computer to play tic-tac-toe perfectly?
53:22 So your first instinct might be, oh, so I'll just hard code that every time there's two
53:29 things next to each other, have the computer do a block.
53:32 Or every time there are the boards totally open, have the computer play center.
53:37 But there's actually an algorithm called Minimax that instead of having to hard code every possible
53:42 position, we can just have a way of figuring out what the best position is all the time by
53:48 searching through what the result will be of the opponent's move to the computer's move.
53:55 And then the result of the best move I can make to the opponent's move back and forth,
53:59 back and forth till I get to the end of the game.
54:01 Or sometimes in Minimax, like in a much more complex game than tic-tac-toe, something like
54:06 chess or checkers, we won't get to the end of the game, but we'll get, let's say, five or
54:10 six moves into the future.
54:11 And we can then just see how good is the position at that point.
54:14 So Minimax is an algorithm that we can use for any game that's basically two players that
54:21 we can actually look at the whole board at once.
54:24 And that's what we call perfect information, being able to see all the information about
54:28 the game at any given time and have a computer play it really, really well.
54:32 And this is the way that chess engines work, for example.
54:35 Most of them use Minimax or more advanced variants of it that allow them to play better than any
54:40 grandmaster in the world.
54:41 And on a modern computer, you can program yourself a chess engine that'll be any of your friends.
54:47 Easily.
54:49 It's not that hard.
54:50 Yeah.
54:50 Yeah.
54:51 You get a little IoT and a camera and some computer vision.
54:54 They just take it to the tournament and hold it above, right?
54:56 Yeah, exactly.
54:57 Yeah.
54:58 Yeah.
54:58 It's pretty cool.
54:59 That's a nice one.
55:00 Now, I saved the craziest ones to ask you about for last because I looked through the
55:05 examples and it still is kind of sketchy in my mind, but it's very interesting.
55:09 That's the whole category of genetic algorithms where you don't really know the solution, but you
55:15 turn loose a bunch of different algorithms and then they tweak themselves.
55:20 And they evolve to try to find the solution.
55:23 Tell us about that.
55:24 Yeah.
55:24 So with genetic algorithms, we don't usually know the way to solve the problem.
55:29 So it might be a problem that's so hard that we don't know any way to efficiently come up
55:34 with a solution.
55:35 And so instead, we borrow this idea from nature of trying a lot of different solutions and slightly
55:42 modifying them when we think they're getting closer to the actual solution.
55:47 So in the real world, right?
55:49 We have evolution and natural selection, right?
55:52 You have some species comes up and it's really well adapted to its environment.
55:57 So it has more children.
55:58 And then some other modification to it might actually make it less adapted.
56:03 So it has less children.
56:05 And the species that's better adapted is the one that tends to survive over time, right?
56:09 So it's just applying that same idea to solutions to a problem on a computer.
56:13 We're saying, oh, I'm going to maybe at first even randomly try a bunch of random different
56:17 solutions.
56:18 The ones that seem to work better, oh, I'll let those solutions propagate.
56:22 I'll modify them slightly.
56:24 So it's like they're having children.
56:26 And I'll let those children be the progenitors of the next generation of possible solutions.
56:30 And the ones that didn't seem to work so well, well, I'm going to let them die.
56:34 Unfortunately, that's how natural selection works.
56:36 And so I'll keep hopefully slowly building by making small random modifications closer to
56:42 an actual solution to the problem.
56:44 Now, doing anything randomly is never probably going to be your first reach for solution to
56:50 a problem.
56:51 When you do something randomly, there's a strong possibility it'll take a really long
56:55 time to really get to a good solution.
56:58 So even if you're climbing in the right direction, as we do in genetic algorithms, right?
57:03 So you only want to use genetic algorithms when you really can't think of any other way to
57:08 approach this problem.
57:09 But there are some situations where they work pretty well because we don't know any better
57:13 way to really solve the problem.
57:15 Sure.
57:15 So I understand the idea of letting it evolve, like the algorithm evolve.
57:21 You tweak it in a little way.
57:22 But it seems to me like you almost have to give it the algorithm a fixed structure, right?
57:29 You're going to try to do this with these seven, like almost constants or constraints that you're
57:35 going to apply.
57:36 But you're always going to like try to fit this type of calculation or this type of thing
57:41 to it.
57:41 Like how much variation can you actually get as these evolve?
57:45 You're absolutely right.
57:46 You do have to kind of set some constraints at the front.
57:48 So for example, if I'm trying to find the solution to a math problem, right?
57:54 And I have X and Y and X and Y, I need to know what are the right numbers that should fill in for X
58:00 and Y to solve the math problem, right?
58:02 It would be really helpful if instead of generating any random numbers at the very start, I put them
58:08 within a reasonable range.
58:09 Like if I know just before I start the problem that probably X and Y are going to be between
58:14 zero and 100, it would be good if I just generated random X and Ys to start out with that are only
58:19 between zero and 100.
58:20 Because otherwise I could just go way off in the left field and never get anywhere close to
58:25 my solution.
58:25 So you're absolutely right.
58:26 You have to have some knowledge of the problem to set it up appropriately for a genetic algorithm.
58:31 Right.
58:32 So it's not, well, is it possible?
58:33 I guess let me just ask you that way.
58:35 That, you know, I start up with an algorithm that says we're going to try to use this equation to
58:40 match some kind of answer and it evolves itself into using a neural network instead.
58:45 Oh.
58:45 It can't make that much of a change, can it?
58:47 No, it can't make, you know, it's interesting you mentioned that, you know, we do have a chapter
58:52 in the book on neural networks.
58:53 Neural networks are the hardest technique we go over in the book.
58:56 If you want to do something even harder, I have a colleague at work and he works on using
59:00 genetic algorithms with neural networks.
59:02 Okay.
59:03 That's pretty interesting.
59:05 Yeah.
59:05 Trying to evolve the neural networks to more accurately solve a certain problem.
59:10 So yes, it is possible to combine the two techniques, but your listeners should not worry
59:15 that if you just program a genetic algorithm, it will not morph itself into a neural network.
59:19 There may be a singularity, but probably not.
59:22 Not yet.
59:22 Not for that path.
59:23 Yeah.
59:23 Not that way.
59:25 No.
59:25 That's cool.
59:26 So I guess, you know, that covers many of the problems you laid out.
59:30 There are some other popular ones like the knapsack and traveling salesman problems and
59:34 some others we didn't cover as well.
59:35 But there's a lot of cool techniques that I think are worth studying here.
59:39 Which one is your favorite?
59:40 So I would say that I actually like the graph problems best because we're doing something there
59:46 that is motivated by a real world problem, which is building out a transportation network
59:51 throughout the whole chapter.
59:53 One thing that people might be a little suspicious of is that the book is full of toy problems,
59:59 right?
59:59 And that is true.
01:00:00 A lot of the problems in the book are problems that are a little fanciful.
01:00:05 They're not necessarily something that you're going to come across on a day-to-day basis because
01:00:10 we have to have problems small enough that you can actually learn all of them in a fairly
01:00:15 short chapter, right?
01:00:16 We can't get into all the details of a really sophisticated problem, unfortunately, in a book
01:00:21 of this length and this broad.
01:00:24 So a lot of them are kind of toy problems.
01:00:25 It's more about learning the techniques that you use on these toy problems and how they
01:00:29 might apply to your real world problems.
01:00:31 In the graph problem chapter, I think right away you can see how that problem is similar
01:00:35 to a lot of real world problems.
01:00:37 All the time, we're trying to figure out what's the most efficient way to traverse some kind
01:00:41 of network.
01:00:42 If you're working in even API plumbing, you might have a bunch of nodes that you need
01:00:47 to hit.
01:00:48 And what's the most efficient way to traverse hitting all those nodes in a certain order?
01:00:52 That lends itself maybe to a graph problem.
01:00:55 So the graph problem chapter, I really love just because it's so immediately applicable.
01:00:59 Yeah, that's a good one.
01:01:00 So one of the things that was evident to me was that choosing the right data structure was
01:01:06 really important and learning the algorithms is good.
01:01:10 But I think also knowing the data structures, maybe it's even more important in some ways.
01:01:14 So for people who maybe are self-taught, what are some of the most important data structures?
01:01:19 And where do you think they might have holes that they don't realize are blind spots that
01:01:24 they should go check out?
01:01:25 Well, if you're programming in Python, you got to be really, really familiar with lists and
01:01:29 with dictionaries.
01:01:30 I mean, if you're not at least an expert in those two, you're not using Python to its full
01:01:35 extent because they are just so all over the language and all over the standard
01:01:40 library.
01:01:40 So you should at a minimum be familiar with lists and dictionaries.
01:01:44 Now, then there's more abstract data structures that are built on top of those.
01:01:48 Everyone should be familiar with stacks and queues, which are built on top of lists.
01:01:52 And again, those are, like we mentioned earlier, structures that the first thing I put into them
01:01:56 might be the first thing I get out in the case of a queue, or the last thing I put into
01:02:00 them is the first thing I take out in the case of a stack.
01:02:04 Everyone should be familiar with those two.
01:02:06 Then we get into some that are a little bit more esoteric, like a priority queue.
01:02:10 They're really useful to know about in certain applications or certain algorithms that are
01:02:15 pretty useful algorithms like A-star, which we talked about earlier, that you can't implement
01:02:19 without them.
01:02:20 So you might not use them every day, but there are situations where it's good to be at least
01:02:25 familiar with them.
01:02:26 We don't do a lot of work in the book with trees, which some people might find surprising.
01:02:31 But I'd say that's a common hole that people have is not being familiar with how to build
01:02:36 a tree and how to structure a tree.
01:02:37 And, you know, one thing people use this book for sometimes I've heard in the Swift book as
01:02:42 well is preparing for interviews.
01:02:44 Unfortunately, I think unfortunately, a lot of programming interviews have become very kind
01:02:49 of algorithmic problem solving.
01:02:50 I think knowing all these data structures will help you a lot when doing those interviews.
01:02:55 And so I'd recommend everyone be familiar with stacks, queues, priority queues, trees, of
01:03:01 course, lists and dictionaries.
01:03:03 Everyone who programs in Python should be an expert on dictionaries.
01:03:05 I'm sure you'd agree.
01:03:06 I do.
01:03:06 I definitely agree that, you know, the dictionaries especially show up not just where you use them,
01:03:12 but they also show up as foundational things, right?
01:03:15 Like for the backing, holding all the fields and attributes of a class, for example.
01:03:19 So super, super important.
01:03:20 Yeah.
01:03:21 So I guess one other question along those lines is people are excited about this topic.
01:03:26 Maybe they get your book.
01:03:27 Where else could they go?
01:03:28 I mean, are there good MOOCs like Carnegie Mellon or something like that?
01:03:33 Or where should they go?
01:03:34 I'll point out a few different resources.
01:03:35 So first of all, I should say this book is not a textbook.
01:03:39 So it doesn't teach you a lot of the math behind these algorithms.
01:03:42 It doesn't teach you big O notation.
01:03:44 It uses big O notation a little bit, but it doesn't really go into a lot about it.
01:03:48 So we really wanted to make it a really approachable book.
01:03:51 So we kept the math to a minimum.
01:03:52 We even tried to keep the math to a minimum in chapters like the neural networks chapter
01:03:56 and the k-means clustering chapter.
01:03:58 These more kind of AI chapters, machine learning chapters, because we wanted the book to be approachable.
01:04:03 That said, that means that we're not really giving you the full story.
01:04:07 And so if you really want to get the full story, there's a couple textbooks I'd recommend.
01:04:11 One is Algorithms by Sedgwick and Wayne.
01:04:14 That's a really common college textbook on algorithms and data structures.
01:04:19 And it is probably the most approachable, in my opinion, of the textbooks.
01:04:23 It's really well written.
01:04:24 Unfortunately, it's in Java, not in Python.
01:04:26 So that kind of sucks.
01:04:28 But it's still a really easy, readable textbook.
01:04:31 The classic textbook in this area is called Introduction to Algorithms by Corman and his
01:04:36 co-authors, CLRS, it's sometimes called.
01:04:39 That is a really good textbook.
01:04:41 It's more reference quality material.
01:04:43 Some people say it's a little bit less approachable than just the Algorithms book by Sedgwick.
01:04:48 In terms of online resources, strongly would recommend the Algorithms course on Khan Academy.
01:04:54 Again, unfortunately, it's in JavaScript, not in Python.
01:04:57 But having looked at a bunch of these resources, I really think for new learners, that's a really
01:05:01 strong course on Khan Academy.
01:05:03 If you want to get more into the machine learning type of stuff we talked about in the book, like
01:05:07 K-means clustering and neural networks, then the machine learning course on Coursera taught
01:05:12 by Andrew Ng has become kind of a standard that a lot of people go through.
01:05:16 And it's very high quality.
01:05:19 And unfortunately, again, it's in MATLAB slash Octave instead of Python.
01:05:24 I don't think there are a couple Python textbooks and data structures and algorithms.
01:05:29 None of them have become, let's say, classics.
01:05:32 So I can't individually recommend any of them.
01:05:35 I'm also not immediately familiar with all of them.
01:05:38 But I do think most of these resources for anyone who's been doing a year or two of Python,
01:05:43 you'd be quickly able to read most of the ones that we mentioned.
01:05:47 Right.
01:05:47 Okay.
01:05:47 Well, that sounds great.
01:05:48 Well, Dave, I think we're getting pretty low on our time.
01:05:51 So we've used it about up and talking about all these great things.
01:05:54 So I guess we'll have to leave it here.
01:05:55 But let me ask you the two questions before we call it a show.
01:05:59 If you're going to write some Python code, what editor do you use?
01:06:03 I'm all about PyCharm now.
01:06:04 Big fan.
01:06:05 Right on.
01:06:05 Me too.
01:06:06 And then notable PyPI package, even though, ironically, you didn't really use many of them
01:06:11 in your examples.
01:06:12 What's maybe one folks haven't heard of that they should know about?
01:06:15 Well, I don't know.
01:06:16 People are familiar with it, I'm sure.
01:06:17 Scikit-learn.
01:06:18 I think Scikit-learn just has so much functionality built in.
01:06:21 And let me just add, you know, the purpose of reading a book like this is not that you're
01:06:26 going to go implement all of these techniques yourself.
01:06:28 It's so that you're going to know which open source library to include in your project.
01:06:33 We don't expect you to go implement Scikit-learn yourself.
01:06:36 But when you use Scikit-learn, which of the algorithms from it should you be using?
01:06:39 Yeah, it makes a lot of sense.
01:06:40 Yeah.
01:06:41 You don't want to go and reinvent 10 years of lots of work, right?
01:06:45 It's like reinventing NumPy.
01:06:46 You wouldn't do it, but understanding what's happening is really good.
01:06:48 Exactly.
01:06:48 Yeah.
01:06:49 Cool.
01:06:49 All right.
01:06:50 Well, final call to action.
01:06:51 People want to get more into learning these algorithms and data structures and these classic
01:06:55 problems.
01:06:56 What do they do?
01:06:56 Well, you can get the book from Manning.
01:06:58 Manning's a publisher, manning.com.
01:07:00 And I think you're going to be offering a promo code for your listeners.
01:07:02 It's called Classic Computer Science Problems in Python.
01:07:05 It's also going to be available on Amazon starting at the beginning of April.
01:07:08 Okay.
01:07:09 Sounds great.
01:07:10 Well, Dave, thank you for being on the show.
01:07:13 It was great to talk to you about all these things.
01:07:14 There's a bunch of cool stuff you put together here.
01:07:16 Thanks a lot for having me, Michael.
01:07:17 It was a real pleasure.
01:07:18 You bet.
01:07:18 Bye.
01:07:19 This has been another episode of Talk Python to Me.
01:07:23 Our guest on this episode was David Kopech, and it's been brought to you by Microsoft.
01:07:27 If you're a Python developer, Microsoft has you covered.
01:07:31 From VS Code and their modern editor plugins, to Azure Pipelines for continuous integration,
01:07:36 and serverless Python functions on Azure.
01:07:38 Check them out at talkpython.fm/Microsoft.
01:07:42 Want to level up your Python?
01:07:45 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.
01:07:49 Or if you're looking for something more advanced, check out our new async course that digs into
01:07:54 all the different types of async programming you can do in Python.
01:07:57 And of course, if you're interested in more than one of these, be sure to check out our
01:08:01 Everything Bundle.
01:08:02 It's like a subscription that never expires.
01:08:04 Be sure to subscribe to the show.
01:08:06 Open your favorite podcatcher and search for Python.
01:08:09 We should be right at the top.
01:08:10 You can also find the iTunes feed at /itunes, the Google Play feed at /play,
01:08:15 and the direct RSS feed at /rss on talkpython.fm.
01:08:19 This is your host, Michael Kennedy.
01:08:21 Thanks so much for listening.
01:08:22 I really appreciate it.
01:08:24 Now get out there and write some Python code.
01:08:25 I'll see you next time.
01:08:26 Bye.
01:08:26 Bye.
01:08:26 Bye.
01:08:26 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:27 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:28 Bye.
01:08:29 Bye.
01:08:29 Bye.
01:08:29 Bye.
01:08:29 Bye.
01:08:29 Bye.
01:08:29 Bye.
01:08:29 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:30 Bye.
01:08:31 Bye.
01:08:31 Bye.
01:08:31 Bye.
01:08:31 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:32 Bye.
01:08:33 Bye.
01:08:33 Bye.
01:08:33 Bye.
01:08:33 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:34 Bye.
01:08:35 Bye.
01:08:35 Bye.
01:08:35 Bye.
01:08:35 Bye.
01:08:36 Bye.
01:08:36 Bye.
01:08:36 Bye.
01:08:36 Bye.
01:08:36 Bye.
01:08:36 Bye.
01:08:36 Bye.
01:08:37 Bye.
01:08:37 Bye.
01:08:37 Bye.
01:08:37 Bye.
01:08:37 Bye.
01:08:37 Bye.
01:08:37 Bye.
01:08:38 Bye.
01:08:38 Bye.
01:08:38 Bye.
01:08:39 Bye.
01:08:39 Bye.
01:08:39 Bye.
01:08:39 Bye.
01:08:39 Bye.
01:08:39 Bye.
01:08:39 Bye.
01:08:39 Bye.
01:08:40 Bye.
01:08:40 Bye.
01:08:40 Bye.
01:08:41 Bye.
01:08:41 Bye.
01:08:41 Bye.
01:08:41 Bye.
01:08:41 Bye.
01:08:41 Bye.
01:08:42 you you you Thank you.
01:08:45 Thank you.