Monitor performance issues & errors in your code

#211: Classic CS problems in Python Transcript

Recorded on Monday, Mar 4, 2019.

00:00 Michael Kennedy: Many of you studied computer science at a university to get into programming and your careers. I bet most of you came through some self-study or some sort of backdoor into the industry. I count myself among that crowd. This is one of the true bright spots of our industry, that we can earn our way in without necessarily going and getting formal college degrees. But sometimes that academic formalism would come in handy. That's where David Kopec's book is a great resource. It's an approachable and quick introduction to computer science, and that's our topic on this episode. This is Talk Python to Me, Episode 211 recorded March third, 2019. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm. And follow the show on Twitter via @talkpython. This episode is brought to you by Microsoft. Be sure to check out what they're offering during their segments. It really helps support the show. Hey, everyone. Before we get to our interview with David, I want to share some super exciting news. You've probably heard about our 100 Days of Code with Python course. Well, Bob, Julian, and I are back with another 100 Days of Code course. This time, it's all about the web. We've created a course called 100 Days of Web in Python. And it's a similar style course, but it covers so many of the web technologies that you might be interested in. Want to learn about Async? That's in there. Want to learn about Django? It's in there. How about API Star, Flask, Pyramid, SQLAlchemy, database migrations, on and on and on? Service functions, all that stuff, it's in there. So, if you're interested in the web and you want to really get started, you owe it to yourself to check out the 100 Days of Web in Python. Just visit training.talkpython.fm and you'll find it right in the course section. Now let's talk to David. Dave, welcome to Talk Python.

02:07 David Kopec: Hi, Michael. It's a pleasure to be here.

02:09 Michael Kennedy: Hi, it's great to have you here. You've written a really cool book and you're talking about some great problems in computer science that I think will be beneficial to almost everyone listening regardless of their experience level. So, I'm super excited to talk to you about that. Of course, before we do though, let's start with your story. How did you get into programming with Python?

02:27 David Kopec: Well, my dad was a computer science professor. So, I started programming when I was little kid. He taught me Basic. I first got into Python when I was in graduate school. So, I was doing my master's degree at Dartmouth. And all the classes were in Python. So, I thought wow, I better become an expert in Python. And I quickly got up to speed and really fell in love with the language. I loved it for its succinctness more than anything, how you can write a program in Python and it would really look similar to the pseudocode that you maybe originally saw in an algorithm.

02:57 Michael Kennedy: Yeah, I do think that's a really great aspect of it. What languages did you learn before? You said Basic in the early ages, but you probably learned something in between.

03:04 David Kopec: Basic in the early ages, then I was really into Java when I was a teenager, and then I got into Objective-C when I got interested in developing for the Mac. I've been all around the block, if you will. And then in graduate school, I got into Python.

03:17 Michael Kennedy: Yeah. Interesting. Objective-C is a different language as well. I'm just thinking coming from Java to Python, there's a lot of symbol differences. Did that strike as strange, or did you find it refreshing?

03:29 David Kopec: I found it refreshing. I found how succinct it is again really refreshing. I mean, Java has this notorious reputation for being verbose, so does Objective-C. A lot of people get scared of by Objective-C's very verbose syntax. So, Python was a breath of fresh air.

03:43 Michael Kennedy: I feel the same way. It's pretty awesome. Okay. So, what do you do today? You went to graduate school. You studied computer science. You're passing the baton to the younger generation?

03:52 David Kopec: Well, yeah, I worked as a software developer for a few years. Now, yeah, I'm an Assistant Professor of Computer Science. So, I'm kind of following in my dad's footsteps a little bit.

04:00 Michael Kennedy: Yeah, that sounds like a really fun job. You get all these new people who are excited about programming and work with them for a couple years and bring 'em along, right?

04:07 David Kopec: It's really exciting to see people having those aha moments when they first really grasp a concept. And so, yeah, it's a job that's really gratifying and a lot of fun.

04:15 Michael Kennedy: Yeah, it's cool. What's your favorite course to teach?

04:18 David Kopec: I like teaching iOS development. Maybe your Python listeners won't like me saying that, but that's one of my favorite classes to teach.

04:23 Michael Kennedy: Yeah. Well, it's just such an exciting platform. There's just so many inputs like there's GPS and altitude. Even with regular programming, a lot of times, you don't get that device feel, but with iOS and I guess Android as well, you do, right?

04:40 David Kopec: Yeah, absolutely. Another fun class is emerging languages. We kind of explore languages that are still on the periphery of industry and just becoming more popular. That's really exciting 'cause it keeps me fresh.

04:50 Michael Kennedy: Yeah, absolutely. What are some of the languages that are emerging?

04:52 David Kopec: So, this semester, we're doing Go, Swift, and Clojure. So, some people might be like oh those have already been around for a little bit. They're all around five to 10 years old. If you think about a computer science curriculum, it moves a lot slower than industry moves. So, these are languages that aren't usually taught in college or university level.

05:13 Michael Kennedy: Yeah, that's pretty cool that you're covering those. All right. So, let's talk about the main topic, these classic computer science problems. You wrote a couple of books around classic computer science problems. We'll touch on the books and more on the main ideas from them. But let's just start with what makes a computer science problem classic.

05:33 David Kopec: Well, for the purposes of the book, we thought about a classic problem as one that somebody who has a bachelor's degree of computer science would probably be familiar with. So, what had been taught in the college classrooms for the past 30 or 40 years as long as computer science programs have pretty much been around. So, if you go to somebody who has a CS degree, they've probably heard about one of these problems. Okay. So, that's our basis. But then we had to also extend that into a whole book. So, we also just went over problem-solving techniques that we think anyone who works in software development, whether they have a CS degree or not, should be familiar with. And then we're using problems to illustrate those classic problem-solving techniques.

06:16 Michael Kennedy: Yeah, I think that's a really great approach. We recently did a beginners and experts episode on the show, Time Shifting. It hasn't released yet. So, yeah, there's no way you could have heard it, but it's been recorded and will come out before this one. And we talked about what is the real, what are the core differences between somebody who's been programming for a long time and somebody who's new. It takes a couple of months maybe to jump in and learn a language or learn some important package like Django or SQLAlchemy or something like that. But it seemed like one of the really big challenges a lot of the beginners laid out is you see a problem, you see a whole bunch of APIs, like how do you even break it down into something you can start to solve. It's like partitioning problems into something you can think about. Do you think that's a big problem you see with your students?

07:06 David Kopec: Absolutely. I mean, if you're never even familiar with the problem-solving techniques, how are you going to know that you should be applying them? So, that's kind of the purpose of the book. It's very broad in its scope. That's been one of the criticisms of it frankly is that it doesn't go deep enough into any of the particular problem-solving techniques, but it's not supposed to. It's supposed to be a broad introduction, give you a little bit of a taste of each of them. And then you know enough about them to know when they should be applied, but you're not necessarily an expert in any of them just by reading the book.

07:33 Michael Kennedy: Yeah, that makes a lot of sense. One thing to think about, it feels to me like these classic problems regardless whether they're a new book or I think back to when I saw some of these, I don't have a CS degree, I have a math degree, but I have what's like roughly a minor in CS. My university didn't have CS or minors at all, really, but same idea. And so, when I saw these, it feels like the ones you covered as well a lot here, I think it's pretty typical, is it's very much focused on algorithms, data structures, and so on. It's not like here's how we're going to render a webpage or generate a data-backed REST endpoint or things like that. It's not about GUIs, and it's more focused on what is the core problem, what are the right algorithms and data structures and ways of thinking about it to solve it, right?

08:26 David Kopec: That's absolutely true. On the other hand, at the end of every chapter, we include a small section called real-world applications. It tries to bring them back into the day-to-day software development world. So, how are people in apps that you're probably familiar with actually using these techniques to solve problems? I think a lot of us as software developers, we get into a mode of kind of API plumbing where we end up kind of just hooking up other people's libraries, hooking up APIs together, and we get a little bit away from the problem-solving parts of software development. And for people who felt like they've been in kind of that API plumbing mode for too long, maybe they'll find this book a little refreshing.

09:04 Michael Kennedy: Yeah, I think they will as well. So, who do you think should study this kind of stuff? You talked about the API plumbing. Some people will make the distinction between coders and software developers. I'm not really sure I like that. But should everybody really understand these algorithms deeply? Who should really pay attention to this, you think?

09:24 David Kopec: I think anyone who's a software developer should be familiar with these problem-solving techniques. This book is actually geared for people who don't have a CS degree. It's to introduce those concepts that you missed out on by being a programmer and not having that CS education giving you a basis to understand what everyone else is talking about when they talk about something like a neural network or genetic algorithm. What is that stuff, and when should you actually be using a technique like that? So, I really think it's a broadly applicable set of topics that everyone should at least be familiar. Again it doesn't mean you need to be an expert, it doesn't mean you need to go do four years of study. But when you run into a problem in your software development day job or even if you're just doing software development as a hobby, you should know what tools are available to you to solve it.

10:07 Michael Kennedy: Right. Even simple stuff like oh that should be a dictionary or that should be a set versus a list.

10:12 David Kopec: Yeah, absolutely.

10:13 Michael Kennedy: Yeah, for sure. Okay. Well, one of the things I really liked about your presentation of all these classic computer science problems was one, it was in Python, but it was in modern Python, I mean, really modern Python, not like oh it was 3.5 or something. You really made proper use of a lot of the cool new features.

10:33 David Kopec: Yeah, that was a goal. We put together a team when we started the Python version of the book. And we said we need to have a team that's really familiar with the latest techniques because we want this book to hopefully be evergreen for a few years. We want people to feel like if they buy it two years from now, it's using fairly modern Python. So, we decided from the beginning we're going to do Python 3.7 and really cutting edge stuff and we're actually going to use features from Python 3.7. Things like data classes, things like recent additions to type hints. And I will say that especially the decision to use type hints was a very controversial one amongst the reviewers of the book. So, Manning has a pretty extensive review process. Every one of their books goes through three sessions of external expert reviewers looking at the material. And there were a couple reviewers every time who said, "You know what, I feel like the type hints are not standard Python. They're really a fad or they're really just something that's takes away from the succinctness of the language." which, like I mentioned before, something I love about the language, how succinct it is, and I kind of agreed. Obviously your code does look a little more bulky with all the type hints inside of it. On the other hand, we wanted to do where we thought the puck might be going and type hints are becoming more popular. They're definitely some value that they add arguably in terms of readability, knowing right away what a function is going to return is a nice thing without having to read through it. And we took a bit of a gamble on it. And I hope that for people who are interested in type hints, this book might be a gateway into getting even interest in it.

12:09 Michael Kennedy: I think so. I'm personally a big fan of type hints in Python. I think they can go overboard, I think they can make the code look too bulky, but there's something really powerful about just sprinkling a few of them here and there and then having your editor, your IDE, whatever just come to life knowing exactly what you're working with and even catching errors and mistakes. That's pretty powerful.

12:34 David Kopec: Yeah, absolutely. I mean, the ability to use something like Mypy and know ahead of time where some of your errors lie before you get to runtime is really great. When it comes to self-documenting your code, there's also a big advantage with type hints, which is you right away know what these parameters are supposed to be, you right away know what this function is supposed to return. It can make your comments a little less bulky. So, you're kind of trading off a little bit of bulkiness in the code for maybe a little bit of less need to be so explicit in your comments.

13:05 Michael Kennedy: Right, yeah. If you say something is a boolean, you don't need to describe in the comments, "Hey, this is a boolean." It's like, well, :bool, we're good.

13:11 David Kopec: Exactly.

13:12 Michael Kennedy: Yeah. Yeah, it's pretty interesting. So, reading through your examples though, when I think of like why do I care about type hints, I care about them mostly for the reason like personally I care about them, for the reason that I laid out, which is I love how it makes the editors like PyCharm and VS Code really help you a lot. It helps to deeply understand the types that are being passed around and give you autocomplete and stuff like that or Flake8 type error checking. There's Mypy, which you mentioned, which was really instrumental in helping Dropbox convert like a million lines of Python 2 to Python 3, which I thought is an interesting use. But the third one that I realize looking through your stuff is it's really nice when you have just static text not in an editor. It's just sitting there on the page. I don't have a Command + Click, go to definition to see what's happening on this function, but if you put the type hints right there like oh, well, this is returning a list of ints. So, I don't need to go and try to find out where that was in the writing, I can just see that's a list of ints and go with that. That's pretty cool.

14:20 David Kopec: Yeah, absolutely. I mean, that's supposed to be the goal is that it's more readable. Now when you first see them and if you're not used to them, I think it is actually less readable and it takes some time to get used to. So, I think it's a fair criticism of those reviewers that, for some people, the type hints do actually make the code at first glance less readable especially if you're not familiar with them. We did include a short appendix, Appendix C to the book that kind of gives you a crash course in type hints, but it's really just a matter of spending time with them to really start to gain the benefits of the readability, I think.

14:53 Michael Kennedy: Yeah. It does take a little bit of getting used to. You know, two things I actually learned even though I'm big advocate and fan, I use the type annotations a lot, one of the things that I saw that you were using or couple things that were pretty cool. One, you're using from __future__ import annotations, the __future__ annotations. That's kind of a new feature coming for type hints, right? You want to tell people about that?

15:15 David Kopec: Yeah, absolutely. So, if you have, for lack of a better word, recursive type hints, so a type hint that refers to the type inside of it up to version 3.7 in this new annotations feature added under __future__, you would have to actually put it in as a string in quotes. And if you now import annotations, you can actually just put the type as its actual type name without putting in quotes, which is how the language will work in the future. But for now, we have to have that in __future__.

15:46 Michael Kennedy: Yeah, there is always this limitation where, like, it's super common to have a class that interacts with itself.

15:53 David Kopec: Exactly.

15:54 Michael Kennedy: I've got a class, maybe it has a static method, and it's going to take another one like just for comparison, like is this instance of me less than another instance of me. If you want to put that in type annotations, that was really tricky to understand what to do without this new __future__ version of stuff because the class technically wasn't finished being defined yet. So, you couldn't save the name of it within like the functions or the properties or whatever.

16:17 David Kopec: Yeah, absolutely.

16:18 Michael Kennedy: Yeah. Interesting. Another one that I thought was interesting is the ability to define a generic type. I literally had never seen that or tried that, but you have a lot of data structures that you're defining to be generic like a generic of T, like bracket, bracket.

16:32 David Kopec: Yeah, absolutely.

16:33 Michael Kennedy: That's interesting.

16:34 David Kopec: Yeah, we're trying to make all the code in the book as generic as possible. And let me use that term more broadly than just the type hint-specific version of it. We want to write algorithms that can work for many different problems. We don't want to code them so specifically that they're only solutions to the one problem at hand. We want them to be techniques that you could then just drop in the same method, the same class, and use it in a totally different program. So, in most languages, we have something called generic programming. So, in Java, you have generics. In C++, you have templates. In type hints in Python, you have the generic type, which is similar to those other two methods in those other languages. It's a way of saying, you know, I don't know exactly what type I'm going to be using in this problem. So, here's a type that can fit in for any type. That is what a generic type is. It's a I don't know yet what the type is but at runtime, we're going to figure out what the type is.

17:28 Michael Kennedy: Yeah, it's interesting. It's like the list of int or list of string type of feature that you can get from if you import it from typing, but more broadly you can make it for your own type. That's cool. Yes, you said data classes, f-strings, all the good stuff. I just want to give a shout-out to a cool sticker I saw at PyCascades recently. They had a sticker for new versions of Python and said f'yes!'. So, F yes. It was beautiful.

17:56 David Kopec: Yeah. I mean, there's so many different ways to do string interpolation in Python, and f-strings are just so beautiful compared to the older ways.

18:03 Michael Kennedy: They are, for sure. This portion of Talk Python is sponsored by Microsoft and Visual Studio Code. Visual Studio Code is a free open source and lightweight code editor that runs on Mac, Linux, and Windows with rich Python support. Download Visual Studio Code and install the Python extension to get coding with support for tools you love like Jupyter, Black formatting, Pylint, pytest, and more. And just announced this month, you can now work with remote Python codebases using the new Visual Studio Code remote extensions. Use the full powerful of Visual Studio Code when coding in containers, in Windows Subsystem for Linux, and for SSH connections. Yep, that's right. Autocompletions, debugging, the terminal, source control, your favorite extensions. Everything works just right in the remote environment. Get started with Visual Studio Code now at talkpython.fm/microsoft. One thing I wanted to ask you about is you've done this book in Python understanding these classic computer science programs and you've done the same problems, not the exact same book but basically the same problems in Swift 'cause you had a classic science problems in Swift as well. I'd love to get your thought on comparing these two experiences.

19:13 David Kopec: Absolutely. So, most people would not be surprised that the Python version was more succinct. And a lot of people would not be surprised that the Python Standard Library is a lot richer than the Swift Standard Library. And so, there were actually parts of the code. First of all, when we were doing the Python book, we wanted the book to be of course totally Pythonic. So, we put together a team of reviewers and also the technical editor that were really experienced Python programmers who were making sure that we were not just porting over the Swift book and it was still looking like Swift in Python. We really want most of the problems again from scratch. But there were situations where there was whole methods from the Swift book that we could totally eliminate in the Python book because those techniques already exist in the Standard Library. And so, while in the book, we only basically use the Python Standard Library, we don't use any external libraries, that still meant that we were able to eliminate whole sections of code that just don't exist in the Swift Standard Library. As far as the languages go, I mean, Swift has been inspired by Python in some of its syntax. So, there are actually parts of it that look surprisingly similar. But as a whole, like I mentioned, the Python code is certainly more succinct. I'd say the Python code is probably more readable as a whole. And working with the two languages, Swift is a moving target. Every version of Swift, there's sometimes some breaking syntax changes. It can be kind of annoying. It's starting to stabilize the last couple of years. Python of course is an extremely mature language with extremely mature tooling. So, that was of course a pleasure as a book writer. You don't want to be writing code that is going to be obsolete a year from now. So, that was much more fun not having to think about that when writing the Python book.

21:02 Michael Kennedy: Yeah. Yeah, I don't think it would be a major difference just remembering reading through all the solutions you have, but one of the big difference is, with Swift, it's one of the most statically typed, safest of all the languages that I've worked with at least. Not only does it have a type system, it has things like these are read-only reference variables and all sorts of stuff like that like the constant or like nullable. You have to explicitly say a type is nullable or it can never the equivalent of None and things like that. Did that make a big difference or not so much?

21:37 David Kopec: I think it does for the reader. I think because there's a lot of cognitive overhead that goes with all of those safety features. So, having to always think about should this be an optional, oh if this is an optional, do I have to unwrap it here. That's the type of stuff you have to constantly think about in Swift. And in Python, your mind is a little more free. So, it actually makes the code more readable and easier to think about when you don't have to think about all those safety features. I think Swift code takes longer to write than Python code. It might be safer at runtime, but it also takes I think less time to read Python code than Swift code as well because you're not reading through all of those hoops you have to jump through for safety.

22:15 Michael Kennedy: Yeah, that's my sense from my limited experience as well. All right. Let's dig into some of these classic problems. You start off all of your problems with this one as well, but I would say one of the most mind-bending ideas of problem-solving when you first come into properly study programming rather than just like oh here's a loop and here's a variable, oh I'll make it work, is recursion.

22:41 David Kopec: Absolutely.

22:42 Michael Kennedy: It's just like wait, how does this work. There's no loops, but we're going across a set, or something crazy like that, right?

22:48 David Kopec: Yeah. Recursion is something that does bend your mind the first time you see it. It requires a new way of thinking. The crazy thing is, there's whole programming languages that are based around recursion. I mean, if you look at some of the functional programming languages, they don't even have loops. Any time you want to repeat yourself, you have to do recursion. Recursion is not a technique you want to apply every time you program. Recursion is a technique you want to apply when it makes more sense to you because any problem that you can solve recursively, you can also solve iteratively. So, you can also solve using loops. So, they really are two different ways of just repeating yourself, but it's a question of which one makes more sense to you in your mind when you're abstracting away the problem and thinking about how you want to solve it. So, there are situations where even though a loop might actually be more performant, which it often is in a language like Python, the recursive way of approaching the problem is more similar to how you might have written it down on paper, let's say.

23:45 Michael Kennedy: Right. Or maybe it perfectly matches the data structure you're trying to interact with, right? If I have like a tree data structure or something hierarchical, that's often really easy to work with in a recursion way because the recursion step is take one of the subelements and work on it as if it were the whole thing until you're out of subelements, right?

24:04 David Kopec: Absolutely. And we sometimes call structures like that even recursive data structures. Absolutely.

24:08 Michael Kennedy: Yeah, for sure. And this one, the example was the Fibonacci sequence of which I'm also a fan for these little demos that's complicated enough but not super complicated. It's nice to work with. So, you said all right, well, let's try to solve the Fibonacci sequence problem with recursion because, well, the first two numbers are either 0 and 1, or 1 and 1 depending on who you ask, then it's you add those two together. So, if you have f(n), it's just f(n - 1) + f(n - 2). It's super easy to do recursion, but it turns out that recursion kind of goes a little bit crazy in its naive way. For example, Fibonacci at 20 using recursion does 21,891 calls. That's a lot. So, you said, well, the reason is, we keep asking the same question again and again, what is the Fibonacci at 10. And if you want 11, you ask the same subsequence a whole bunch of times, right?

25:00 David Kopec: Right.

25:01 Michael Kennedy: Then you get into memoization, which is a funky word but a very cool technique to get for performance.

25:06 David Kopec: Absolutely. I mean, yeah, the Fibonacci sequence is, for listeners who aren't familiar with it, is each of the numbers is the sum of the previous two numbers. So, it goes 0, 1, 1, then 1 + 1 is 2, then 2 + 1 is 3, then 3 + 2 is 5, et cetera. If you actually write that recursively, it looks incredibly similar to how we just defined it. We just said it's the sum of the previous two numbers. So, recursively you can actually write take the Fibonacci number of the previous one and minus one and add it to the Fibonacci number of two ago, so Fibonacci of n minus two. It's actually like two lines of code, maybe three lines of code in Python. That's, like you mentioned, not very performant because you do all of those recursive calls many, many, many times when you do a large number. And we show in the book how you kind of go through a tree of them. So, we go through six different ways of solving the problem, each of them getting a little bit more performance or just doing it in a slightly different way. One way to solve the problem is with memoization. In memoization, you actually store results that you've already computed so that if you need to compute them again, you can just look them up. That's basically the idea. And it's a fancy word for a really simple idea, which is basically just let's store results that we've already come up with, and then we can just look them up instead of having to calculate them again. So, in Python, it's super easy to do that. You can do that using a dictionary. You can just store in a dictionary. Here's a previous result for a certain set of parameters. In the case of Fibonacci sequence, it's just a single parameter, which Fibonacci number are we trying to look up. We can just store that in dictionary. There is also a built-in facility in Python called lru_cache. This is from the functools package. You can use that to automatically store the results of a function so that the next time the function is called, we just auto-look in that cache for what the last result is. So, that's a little tip from the Python Standard Library that can just make a huge performance difference for all kinds of different mathematical calculations because you're not going to have to do it again if they're just already going to be stored for you.

27:13 Michael Kennedy: Or anything that can be cached basically.

27:15 David Kopec: Right.

27:16 Michael Kennedy: So, you put the lru_cache decorator on your function, and that function could be computing Fibonacci numbers or it could be going to a database and making complicated queries, but you're like, well, if this is however old or maybe this rarely, rarely changes between runs of the app, you only need to ask the database once and then you hold on to the answer, right?

27:36 David Kopec: Yeah, absolutely. That's one of the things I love is when we're able to teach something that's actually an important computer science topic, memoization, but we're also teaching you something about the Python Standard Library that easily in your day-to-day software development, it can really save you a ton of performance cost.

27:51 Michael Kennedy: Yeah. I agree. Super nice. Final takeaway. Well, I guess I talked about there being 21,000 function calls for Fibonacci 20. If you use memoization, it's 39.

28:01 David Kopec: Right.

28:02 Michael Kennedy: Even so, directly implementing it with generators and just a straight loop, it's still much faster.

28:07 David Kopec: Right. Then it's just 20. So, it's literally just every single one, let's say we're doing Fibonacci number of 20, we just have to do basically going around the loop 20 times, one for each number in the sequence. So, that's a situation where yeah, iterative, which is loops, is actually more perfomant than recursive, and that, you'll often find, is the case. Even though recursion is a really cool technique, usually it's actually faster to do things with loops in Python.

28:32 Michael Kennedy: Right. It's more about optimizing for understanding versus performance, which maybe understanding is better at first unless it really matters.

28:40 David Kopec: Exactly.

28:41 Michael Kennedy: Yeah. So, one thing I wanted to ask you about, when I learned about the yield concepts and these generators ideas and whatnot, like that's pretty mind-blowing. Doesn't that almost be its own classic CS problem, it's these lazy deferred execution types of things?

28:59 David Kopec: Maybe it should be. We should have covered it as such in the book. We don't go into it in a huge amount of detail. We kind of assumed the reader is familiar with generators and maybe we shouldn't have. That's something I should have prefaced our whole conversation with is we wrote the book for intermediate to advanced Python programmers. So, it's really not appropriate for people who are just learning Python for the first time. We already assumed that you've been doing Python development for probably a couple of years before you pick up this book. So, we don't go into explaining every detail of generators. Now that I think about it, maybe we should have because even some intermediate and advanced programmers are not always familiar with them.

29:37 Michael Kennedy: I think it's okay, but I do think this idea of solving problems with generators, it's just so powerful and just unlocks a lot of potential for interesting pipelines and other types of execution with usually simpler code.

29:51 David Kopec: Yeah, absolutely.

29:52 Michael Kennedy: Than non-generator version. So, yeah, pretty interesting. So, another area that you focused on was data representation in a pretty simple form, not like is this a point or rich data structure or is it parse or things like that. But you focused on trying to represent DNA, right, I think is what it is. You had codon and stuff like that. Actually it was just amino acids at first. It's like, well, how do we store this. We could store it as a string. The ACTG, each one could be a string. That would be pretty large in terms of representing a data. You can store it as integers, but even integer is not very efficient in Python. So then you started working with bits and bytes, right?

30:34 David Kopec: Yeah. So, oftentimes we don't even think about how what we think are really simple bits of data are going to be stored. So, your first inkling when you're thinking, how should I store the letters A, C, T, and G to represent DNA might be why don't I just store them as strings because those are letters. So, that makes sense. Letters can be parts of strings. But actually if you only have four different things that you want to store, you can do that with just two bits of information per thing because two bits can represent up to four different numbers. And so, each of those four different numbers could represent one of those four letters. And so, if, for string, we're going to end up storing approximately eight bits for every character, so eight bits for A, eight bits for T, et cetera, and if we do it as a number and we can actually just store two bits for A, two bits for T, that's actually a 75% space savings just from thinking about the fact that oh I actually don't need that many different possibilities that a string provides for this type of data. It sounds like common sense, it is common sense, but we don't even think about that sort of thing a lot of the time. And so, we're wasting a lot...

31:48 Michael Kennedy: It's just not obvious at all.

31:49 David Kopec: Yeah, no.

31:49 Michael Kennedy: In some low level like C, you talk explicitly about the data types and very, very carefully. This is the four-byte integer, that's a two-byte integer. But even integers in Python are like 28 bytes each for the number zero. Well, yeah. But at least the number zero to 256 are flywheel preallocated. So, they're shared. But normal numbers take up a lot of space because they're really embedded within these PyObject things that are tracked by Python with reference counting and all that.

32:18 David Kopec: And we're not saying that you should always be doing this. Maybe it actually is a lot more convenient a lot of the time to just use strings. You absolutely should do that. But if you're not even aware that there was a better way or there was a more at least space-efficient way, maybe there's a situation where you're not storing kilobytes of data, but you're storing gigabytes or terabytes of data and you could have actually been saving 75% of the space. Well, wow. Maybe it's relevant then. So, again it's just something we want to make people aware of, that we're not saying you should always be finding the absolute, most efficient way to store everything.

32:50 Michael Kennedy: Right. Or you have a RESTful API that's exchanging JSON and you're like, you know, if we wrote this in straight TCP packets and we just did this bit-wise storage, we could drop the traffic by 95% or something.

33:06 David Kopec: Sure. I mean, but that sounds really painful. That doesn't sound like something. I'd rather just exchange with JSON I think most of the time.

33:13 Michael Kennedy: Yeah, I'm thinking it was like places where PayPal has an API that's called like several billion times a day and it means really, really low, like it's got to be pretty far out there.

33:25 David Kopec: Yeah, absolutely. It's got to be an extreme case before you want to start, yeah, going down that road.

33:30 Michael Kennedy: Yeah, absolutely. All right. So, another one that I thought was an interesting one was the Towers of Hanoi because it's super painful to solve by hand. You sit down. It looks real simple. You've got these three towers that each have these little disks, disks are bigger as you go down, and the job was to move them from one end to the other end. But the rule is, the big ones can't go on top of the little ones. So, how do you, what series of operations? It's kind of like the Rubik's Cube type of thing, right? But the solution turns out to be beautifully solved with recursion, right?

34:03 David Kopec: Right, yeah. I mean, so, it sounds like a really simple problem. We have three different let's call them towers and we have three disks and we want to move all the disks from one tower to another tower. How can we do that? Well, if you actually try to work it out, it's several steps and they're not completely obvious when you just look at it although it does help to be able to see a model of it. So, it's a little hard to describe on a podcast. But what's interesting is that the solution actually just comes down to figuring out two things. One is how do I move a single disk and the second one is how do I move all the rest of the disks. And that actually lends itself to a really succinct recursive solution, just a few lines of code, and then it can solve it actually for any number of disks, not just three but let's say we had 10 disks. Well, we can actually solve that by just knowing how to solve the first disk and then having some step that we do for moving all the rest of the disks. So, it's kind of mind-blowing when you actually see how succinct it is to solve it for any number of disks recursively. And I don't think we can explain it wonderfully on the podcast, but I guess people will just have to take our word for it.

35:16 Michael Kennedy: It is really, really interesting, like the last move is obvious. You just move the disk over. It's kind of mind-blowing. And I do think it's a perfect application of this idea of how recursion really nicely fits these things because I think it would be really tough to implement that with loops and just mutable data structures and whatnot.

35:36 David Kopec: Yeah, absolutely.

35:36 Michael Kennedy: So, one that I thought was really nice tying together algorithms and data structures in a way that's kind of mind-blowing to be honest was maze traversal.

35:46 David Kopec: Yeah, you know, there's so many different ways to traverse a maze. This is actually a really good practice for people who going into coding interviews because there's a lot of these techniques for solving a maze that often come up in coding interviews, things like depth first search, breadth first search. And then at the end of the chapter, we go into astar, which is actually the most efficient algorithm for finding a path through a maze. But it's remarkable how all of these techniques, when you write them out in code, they're very, very similar. And all you have to actually change between depth first search and breadth first search for example is the data structure. So, in both of those, we're keeping track of where am I going to look next. We call that the frontier. So, where are the next places on the grid? Let's say the maze is a grid. Where are the next slots on the grid that I'm going to look at next? In depth first search, you use a stack. So, you just keep going through and you keep adding the next places you're going to look onto the stack. In breadth first search, you use a queue. And so, all the rest of the code stays exactly the same. Don't change a single other thing in your code. You just change, okay, for the frontier, the next place I'm going to look at, I'm going to use a queue instead of a stack. And then the way that we traverse the maze totally changes. In depth first search, we kind of take a stab at it. We go as far as we can through. And if we hit a wall, we backtrack to the last location.

37:12 Michael Kennedy: That's probably the way real people would, right? Like if you were put, as a human, put into a maze, you would just go till you hit a wall. You would probably not apply some breadth first algorithm whenever, right?

37:23 David Kopec: Right, yeah. As human beings, that's how we usually think. I think we kind of think in a depth first split way, but a breadth first search is super systematic. It says you know what, let's first look at the nodes that are one away from where we started, then let's look at the nodes that are two away from where we started, then let's look at the nodes that are three away from where we started. And just by changing from a stack to a queue, and for those that don't know, a stack is a data structure where the last item we put in is the first item we take out, and a queue is a data structure where the first item we put in is the first item we take out. Just making that change of data structure. So, literally changing one line of code totally changes how we traverse the maze. And so, maze traversal is an awesome way of kind of illustrating the difference between depth first search and breadth first search.

38:05 Michael Kennedy: Yeah. So, you had the queue, the depth first one? No, in the breadth first one. Then you come to astar, which uses a new data structure but is also pretty similar in our implementation called a priority queue.

38:19 David Kopec: Right.

38:20 Michael Kennedy: I don't believe I've seen a priority queue.

38:21 David Kopec: Yeah. And this is a place where the Python Standard Library is pretty cool because the Python Standard Library has built-in facilities for handling heaps, and a heap is a data structure that's at the basis of a priority queue. What a priority queue does is instead of a stack where we say last thing in, take that thing out first, and a queue where we say first thing in, take that thing out first, with a priority queue, we say let's assign every item we put into the data structure a quote unquote priority. And when we take things out of the data structure, let's take them out by their priority. So, the first thing we take out might be the thing with the highest priority. So, we have a way of kind of tagging each element and saying here's your priority. And then when we pull them out, we're going to efficiently pull them out by their priority. So, in astar search, what we want to have is kind of heuristic for traversing the maze. Instead of just dumbly saying oh what are the next things next to where I am right now, let's instead say looking at my next possibilities, which one do I think will get me closest to the goal. And the way we usually do that on a maze is actually just using a straight line, sometimes called Euclidean distance. Just draw a line from the next places I'm looking at to the goal. And the shortest line, that's probably the one that's getting me closest to where I want to go. And so, we can then put all of those next nodes into the priority queue ordered by how long is that line and how long that line is is what the priority is and we just take the ones out that have the shortest lines next. And that gets us to an algorithm that will actually get us every efficiently from the start to the goal.

39:59 Michael Kennedy: This portion of Talk Python to Me is brought to you by Microsoft. For ultimate developer productivity in the cloud, use Azure Extensions for Visual Studio Code. You can deploy and debug your serverless Python apps directly from your editor. On Azure, you can run your Python apps as serverless code on Linux web apps and functions or on top of managed Kubernetes and easily connect data from database services including Postgres and MySQL. You can also use Azure DevOps to create cross-platform builds of your Python packages with hosted MacOS, Linux, and Windows build machines and publish them to Azure Artifacts for your own hosted PyPI feed. Azure DevOps is free for opens source projects and many are using it already. Get started for free at talkpython.fm/microsoft. You also mentioned this idea of the Manhattan distant metric where you can't actually walk diagonal on a maze where it's a grid or something like that, right? And you also can't do that in cities where they're laid out on grids, right? There is no diagonal travel. It's over one, back one. So, like the diagonal is effectively as long as going around the border on these things. So, you can also even get a little more accurate, that right?

41:07 David Kopec: Right. So, yeah, the straight line is always the shortest way to get from point A to point B, but straight lines don't always work. So, if we're on a grid where we can only go horizontally and vertically from one spot to another, we actually can use a different heuristic known as the Manhattan distance, which is named after the borough of Manhattan of New York City where all the streets are on a grid pattern. And so, when you want to walk, you have to walk a certain number of blocks to the right or left and a certain number of block up or down. And if you actually do that calculation on a maze that can only go horizontally and vertically, it's going to give you a better estimation of how long it'll take to get from some location to another location than the straight line will because it's closer to how you can actually move.

41:54 Michael Kennedy: Yeah. Yeah. What I thought was really interesting about the coming together of the three ways of solving this maze problem were you have basically the same algorithm and it's really about the data structure that makes it all work. That was pretty cool.

42:10 David Kopec: Yeah. A lot of people have said that's one of their biggest aha moments throughout the book is where we can go and just change the data structure and the solution, how it actually works changes even though the rest of the code doesn't.

42:21 Michael Kennedy: Yeah, it's super cool. Another area that was pretty interesting was the whole set of constraint problems. So, you have a couple in here that we could touch on, Australian map coloring, the SEND plus MORE equals MONEY, maybe some more realistic problems after that.

42:36 David Kopec: Yeah. What's cool about these problems is how we can think about them altogether as really one problem-solving technique. So, let me tell you a little bit about the problem. So, the Australian map coloring problem is one where we take all the regions of Australia and we want to fill each of them in with a color and we only want to use three colors, let's say red, green, and blue. So, is it possible to find a color that we can put on each region without any of them next to each other having the same color. And the answer is of course yes. We see this on maps all the time, right? So, how can we solve this with a computer? Well, one way of thinking about the problem is as a constraint satisfaction problem. And what a constraint satisfaction problem is, it has three parts to it. It has variables. In the case of this problem, it would be the regions. So, the regions of Australia are variables. A constraint satisfaction problem always has domains. Those are what can we fill in for each of the variables. In this case, that would be red, green, or blue. That's the domain of each of those variables. Each of them has to be one of those values for our solution. And then the last part of constraint satisfaction problem is the constraints. In the case of this problem, the constraints is that any two regions next to each other can't have the same color. So, we actually have a bunch of constraints. We have one for every single set of regions that are next to each other. So, a constraint satisfaction problem-solver can solve this problem, but it can also solve any other kind of problem that we can define using variables, domains, and constraints. And so, we can write a constraint satisfaction problem-solver just once and then we can use it for other kinds of problems besides just the Australian map coloring problem such as the one that you mentioned, SEND plus MORE equals MONEY. That's a cryptarithmetic problem. It says...

44:17 Michael Kennedy: Those seem so different. And yet the same problem technique or solution will fix, hit 'em both. So, yeah, tell us about that one. It's really interesting.

44:26 David Kopec: Yeah, it's totally different. SEND plus MORE equals MONEY is what we call a cryptarithmetic problem. If you line up the letters S-E-N-D and the letters M-O-R-E, and you said each of these letters is going to represent a digit, so maybe E is four and maybe N is five or something like that, and then underneath them, as if it was arithmetic problem, we put the letters M-O-N-E-Y, money, the word money, and we also say that each of those letters has to be a corresponding digit. What digits can we fill in for each of those letters that will lead to SEND plus MORE equalling MONEY? Well, if you think about it, it actually is a constraint satisfaction problem again. Here's why, each of the letters is our variable. So, S might be a variable, E might be a variable, N might be a variable. Each of the possible values of those letters is our domain, so the digits, zero through nine are the domain of each of those letters. And here's our constraints, our constraints is that for whatever digits I fill in for each of those letters, the combination of SEND plus MORE must equal MONEY. So, again it's actually a constraint satisfaction problem. But when you first look at it, you might be like wow, this seems totally different from the map coloring problem, but actually they're the same kind of problem.

45:36 Michael Kennedy: Yeah. That's pretty amazing. And once you lay it out like that, it's obvious. But I would never look at those two things to go yeah, that's the same problem.

45:43 David Kopec: Right, absolutely.

45:44 Michael Kennedy: That's one of the things I like about studying these classic problems is they show you this larger symmetry or commonality across what seem like different problems and like oh it's a recursion problem, it's one of these constraint problems, or it's a graph problem, or something like that.

46:03 David Kopec: Yeah. It's amazing how many different ways we can apply the same problem-solving techniques.

46:07 Michael Kennedy: So, when you're talking about these techniques like these constraint algorithms are not particularly new but you did talk about one being coming along that is actually new, something called constraint propagation? What's the story there?

46:21 David Kopec: Well, I won't say it's new, but it's a more advanced way of solving these constraint satisfaction problems. So, in the chapter, we teach you a way called backtracking search. And it's actually a type of depth first search. So, it kind of builds on, this is chapter three of the book, it kind of builds on chapter two's explanation of depth first search. And backtracking search, it's every time you hit a wall, so every time we found a way of filling in domains for these variables that solves the constraints but then it doesn't solve one of them, we go backwards and we try to find a different fill-in that doesn't lead us into that wall. In constraint propagation, we instead ahead of time try to figure out okay, I think this is going to work well and we might still do a backtracking search at the end to make sure it really does work well, but instead of doing it after we've already hit a wall, we try to figure out what's going to work before we hit the wall at the very beginning. We don't actually get into the details of how that works in the book. It's kind of more of advanced technique, but it's a little more efficient. And so, if you are going to solve this on a really big problem, so a problem that let's say had hundreds of variables and hundreds of domains and constraints, then maybe you would want to use something more advanced than the technique we teach in the book.

47:33 Michael Kennedy: Sure. What are some of the real-world places where these constraint problems show up?

47:37 David Kopec: I would say the most common is probably scheduling. So, if you think about a bunch of people want to meet and they have, all of them have different availabilities, we could think about the individual people that want to meet as the variables, what their availabilities are as their domains, and who needs to be at that meeting as the constraints. There's actually real-life schedulers that use constraint satisfaction techniques to do scheduling. I can't tell you for sure if that's what Google Calender uses, for example, but it wouldn't surprise me.

48:07 Michael Kennedy: Yeah, of course. So very powerful. And I think the idea of the common technique across non-obvious things is really great with that set of problems. Another one are graph problems, not parabola but vertices and edges types of graph theory problems, right?

48:24 David Kopec: Yeah. And I think this is an area that people who have a math degree or a CS degree are totally familiar with and sometimes people who are self-taught programmers don't have any experience with. Again it's something that is applicable to actually a wide range of different problems and you might not even realize it when you're first seeing it because what is a graph? Well, a graph is technically a set of vertices that are connected to each other by edges and it's really an abstract math concept. But in the real-world, we actually see them all the time. So, any time you see a network, that's basically a graph. For example, the subway in New York City, that's a graph. Every one of the stations is a vertice and edges are the tunnels that connect each of those stations. If you want to lay out electric wire throughout a neighborhood, that's a graph. Each of the houses you need to hook up, that's a vertice. And the line you're going to run from one house to another to the poles, that's an edge. So, we can actually use this concept of just having vertices and edges to represent a huge number of different problems. And what we teach in the chapter is a bunch of different techniques for solving common problems you'd fall into when you want to use these graphs such as how do I find the shortest path from one vertice to another, or to put it in the context of what we just mentioned, how do I find the shortest path from one subway station to another subway station, or how do I use the minimum amount of edges of certain lengths to connect all the different vertices. In the context of the electric wire, how do I use as little electric wire as possible to electrify this whole neighborhood. And that's actually the problem that originally inspired some of these algorithms. In Czech Republic, they were trying to electrify large parts of Czech Republic and some mathematicians in Czech Republic in the 1920s and 1930s came up with the algorithms for finding the least amount of electric wire to electrify the country.

50:19 Michael Kennedy: That's a pretty cool bit of history. I didn't know that. Do you like view in there? When you're presenting the problem, you talk about the Hyperloop and some of the interesting things you might consider. So, you have a bunch of connections between the major cities and you imagine those are Hyperloop, you know, Elon Musk, high-speed tube things connecting them and you said look, here's the map of the US and here's the cities, maybe the tracks, but throw away the map, just think of these as connections and these points and then how you get from one point to the other. You also said some interesting things like probably it's longer to make a connection at a city than it is to travel between two cities given that those things are supposed to travel supersonic effectively.

51:01 David Kopec: Right.

51:02 Michael Kennedy: So, that completely changes the way you think about solving the problem, right?

51:05 David Kopec: Right. I mean, the Hyperloop is supposed to be, for people that don't know, the super fast way of going 700 miles per hour in some tubes underground, right? If you're going 700 miles per hour, what do you care about more, how long the routes are? Do you care like that it's 1,000 miles from one to another? Or do you care how many stops you're going to have to make along the way because it actually might be every time you make a stop, there's a little layover and that will actually end up taking more time even if the other route is a longer distance. So, we use the Hyperloop as a motivating problem to go through a bunch of these different graph algorithms. And one thing we find is that the distance that it takes might not matter as much as how many stops we make along the way for something like the Hyperloop. And so, there's a different algorithm you'd use for how many stops you want to make versus the route that might have the shortest distance.

51:56 Michael Kennedy: Right. You talk about things like minimum spanning trees where like maybe what is the minimum amount of money or track or tube you can lay to actually just connect these things regardless of what is efficient, right?

52:08 David Kopec: Right. So, if we think about every major city in the United States and we want to connect them all, how do you actually draw connections between all of them that would use the minimum amount of money to build let's say because it's going to be the minimum amount of track that you would have to lay. You know, as human being, we can solve that pretty easily if there's like five cities, right? We may even be able to solve that if there's 10 cities. When you start thinking about 15 or 20 or 30, you need an efficient way for a computer to do that. And finding the minimum spanning tree is what we call those algorithms.

52:42 Michael Kennedy: Yeah. It's pretty cool. Another one, another category of problems that you talked about and they seem to have a lot to do with games like board games and stuff were the adversarial search problems. So, these apply to zero-sum, two-player, perfect information games unlike, say, poker or something where you don't know what the other player has or can play, right?

53:02 David Kopec: Right. So, a very common thing that people do when they're learning programming is programming Tic-Tac-Toe. For some reason, it's just something people love to do. Super easy to program Tic-Tac-Toe when you're just saying okay, player one is going to put X somewhere on the board and player two is going to put O somewhere on the board. That's easy. But what if you want the computer to play Tic-Tac-Toe perfectly? So, your first instinct might be oh so I'll just hard-code that every time there's two things next to each other, have the computer do a block, or every time the board is totally open, have the computer play center. But there's actually an algorithm called minimax that instead of having to hard-code every possible position, we can just have a way of figuring out what the best position is all the time by searching through what the result will be of the opponent's move to the computer's move and then the result of the best move I can make to the opponent's move back and forth, back and forth till I get to the end of the game, or sometimes in minimax like in a much more complex game than Tic-Tac-Toe, something like chess or checkers. We won't get to the end of the game, but we'll get let's say five or six moves into the future and we can then just see how good is the position at that point. So, minimax is an algorithm that we can use for any game that's basically two player that we can look at the whole board at once, that's what we call perfect information, being able to see all the information about the game at any given time, and have a computer play it really, really well. This is the way that chess engines work, for example. Most of them use minimax or more advanced variants of it that allow them to play better than a grandmaster in the world. And on a modern computer, you can program yourself, a chess engine that'll beat any of your friends easily. It's not that hard.

54:51 Michael Kennedy: Yeah, you get a little IoT and a camera and some computer vision. They just take it to the tournament, right?

54:57 David Kopec: Yeah, exactly, yeah.

54:59 Michael Kennedy: Yeah. It's pretty cool. That's a nice one. Now I save the craziest one to ask you about for last because I looked through the examples and it's still kind of sketchy in my mind, but it's very interesting. That's the whole category of genetic algorithms where you don't really know the solution, but you turn to lose a bunch of different algorithms and then they tweak themselves and they evolve to try to find the solution. Tell us about that.

55:24 David Kopec: Yeah. So, with genetic algorithms, we don't usually know the way to solve the problem. So, it might be a problem that's so hard that we don't know any way to efficiently come up with a solution. And so, instead we borrow this idea from nature of trying a lot of different solutions and slightly modifying them when we think they're getting closer to the actual solution. In the real world, we have evolution and natural selection. You have some species comes up and it's really well-adapted to its environment. So, it has more children. And then some other modification to it might actually make it less adapted. So, it has less children. And the species that's better adapted is the one that tends to survive over time, right? So, it's just applying that same idea to solutions to a problem on a computer. We're saying oh I'm going to maybe at first even randomly try a bunch of random different solutions. The ones that seem to work better, oh I'll let those solutions propagate, I'll modify them slightly. So, it's like they're having children. And I'll let those children be the progenitors of the next generation of possible solutions. And the ones that didn't seem to work so well, well, I'm going to let them die. Unfortunately that's how natural selection works. And so, I'll keep hopefully slowly building by making small random modifications closer to an actual solution to the problem. Now doing anything randomly is never probably going to be your first reach for solution to a problem. When you do something randomly, there's a strong possibility it'll take a really long time to really get to a good solution even if you're climbing in the right direction as we do in genetic algorithms. So, you only want to use genetic algorithms when you really can't think of any other way to approach this problem. But there are some situations where they work pretty well because we don't know any better way to really solve the problem.

57:15 Michael Kennedy: Sure. So, I understand the idea of letting it evolve, like the algorithm evolve. You tweak it in a little way. But it seems to me like you almost have to give it, the algorithm a fixed structure, right? You're going to try to do this with these seven almost constants or constraints that you're going to apply, but you're always going to try to fit this type of calculation or this type of thing to it. Like how much variation can you actually get as these evolve?

57:45 David Kopec: You're absolutely right. You do have to set some constraints at the front. For example, if I'm trying to find the solution to a math problem and I have X and Y, and X and Y, I need to know what are the right numbers that should fill in for X and Y to solve the math problem. It would be really helpful if instead of generating any random numbers at the very start, I put them within a reasonable range, like if I know just before I start the problem that probably X and Y are going to be between zero and 100, it'd be good if I just generated random X and Ys to start out with that are only between zero and 100 because otherwise I can just go way off in the left field and never get anywhere close to my solutions. You're absolutely right. You have to have some knowledge of the problem to set it up appropriately for genetic algorithm.

58:31 Michael Kennedy: Right. Is it possible? I guess let me just ask you that way that I start out with an algorithm that says we're going to try to use this equation to match some kind of answer and it evolves itself into using a neural network instead.

58:45 David Kopec: Oh.

58:46 Michael Kennedy: It can't make that much of a change, can it?

58:47 David Kopec: No, it can't make. You know, it's interesting you mentioned that. We do have a chapter in the book on neural networks. Neural networks are the hardest technique we go over in the book. If you want to do something even harder, I have a colleague at work and he works on using genetic algorithms with neural networks.

59:03 Michael Kennedy: Okay. That's pretty interesting.

59:05 David Kopec: Yeah, trying to evolve the neural networks to more accurately solve a certain problem. So, yes, it is possible to combine the two techniques, but your listeners should not worry that if you just program in genetic algorithm, it will not morph itself into a neural network.

59:20 Michael Kennedy: And maybe a singularity but probably not...

59:22 David Kopec: Not yet.

59:23 Michael Kennedy: Not for that path. Not that one.

59:25 David Kopec: No.

59:26 Michael Kennedy: It's cool. So, I guess that covers many of the problems you laid out there, some other popular ones like the knapsack and traveling salesman problems and some others we didn't cover as well. But there's a lot of cool techniques that I think are worth studying here. Which one is your favorite?

59:40 David Kopec: I would say that I actually like the graph problems best because we're doing something there that is motivated by a real-world problem, which is building out a transportation network throughout the whole chapter. One thing that people might be a little suspicious of is that the book is full of toy problems, and that is true. A lot of the problems in the book are problems that are a little fanciful. They're not necessarily something that you're going to come across on a day-to-day basis because we have to have problems small enough that you can actually learn all of them in a fairly short chapter. We can't get into all the details of a really sophisticated problem unfortunately in a book of this length and this broad. So, a lot of them are kind of toy problems. It's more about learning the techniques that you use on these toy problems and how they might apply to your real-world problems. In the graph problems chapter, I think right away you can see how that problem is similar to a lot of real-world problems. All the time, we're trying to figure out what's the most efficient way to traverse some kind of network. If you're working in even API plumbing, you might have a bunch of nodes that you need to hit. And what's the most efficient way to traverse hitting all those nodes in a certain order? That lends itself maybe to a graph problem. So, the graph problem chapter, I love really just because it's so immediately applicable.

01:00:59 Michael Kennedy: Yeah, it's a good one. So, one of the things that was evident to me was that choosing the right data structure was really important and learning the algorithms is good, but I think also knowing the data structures. Maybe it's even more important in some ways. So, for people who maybe are self-taught, what are some of the most important data structures and where do you think they might have holes that they don't realize are blind spots that they should go check out?

01:01:25 David Kopec: Well, if you're programming in Python, you got to be really, really familiar with lists and with dictionaries. I mean, if you're not at least an expert in those two, you're not using Python to its full extent because they are just so all over the language and all over the Standard Library. So, you should, at a minimum, be familiar with lists and dictionaries. Then there's more abstract data structures that are built on top of those. Everyone should be familiar with stacks and queues, which are built on top of lists. And again those are, like we mentioned earlier, structures that the first thing I put into them might be the first thing I get out in the case of a queue or the last thing I put into them is the first thing I take out in the case of a stack. Everyone should be familiar with those two. Then we get into some that are a little bit more esoteric like a priority queue. They're really useful to know about in certain applications or certain algorithms that are pretty useful algorithms like astar, which we talked about earlier, that you can't implement without them. So, you might not use them every day, but there are situations where it's good to be alternate familiar with them. We don't do a lot of work in the book with trees, which some people might find surprising, but I say that's a common hole that people have is not being familiar with how to build a tree and how to structure a tree. And one thing people use this book for sometimes I've heard and the Swift book is, well, it's preparing for interviews. Unfortunately, I think unfortunately a lot of programming interviews have become very algorithmic problem-solving. I think knowing all these data structures will help you a lot when doing those interviews. And so, I'd recommend everyone be familiar with stacks, queues, priority queues, trees, of course lists and dictionaries. Everyone who programs in Python should be an expert on dictionaries. Do you agree?

01:03:07 Michael Kennedy: I do. I definitely agree. Dictionaries especially show up not just where you use them, but they also show up as foundational things like holding all the fields and attributes of a class, for example. So, it's super, super important. Yeah. So, I guess one other question along those lines is, people are excited about this topic. Maybe they get your book. Where else could they go? Are there good books like Carnegie Mellon or something like that? Where should they go?

01:03:34 David Kopec: I'll point out a few different resources. First of all, I could say this book is not a textbook. So, it doesn't teach you a lot of the math behind these algorithms, it doesn't teach you Big O notation. It uses Big O notation a little bit, but it doesn't really go into a lot about it. So, we really wanted to make it a really approachable book. So, we kept the math to a minimum. We even tried to keep the math to a minimum in chapters like neural networks chapter and the k-means clustering chapter, these more kind of AI chapters, machine learning chapters because we wanted the book to be approachable. That said, that means that we're not really giving you the full story. And so, if you really want to get the full story, there's a couple textbooks I'd recommend. One is Algorithms by Sedgewick and Wayne. That's a really common college textbook on algorithms and data structures. It is probably the most approachable, in my opinion, of the textbooks. It's really well-written. Unfortunately it's in Java, not in Python. So, that kind of sucks. But it's still a really easy, readable textbook. The classic textbook in this area is called Introduction to Algorithms by Cormen and his coauthors, CLRS it's sometimes called. That is a really good textbook. It's more reference quality material. Some people say it's a little bit less approachable than just the Algorithms book by Sedgewick. In terms of online resources, strongly would recommend the Algorithms course on Khan Academy. Again unfortunately it's in JavaScript, not in Python, but having looked at a bunch of these resources, I really think, for new learners, that's a really strong course on Khan Academy. If you want to get more into the machine learning type of stuff we talked about in the book like k-means clustering and neural networks, then the Machine Learning course on Coursera taught by Andrew Ng has become kind of a standard that a lot of people go through. It's very high quality. Unfortunately again it's in MATLAB/Octave instead of Python. There are couple Python textbooks in data structures and algorithms. None of them have become let's say classics. So, I can't individually recommend any of them. I'm also not immediately familiar with all of them, but I do think most of these resources, for any who's been doing a year or two of Python, you'd be quickly able to read most of the ones that we mentioned.

01:05:47 Michael Kennedy: Right, okay. Well, that sounds great. Well, Dave, I think we're getting pretty low on our time. So, we've used it all up talking about all these great things. So, I guess we'll have to leave it here. But let me ask you the two questions before we call it a show. If you're going to write some Python code, what editor do you use?

01:06:03 David Kopec: I'm all about PyCharm now. Big fan.

01:06:05 Michael Kennedy: Right on. Me too. And then notable PyPI package even though ironically you didn't really use many of them in your, in your examples. What's maybe one folks haven't heard of that they should know about?

01:06:15 David Kopec: People are familiar with it, I'm sure, scikit-learn. I think scikit-learn just has so much functionality built in. And let me just add, the purpose of reading a book like this is not that you're going to go implement all these techniques yourself, it's so that you're going to know which open source library to include in your project. We don't expect you to go implement scikit-learn yourself. But when you use scikit-learn, which of the algorithms from it should you be using.

01:06:39 Michael Kennedy: Yeah, makes a lot of sense. Yeah, you don't want to go and reinvent 10 years of lots of work, right? It's like reinventing NumPy. Wouldn't do it but understanding what's happening is really good.

01:06:49 David Kopec: Exactly.

01:06:50 Michael Kennedy: Yeah, cool. All right. Well, final call to action, people, want to get more into learning these algorithms and data structures and these classic problems, what do they do?

01:06:57 David Kopec: Well, you can get the book from Manning. Manning is a publisher, manning.com. And I think you're going to be offering a promo code for your listeners. It's called Classic Computer Science Problems in Python. It's also going to be available on Amazon starting at the beginning of April.

01:07:09 Michael Kennedy: Okay, sounds great. Well, Dave, thank you for being on the show. It was great to talk to you about all these things. There is a bunch of cool stuff you put together here.

01:07:16 David Kopec: Thanks a lot for having me, Michael. It's a real pleasure.

01:07:18 Michael Kennedy: You bet. Bye. This has been another episode of Talk Python to Me. Our guest in this episode was David Kopec and has been brought to you by Microsoft. If you're a Python developer, Microsoft has you covered. From VS Code and their modern editor plugins to Azure Pipelines for continuous integration and serverless Python functions on Azure, check them out at talkpython.fm/microsoft. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new Async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our everything bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite pycatcher and search for Python. We should be right at the top. You could also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.

Back to show page
Talk Python's Mastodon Michael Kennedy's Mastodon