Monitor performance issues & errors in your code

#82: Grokking Algorithms in Python Transcript

Recorded on Monday, Oct 24, 2016.

00:00 Have them underpin almost everything we do in programming and problem solving in general. Yet many of us have partial or incomplete knowledge of the most important and common ones. And in this episode, you'll meet audit Bhargava, the author of the light and playful grokking algorithms, An Illustrated Guide Book. If you struggle to understand and learn the key algorithms, this episode is for you. This is talk Python to me, Episode 82, recorded October 24 2016.

00:29 developer,

00:31 developer, in many senses of the word because I make these applications use these verbs to make this music constructed. Just like when I'm coding another software design, in both cases, it's about design patterns, anyone can get the job done. It's the execution that matters. I have many interests in conflict.

00:53 Welcome to talk Python to me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy, follow me on Twitter, where I'm at m Kennedy. Keep up with the show and listen to past episodes at talk python.fm and follow the show on Twitter via at talk Python. This episode is brought to you by Capital One and Intel. Thank them both for sponsoring the show by checking out what they're offering during their segments. But Welcome to talk Python.

01:22 Thank you.

01:22 I'm really enjoying your book that I started reading a little while ago grokking algorithms with Python and I think we're gonna have a great time talking about it. But before we do, tell us your story. How'd you get into programming in Python?

01:34 Well, I think I have a slightly atypical road, because I got into programming with this thing called kingmaker. I don't know if anyone remembers this. It was, you know, back in the day, the blue, the blue, he had put it out like the World Wrestling Entertainment, guys. Yeah, and it was just a simple way to like, make little games or you know, make a website. And I started playing around with that. And then I thought, you know, I would love to make my own games. And I started playing with flash. So you know how old the story is?

02:12 Yep, flash was probably new and exciting at the time.

02:15 It was and like, it wasn't even owned by Adobe, it was on Macromedia. And Macromedia director was the big thing. Anyway, so I like started, you know, started with ActionScript. And I sold a game. And that's not I thought, you know,

02:31 I could do programming. This is really fun. Yeah, that's cool. So that's a ways away from Python. How do you get over to the Python world? And

02:39 eventually, I was like, because I had learned c, C++, kind of the standard route that a lot of people take. And I just wanted to get back to those flash days when you know, I mean, to me, I, I would rather work with a simple language that can execute my ideas. And that's when I found Python, and it was so easy to use. You know, it's like, you can think about something and type it out as you're thinking about it.

03:08 Yeah, that's cool. It's like programming is fun again, right?

03:12 Yeah,

03:13 exactly. reminds me that XKCD Python. All right. Well, that's, that's excellent. Did you go to university and do computer science degree?

03:21 I actually did my undergrad and graphic design. Okay. Because, you know, I mean, I've loved trying, it's been a hobby for me for a long time. And then I, I worked as a designer for a while, but I realized that I would enjoy programming more as a job. So then I went to Chicago and got a master's in computer science.

03:45 Okay, cool. Yeah, I guess that makes a lot of sense. Like the illustrated part of your book, which we're going to get to, which is, is very cool. You know, there's a lot of things you can learn. You can just self teach yourself about programming. You know, there's a lot of good boot camps. There's a lot of good online classes, things like that. But I do feel one of the shortcomings of that type of education, really is around the proper use of data structures and algorithms, and not necessarily being able to write them but knowing sort of the trade offs, right?

04:17 Yeah, exactly. You decided

04:18 to solve that problem. And you took your graphic arts skills and applied them, not not just your programming skills, but your graphic art skills and apply them to this problem with this book you wrote called grokking algorithms and Illustrated Guide for programmers and other curious people.

04:35 Yeah.

04:36 So I read your book, a good portion of it. I find it delightful. I think it is really, really nice. You know, there's a lot of these concepts that if I remember reading algorithm type books, and it's just, you know, you're about to fall asleep. It's just so dense and so hard to make a real right. It's like, I don't know, it's just tough to to grok them, I guess, right. And, and you're Book is filled with playful examples in fun pictures and things like that. Right? What was inspiration for that?

05:07 This whole thing started because I wrote a blog post on Haskell. And it's about, it's a blog post about monads, which is a tough concept for a lot of people to understand the postcard really popular. And so you know, to this day, I see it being tweeted, that's when, you know, I was like, you know, using pictures can really make a big difference when you're trying to explain hard concepts. So that's where I started, the book was like, I would love to do, because I've always known I would love to do a book on algorithms, like you said, there's all the ones that I've read so far are so hard to read. And I had a very good algorithms professor at Chicago. And she made things so easy. So you know, that's my thought it would be great to have a book that makes it as easy as she made it for me. And it just, you know, just seemed like a good set.

06:04 you started your blog@ati.io. And I'll be sure to link to that. Right and started doing these visual explanation. posts. Yeah, dude, like, so this book is published by Manning, is that right? Yes. Yeah. So did Manning reach out to you and go, Hey, this illustrated idea is really cool. Or did you? Did you approach him and say, I'd like to do something more, more towards myself.

06:26 They, when they reached out to me, it was just the craziest. After that, I kind of joked, like, this is how I get success. I just sit back and wait for people to come to me.

06:38 There's a lesson there, though, I think it's really cool to hear, I think there's a really important lesson there that you don't know what opportunities are out there, or how they're going to come to you. But if you don't ever put yourself out there and you don't try to do things in public, they're absolutely not going to come right. Like there's so many opportunities have come to me from because of the podcast, but I never even imagined that they would, I wasn't looking for them. And I didn't do the podcast for them or anything like that. But just like you said, being out there makes a big difference.

07:08 I agree. And I put in so much work into those blog posts, like it used to take me weeks to write a single post. And it's kind of like just throwing a note in the bottle, right? Like, you don't know if anyone's even reading it or even cares. But somehow, you know, I mean, I enjoyed it. So that's why I kept doing it. And things just work out.

07:31 So your book on algorithms is written in Python. And you could choose in many languages, Java, JavaScript, and so on. Why Python? It's just

07:39 the easiest language. For me. It's the easiest to learn. I used to teach a course on Python. And it was just people picked it up so quickly.

07:48 Yeah, I totally agree with that. You know, I was thinking earlier, as you're talking about, you want to get back to the ActionScript days and things are easy. A lot of people know JavaScript, but I feel like JavaScript has, by way of its popularity become hard again, you know, like all the Node JS dependency, all that all that layers piled on top of it. And somehow Python has managed to avoid that fate, which keeps it simple, but powerful. I think it's great.

08:15 I'm constantly confused by JavaScript. I think it's a I love JavaScript, because it has a lot of those functional parts that I really like. But yeah, the JavaScript community is moving so fast. It's hard to keep up. Yeah, for sure.

08:30 So one thing that you said, when you were talking, in the previous of your book, you said that for everything that you describe you like to lead with examples?

08:40 Yes. Well, so this is part of my, I mean, I have this, you know, some ideas about like, what it means to teach something well, and I think one big piece of that is you have to make things concrete. I'm reading this book on probability right now. And one thing that really frustrates me about this book as they'll start with a simple problem, so they will say, you know, for example, one of the problems in the book is this classic random walk idea. So you have a man standing at the edge of a cliff, and there's a one third chance that he will step towards the cliff, and there's a two thirds chance that he will step away from the cliff, and he's drunk, so he doesn't know which direction is going to step in. So of course, if he takes a step towards the cliff, he's just going to fall off, but there's a two thirds chance he will step away. And, you know, again, there's the one through 10 steps towards to Thursday will step away. So the question is, does he escape the cliff, or does he eventually fall off? So a pretty good probability problem, you know, because it's simple to state and pretty hard to solve. But then I read their answer in the back and they're like, okay, let's generalize this problem. And let's say, x is the probability of hands Stopping tours and buyers, the probability of stepping away. I was like, No, you have concrete numbers isn't so much better to be able to visualize the problem as you're stepping through it. And the only way to do that is by having a concrete example.

10:13 Yeah, yeah, I agree. I feel like a lot of things, the concreteness of it. And sort of the reason the whole whole thing came to be interesting in the first place, gets the kind of get sterilized, and all that stuff gets lost, and you're down to just the essence of the abstract problem. But then the joy of the problem is kind of gone in some ways, right? So I really like that you have examples. And these are like really playful examples. I think it's important to start with making things concrete, and then you can get into the theory and the abstract and so on.

10:45 Yeah, I think I also try to choose pretty real world examples. So people can, you know, look at the problem and think about all the different ways they can use it in their, in their work.

10:57 Yeah, I really like the one about Maggie and the checkout counter, we'll get to that one. Thank you. So I have a pretty diverse listenership and people, a lot of people have, you know, PhDs in computer science, but a lot of people come from different areas. So true quickly, what tell us what's an algorithm for those of us who

11:16 don't know, an algorithm is a set of instructions for accomplishing a task. So really, any piece of code could be an algorithm, right? But there are certain pieces that are more interesting, maybe because to solve a hard problem really well, or maybe they can be used at a lot of different places. So that's really what we talk about when we say algorithms, you know,

11:40 yeah, okay. I totally agree. I think it's reusability is definitely an interesting part. Like, the way you might log something to disk might be an algorithm, but it's not reusable in any way, because it's exactly specific to a thing. So you wouldn't call it an algorithm per se, right?

11:53 Yes.

11:54 Cool. All right. So in your book, you broke it down to like you said, there's going to be two major things that you've learned. One, you said, you're gonna learn about performance and the other, you're gonna learn about solving problems. How does that work?

12:07 This solving problems one, is the really big one that I focus on back when I was starting out, I just used to run into walls, sometimes where you like, come across a problem, and you just don't know how you're gonna solve it. Because you've never solved a problem like that before. You don't really have a lot of tools in your tool belt. So you can't say like, oh, maybe I'll use a hash table for this and see if that works. Or maybe I should try to use some kind of machine learning approach. If you, you know, there's this idea of, you don't know what you don't know. And if you don't know that those tools exist, you're just stuck, you can solve those problems. So this book is really about giving you all those tools like that should be able to solve most of the problems you encounter in day to day.

12:58 Yeah, that's, that's really cool. And some of the examples you give or like, if you were making a video game, you could learn how to create an AI system that can find its way through pass or a recommendation system, kind of like Amazon says you might like or even things that knowing that there are problems that are basically not solvable in a quick way.

13:19 Yeah, those NP complete problems. Yeah,

13:20 yeah. That's a whole whole different interesting. I think performance is also interesting. I mean, you talked about hash tables, and just a minute ago. And, and dictionaries like, knowing that, if I have this type of thing I need to do it's 100 times faster, using this data structure than that data structure and being able to express that it helps you design how you solve the problem, if you like, you know, if I can coerce it into the shape, it will be rocket fast.

13:50 Absolutely. That's something to me, that is really about the craft of engineering. One reason I love Python, as you can write beautiful code, but and I think thinking about performance is just another aspect of that craft where you like, you know, you want to write something that runs well, that's dependable. And that means it should run as fast as it can run. Yeah.

14:18 Yeah, up to the point where it become like you could you can optimize something so much that it becomes right online, like you wrote it, and it works, but you can't understand it, and nobody else can certainly understand it, right? So there has to be this trade off of maintainability. But yes, knowing the data structures definitely is the right in algorithms and how that works in terms of performance is is key. So the algorithms, your algorithm book is not a comprehensive, it's not like an encyclopedia of all known algorithms or anything like that, right. It's, it's more a special list. He said, the topics that you cover are all things you've used at work, which is pretty Cool,

15:00 exactly. I think if you're looking for an encyclopedia, like you could just read the Wikipedia articles, or like, one of those algorithms books that are like 1000 pages long, you know, I didn't want to, I didn't want my book to be like, we printed out Wikipedia and bounded mental book format. Yeah,

15:18 thank you for doing that. So we're going to talk in depth about some of the algorithms and stuff, but can you just give us like a quick flyover of what's covered and things like that?

15:30 Sure. So the first part of the book is like foundational work that I'll be using in the rest of the book. So chapter one introduces you to a very basic algorithm, and then talks about big O notation, which is something a lot of people get very confused by, but it's an important concept. And then chapter is referred to as about memory. Chapter Three is about recursion. So you can see I'm kind of building the foundation, like the very basics. And then we get into the interesting stuff. So like chapter four, this is where I introduce one of those tools that I was talking about, called divide and conquer. There, you know, you get a problem. I don't know how to solve this. Well, you could think like, maybe I'll use divide and conquer and see if that works. Chapter Five as hash tables, which is the most useful tool in my tool belt. I totally agree with you. Yeah, you

16:27 had a good quote is when I want to solve a problem, there are two planes of attack, I start with, like, can I make this hash table problem? How can I solve it with a graph? Maybe? One of those two, right, exactly.

16:37 I feel like I mean, this is kind of a secret tip. But when I get into an interview, and someone gives me a problem, I immediately this thing, hash tables or graphs, and Ed works like 90% of the time, you know, thanks. So yeah, obviously, I think graphs are really important. So chapter six, and seven are graphs. Another very useful tool, chapter eight, as another good tool, and it's so easy. Like, this might be the easiest chapter in the book. Because chapter eight is really like, to the dumbest thing you can think of, and it'll probably work. And that's what a greedy algorithm as, yeah, sometimes, you know, computers are fast. Sometimes the problems aren't that big, just suffered, simply write Exactly. Chapter nine as probably the one I'm most proud of, and the one that people have complimented me the most on, because it's dynamic programming. And it is this really hard way to solve things.

17:42 That's not the same as programming with a dynamic language versus static, which is something different, right?

17:48 Yeah, it's all about creating this 2d array and splitting a problem up into subproblems. And then using those solutions for those subproblems. For the bigger problem, it kind of, I feel like it just blows people's minds. It's just so hard to use. I mean, I spent maybe three or four months on this chapter alone. And I think it's really well written. And a lot of people have found that very useful. So that's chapter nine. And then chapter 10. I just wanted to put a little thing about machine learning in here, because k nearest neighbors is so easy to use. And it's so effective. And it was just, you know, you could read this chapter before bed, and you would know how to build a recommendation engine. Oh, that's great.

18:35 That's really cool. Yes. And then you finish out with saying, basically, now that you're inspired, you know, the foundational algorithms. Here's a whole bunch more things you can go do, right?

18:45 Yeah. And then there's 10 things in that chapter 11, that I think are really neat. And they're just, Oh, I wish I could have put all 10 of these in the book also, but I just gave a little tip for each one. Sure.

19:02 That'll be the follow on the second. Second version.

19:06 Yeah, yeah. So

19:06 one thing that I liked about your book is you have little short exercises, like a couple minutes, two or three minutes at the end of each section, each chapter that you can do to sort of test your thinking, and I found those to be really nice,

19:20 thank you. I've been reading this book called a book of abstract algebra. And it's got a great model where all the chapters are like two pages, followed by six pages of exercises. I found that those exercises helped me so much. So I figured, you know, I need to put more exercises in my book.

19:38 So who did you have in mind, like if I have no programming, or very little programming experience with this book be useful if I had a computer science degree, would it still be useful?

19:47 So it's been really popular but boot camp students so far, I have a few friends who went to boot camps. And you know, I have a friend who's in a boot camp right now and he kind of told others Students in the camp about it. And they all seem to like it. I think it's also useful for people who have a computer science degree. So I work at Etsy, and I kind of, you know, just sent an email out, when the book came out saying like, hey, I've published a book, you know, isn't that cool? And I've had people email me, like, really senior people, you know, people who are like, two levels above me at Etsy. And they're like, wow, I finally I have great examples to explain these concepts. Because it's so hard to come up with a good example to explain something. Yeah, absolutely.

20:41 It's one thing to know it, it's another to teach it to another person. And

20:45 yeah, I

20:46 think it is really, I think it's really accessible in that way.

20:49 Thank you. It also doesn't require a lot of maths, which I like, you know, you don't need calculus or linear algebra or anything. It's really like, if you know, the basics of algebra, like if you know what a function is, you can read this book.

21:05 Yeah, I like that. You said that, you know, I was talking with a friend of mine, about sort of, what does it take to become a programmer, like how much math you need to know. And I think there's a outside of the programming field that people who are not programmers, it feel like, programming is very, very mathematical. Right? Like, if you don't know how to do calculus and differential equations, you'll never be a good programmer. And to be honest, I find I do very little actual advanced math in anything that I do, except for when I was working for like a scientific visualization company. But outside of that context, like it's more just knowing, like critical thinking and problem solving. And so I think that they give people feel like they need to learn a lot of math. I mean, math is not a bad thing, but I don't think it's required, do you?

21:51 I don't think it's required either. I think, especially for something like this, where like, algorithms are the first step to like, long journey in computer science. And that first step should be as easy as possible. There shouldn't be like, big hurdles to get to that first step.

22:10 Yeah. Totally agree. Totally agree. Okay, cool. So let's, let's talk about some of the algorithms and some of the things that you covered in the book, the very first one that you covered was sort of different types of searching, and sets and really leading towards binary search.

22:28 Yeah, just says, This is my favorite algorithms example, because anyone can understand it. Like, you don't even need to know anything about computers. But because a lot of people, you know, when they heard, I'm writing a book, they were they tell me what, what is an algorithm? So I give them two examples where I was like, you know, think about I'm thinking of the number between one and 100, take a guess. And they'd say, you know, 55, I'd say, well, that's too high, take another guess. And they'd say, 30. And that's interesting, because people automatically divide the space in half, when they guess, almost on instinct, like, maybe they don't even understand what they're doing. But they're running binary search in your mind. You know, that's all binary searches, divide the space in half every time to get to the results, or to find the results

23:26 as quickly as you can. Yeah, it's such a simple example. And yet, I think it really dramatically points out the power of taking that formalizing it just a little bit, and making it really fast.

23:52 Capital One has a special message for you. They need Python pros who love to work with data, put your Python experience at work a capital one and help them use data to make life better for millions of customers. And Capital One is employing the latest tools and approaches to do data analytics and data science from the ground up. They're smart, creative professionals love to explore new ways to interact with data. They're interested in figuring out novel, advanced Python techniques. And even more interested in finding more people who will help them do that. When you join their state of the art Python community. You'll work with people you really like people who might be listening this podcast right now. relentless innovation is their way of life. Make it yours at Capital One, visit jobs, capital one.com, slash talk Python to learn more and apply today. You had a cool example. He said, you know, imagine that you play this. You have a couple of ways that you could do it. Like you could just say like I've got a list of 100 things and you want to search for something in there, right? Kind of like the example you just gave. One option is to look at the first one. Check it Next one. Just Just go through it in order, right? Another way is apply this binary search thing, assuming that set is somehow ordered, like, like an address book or something. And you said, Well, for 100 items, if you go straight through it, you know, worst case you have to go, you'll certainly get to the end as 100. But if use the binary search, because you have it in half and half it worst case scenario, you get seven. But then you said, Well, let's think about it 4 billion.

25:23 Yeah. And it's so crazy, because now the difference, you know, now it's really obvious like for searching through 4 billion items is too much. But with binary search, it's just 30 to a

25:37 maximum of 32 guesses, right? Even if it's I'm guessing between a number between zero and 4 billion.

25:42 Yeah. Isn't that such a huge difference? Yeah,

25:46 it's, it's insane. I mean, it makes sense. But at the same time, you don't think such a simple idea is going to reduce it from 4 billion comparisons to 32. At worst case? Yeah. It's really amazing. Yeah, yeah. So one of the things I think, is hard for people who have not gone through a formal computer science background, and just for everybody listening, I don't have a computer science degree, but I have a minor in computer science. So I've gone through some of it. So I guess I'm somewhere in the middle of this. But if you were self taught, if you've gone to a boot camp, or if you're just really new. People often talk about big O notation around performance of algorithms. And I think that that's kind of mysterious to people. And it also seems to be something that ends up on job interviews often. And so if you don't have that experience, you're like, I don't even know what digo is, like, Well, sorry, you're out or something right? Like it, it can be a bigger problem than it maybe really deserves to be, but I think it's worth knowing big O notation for a couple of reasons. Yeah,

26:48 dude,

26:49 this was a really good way for you to introduce that concept. I thought, like, pretty crazy, right?

26:53 Yeah, exactly. And I won't, you know, get into the full explanation here. But I do want to say, I think every beginner engineer I meet has trouble with big O notation. So you know, if some of your listeners are still needs engineering, just to say like, that's not just you don't feel that. That's right. Yeah. But that's why I, you know, it's right up front in the book, and then I talk about it again, in chapter four. So I spend a lot of time trying to explain big O notation in this book.

27:29 Yeah, absolutely. So I think it's pretty interesting. You've got the the linear search, which is what they call a O of n. So as if you have n items, you have to do n comparisons, if you have two n items you have to do to in comparisons, it grows basically linearly. But this binary search, one is a log of n, which doesn't sound like that big of a difference to realize it's 4 billion versus 32. Which is pretty amazing. So knowing this, this relative scale, and that doesn't actually tell you how fast it is, does it that just tells you like relatively, how much slower does it get as you get more data?

28:08 Exactly? Because everyone's computer, you know, calculates at a different speed. So you can put a time on it.

28:15 Yeah. And that becomes interesting later, when you find algorithms that are actually look worse in bigo. But often, they're not. And so you talked about this idea of average time versus worst case time. Think that's also important to understand.

28:32 Yeah, that's another really interesting one there. You know, if you can say that your algorithm is going to take a short time on average, and a really long time. Worst case, maybe that's fine. You know, if you're just a website, or you know, you have a basic consumer app, and you're like, well, it'll run fast most of the time, so that's fine. If you're a NASA, and you have to guarantee a certain time, then you really care about that worst case time also,

29:05 yeah, any any real time system. So if you were doing like a flight control system on a spaceship, or if you're doing trading in some sort of a high speed trading system, or I worked on a system that would actually analyze eye tracking data in real time, and it would get 250 samples per second. And if it couldn't process, using some very advanced sort of wavelet decomposition algorithms or whatnot. If you couldn't process that in, you know, four milliseconds, well, then it just couldn't keep up because that was how fast data was coming. Right? I mean, there's there's these situations where, worst case time maybe is super important. But a lot of times, like you said, average time, I think is fine. So give us some like for some algorithms that we might know, give us some bigo performance stats.

29:54 Sure. So binary search, we already talked about log n Searching, you know, linear search, looking at one item at a time is big O of n. And like we just talked about, that's a big difference, right? log n versus n. So again, if you think about that, a slow sorting algorithm is n squared. And my example is selection sort of fast sorting algorithm is going to be n log n. So again, you have that log n versus n difference. So the fast sorting algorithm is much faster. Yeah. And after that, there is like, n cubed algorithms. My extreme example, as you can get big O of n factorial algorithms. And, you know, if people don't know what a factorial is, that's like, five factorial would be one times two times three times four times five, six factorial would be 123456 factorial grows really fast. If you haven't n factorial algorithm, that just means you just can't use it most of the time. It only works on extremely small data sets, right? Yeah. Yeah. So the example probably the canonical example for that is the traveling salesperson problem. Yep. And this is one of those NP complete problems we've talked about, where, you know, the problem is really simple. You're a traveling salesman, and you have a list of cities that you want to travel to. And you want to figure out the shortest route that heads out of those cities. So it just seems so simple, like, you know, why couldn't she calculate that? And of course, you can, it means that you have to come up with every permutation of cities of the order of cities, which if you have six cities, it's six factorial permutations. If you have 100 cities, it's 100 factorial permutations. And just to give you an example of how crazy that is, six factorial is 720, which, you know, your computer can do 100 factorial as nine by 159, followed by 157. zeros.

32:14 Yeah, that means you're going to run out of time, are you going to run out of memory? Plenty of those first, but you're probably going to run out of something, right?

32:21 Yeah, I think the universal and free can make. Absolutely,

32:28 I so the next thing, though, you talked about that I thought was cool was selection sort. And you took a moment to say like, let's think about the two data structures that hold stuff. And just like a list style, and that was linked lists and arrays. And I thought that was a really interesting trade off, knowing how you're going to use them, and so on. So tell us about that. That comparison you made?

32:54 Sure. So the example I use in the book is this idea of you know, you're going to watch a movie. And let's say you you're going there, there's eight, a few, eight people are going and you're trying to find seats. So maybe you can find eight seats all together. And then you can all sit together, maybe there is no set of eight seats together. So you have to kind of set all over the Tater. And you know, bear the neck, you know, you know very each other. You know where your group is, but they're not all in one place. And that's this idea of linkless versus arrays where arrays, you're all sitting together linkless You're all sitting separately, you know your data is together in memory or a part two with arrays, you basically have a contiguous block.

33:43 And exactly in width linked lists, you each element knows how to find the next element. Sometimes you have doubly linked lists. So you can start at the back and forward or forwards and backwards. But each element is more or less in charge of going to a new memory location to find the next or being the end.

34:00 Exactly. And if I could stretch this movie analogy a little bit, let's say you know, you have a bag of popcorn, and you're kind of passing it down the road. Really easy to do if you're all sitting together, right? Because you can just you know, the next person is just to your right, and you just keep passing them back to your rate. And that's what arrays are. So it's really easy to access the next element in an array. And it's easy to say like, you know, I want to find the fifth item in my array. Because everyone's together, you can just do the maths like zero plus five equals five linklist. It's like, now you're passing those bags of popcorn around, you have to go to the next person in the movie theater. And then they have to go to the next person. It's a little more arduous. And if you want to find the fifth person, you can't just go directly to that person. You have to go to the first one. The first one has to go to the second one. Second one has to go to the third one. So you have to, like follow this link these links down.

35:04 Yeah, that makes perfect sense. So I mostly if I think the data structures I use, I mostly use arrays, so lists, and dictionaries. And then sometimes I want to sync stuff. So sets, I don't find myself using linked lists so often, but they do have some interesting trade offs. Like, when is an array good versus when is a link less good.

35:25 So again, going back to this movie example, sometimes you go to the theater, and you just don't have eight seats together, right? Like, sometimes you just can't fit an array in memory, because you don't have you know, if your array, if the array you want to create is too big, you just don't have space for it are what can be bad also, as let's say, eight, a few sits down at the theater, and you found eight seats, everything's great. And now, another person shows up. So now you have nine people, but there's no space for a ninth person. So now you have to all get up and go around trying to find that ninth seat. So you know, similarly, when you want to add elements to an array, let's say allocated memory for 100 elements. And now you want to increase the size of your array to 200? Well, it's going to be a lot of work to move those all those items to a different part of memory. You know, that's bad performance has been a big plus, if I want to insert one in the middle,

36:34 something like that, right?

36:35 Exactly. Like it's hard to move all those items. But at the link list, you can just put them somewhere and just change the links around.

36:42 Yeah. So you talked about the big O performance of both of those. And basically inserts for lists are oh one, so constant, like super fast. But read like random access is order in which is not so great. But it's almost the reverse for arrays, right? Random Access is just instant, more or less. But adding something grows as you add more items, right? Because you got a copy and reallocate and all that. So they're almost like a counterpart like opposites in some way from a performance trade off.

37:17 Exactly. And that's kind of you know, you hear all that, and you start thinking, gosh, I wish I had something that was as good as arrays for reads. And as good as linkless for inserts. And that's when you start getting into like the more complex data structure,

37:35 right, absolutely. So another thing that you covered that I recall, like this, this is a burn a spot into my brain from what I learned it is recursion. And I just remember recursion, like blowing my mind when I first thought of problem solving with recursion.

37:53 Oh my gosh, yeah, it's this is another one that's so hard for people to start thinking about it. Because it's just, you know, a function calling itself. That's just seems crazy. But that's why I have a lot of examples about recursion. And I have a lot of exercises. And I kind of tried to break it down. So people understand the structure of recursive solution. Even if you never plan to use recursion. In a problem, there are plenty of algorithms that other people have created that use recursion. So it's, if you want to understand those algorithms, you need to know what recursion is. Yeah, absolutely. And there are times when you can't solve the problem without recursion. But the data you're trying to understand is so perfectly lined up for recursion that the solution is just dead simple. If you realize that that's something in your toolbox, right, like, tree like depth first sort of tree type processing and things like that. Exactly. Yeah.

38:55 So one thing that you you had at the beginning was this example with boxes, and you have like a little box story in the attic for loops and recursions.

39:05 Yes, this is so the tie example where you know, you're going to your grandma's attic. And you're looking for this key and she has so many boxes and it could be in long these boxes. So you open the box, and then you see more boxes inside that box. So now you can think about, you know, there's two ways you could find this key you could kind of keep this list of boxes, right. So like, you open a box, you see more boxes, and you just add them to your pile of boxes to check and you the algorithm you're running as you pick up a box from the Pio, look for the key if you see some boxes he added to the Pio and until you find the key you just grab another box from the Pio and check it for the key That's the while loop approach, right? Because while you don't have the key, go to the pile, pick up a box, search for the key. And the recursive approach would be open a box. If there is a key you're done, if there is a box, open the box, if there's a key you're done, if there's a box, open the box, you know, yeah, it's got this beautiful, very simple quality to me where you can express it in two lanes like S key. Done, else. Keep going.

40:35 Doing it box, just open the box. And yeah, I just got this very natural way of solving the problem, doesn't it? That's cool. Yeah.

40:58 We all love Python first, tremendous productivity benefits. But getting the best performance takes some work. What if you could get out of the box easy access to high performance Python and Intel distribution for Python developers delivers just that, get close to 100 times better performance for certain functions. When using NumPy. Sai pi, psychic learn, linked with optimized native libraries like Intel math, kernal, library access efficient multi threading in Python projects like number and scythe on try the Intel distribution for Python and experience performance today at talk Python dot f m, slash Intel in profile your Python and native c C++ applications for performance hotspots with Intel v tune amplifier. with Intel, it's all about performance.

41:49 Like how you got some nice pictures in it, like the pictures are even simpler, you know, just feels it feels really great. The other thing that was interesting was, in your first example, you talked about having this list of boxes and you put the stuff in a list and you take the stuff out of the list. In the recursion example, you don't have anything that is the storage of where you are, or what box you're working on, or anything like that, like, where How do you keep track of the boxes?

42:18 So that's a really interesting part of recursion, where you're kind of making the computer do the work for you. Right? Because you call, let's say you call the look for key function. So you've called that once, and the computer has that, you know, that information noted? Okay, he's called look for key ones. And then he call it again, inside look for key. So the computer says, okay, that's the second call to look for key function. And then you call it again and says, okay, that's the third call. So it's kind of keeping track of those calls for you. And those function calls as your array. Basically, you're keeping track of all the boxes, you have to check through that array of function calls, but the computer is doing it all for you behind the scenes, you don't even have to think about it.

43:14 Yeah, absolutely. It's just, it's the way programs execute. They just your language is taken advantage of that it lines that up for you. That's cool. So this whole recursion thing is more or less, is very good at solving this kind of divide and conquer inductive problem, if you can talk about some kind of base case. And you can talk about, well, how do I take like one step away from there, you can probably apply recursion.

43:39 Yep, I'm reading this book called How to solve it. It's a famous maths book about like solving hard problems. And I love one of the parts of this book, he says, If you come across a problem you can't solve, change it into a problem that he can solve and solve that problem instead. Isn't I mean, that's so easy. And that's, it's the same thing with divide and conquer where it's really hard to solve this problem. But I'm going to just take it down to the smallest component that I can solve, and use the solution for that to solve this bigger problem.

44:14 Yeah, can it gives you a foothold on on climbing the solution or whatever? Yeah. So you have two examples of divide and conquer that you gave in, in this area?

44:24 Yeah. And I'm going to talk about the quicksort one, because it's so it's so elegant. Yeah, it's a great example of divide and conquer. And again, such a simple idea. You have an array of elements that you want to start, but you don't know how to start an array, though. What's an array you can start? How about if the array had zero elements, that's pretty easy to start. It's just there's nothing to sort sorted. It's sort of similarly if you have an array, but one element, pretty easy. If you have an array with two elements, it's so you know, you just check which one's bigger and put it at So all of these are easy examples. Now you get to an array with three elements. And quicksort says, Just pick an element from the array. So it doesn't matter which one. So I'll just pick the first one. So let's say your array is 537. So I pick five as my, it's called the pivot element. And now I look at the rest of the elements in the array. So three, and seven. And I know that three is less than five. And I know that seven is greater than five. So now I have these two sub arrays, right of elements less than the pivot, and elements greater than the pivot. And now I just call quicksort, again, on those two arrays separately, so I call quicksort, I have this array that only has the element three, and we know how to solve, we know how to start an array with one element, and it's just three, and then you have the pivot, because that's the number greater, you know, we know that that's greater than three, and then you call quicksort. On the second array, which has the seven. And again, it's just one element, we know how to start that. So you end up with these three sub arrays, one with just a three, one with just a five, and one with just a seven, you just smush them all together, and you have a sorted array. So using just this knowledge of how to start an array with 01 or two elements, you started an array with three elements. And now that he can do that, you can sort an array with four or five elements. And you can kind of start any array you want, just by solving that small

46:40 problem. Yeah, you just continue to break it down, even if you have a million right and exactly Nice. Yeah, quicksort is lovely in the history of quicksort. It's pretty interesting. So another thing, the next thing in your book that you talked about is, is one of my favorite data structures. I don't necessarily use it the most, but when I do use it, it's so awesome. And that's hash tables or dictionaries, right?

47:02 Yeah. I mean, this is one of the reasons I love JavaScript. As you know, JavaScript objects are just hash tables. So hash tables are such a big part of JavaScript. And it's, I mean, like I said, I feel like almost any problem, I could just, you know, if I just want the quick and dirty solution, I can just do the hash table and call it done.

47:24 Yeah. So you had a really nice example of a checkout person.

47:28 Oh, yeah. The Maggie

47:30 the Maggie. Yes. You need a Maggie, how much is an avocado? It's $1 49. Thank you, Maggie.

47:34 Yeah. Isn't I mean, that's exactly what a hash table is where you can either look up prices in this book, and it kind of take some time. Or you just have a person there who has it all memorized. They just say you know, it's 67 cents. Thank you. Maggie is my wife's name. And I feel like she has so much smarter than I am. So I knew I needed to make her a character in this book. Oh, that's a nice. That's a nice touch.

48:02 Yeah, very cool. Yeah. So if, if people want to get a sense of like how powerful hash tables or dictionaries are, I just last week wrote a search engine. So people could search every single bit of content of all the podcast episodes. So if you go to talk python.fm, there's like a little search thing in the top right. And you can click it, and you can type in complex searches. And it'll find Basically, anything that matches all those keywords. And the way it works is, it goes through all the transcripts, it goes through all of the show notes, it goes through all the titles, all those various things and a few others in it, it turns it into a bunch of keywords and turns that into a dictionary. And then for each keyword, it finds it figures out if there's a piece that matches, you know what pieces match this keyword and puts that in there. And if you go there, you know, it's like 80 hours of conversation plus some other stuff. And you can type in a keyword hit Enter, and it runs in sub millisecond time. 100% in Python, you know, so you, yeah, you can search for like, five, five minute the things that contain these five words, across 80 hours of conversation, point one milliseconds. 100% Beautiful. That is like it think of if that was trying to, you know, regular expression, the text, or it was trying to like, you know, literally search it or whatever, right? Like it would be insane. You just got it's too slow. But yeah, it's so like, things like that are just so possible to dictionaries, and they make me happy. Cool. So let's talk about some of the other algorithms and we're kind of getting short on time. So maybe just sort of skip over them. Just touch on them a bit.

49:41 Sure. Again, chapters five and six are super useful to me, you know, hash hash tables and graphs. And a graph is this really simple idea where you model a problem using nodes and edges. So my example, as a You're trying to get from Twin Peaks to the Golden Gate Bridge. And this is how you can tell that I live in San Francisco. And you're trying to figure out what is the least number of bus transfer that to do to get to from Twin Peaks to the Golden Gate Bridge. And so you can model that, you can model it using a graph, where you have one node, which is the Golden Gate, which has twin peaks. And then he kind of puts out edges, which are all the different bus routes you can take to the next part, to the next transfer stop. And then that one puts out edges of all the buses, he can take from that barn to the next transfer stop, and so on until we had the Golden Gate Bridge. And it said, This is a classic, it's called the shortest path problem. Another example would be, you're on Facebook, and you're trying to find you really want to talk to I don't know, someone famous, Brad Pitt, for example, as we were trying to figure out, what is the shortest number of connections to Brad Pitt? Like,

51:04 you know, what is the least number of people which one of your friends could introduce you? indirectly? Something like this? Right, exactly.

51:10 I mean, LinkedIn does this where they say like you're, you know, so many connections away from this person. And it's just graphs. It's a graph problem. Nice.

51:18 Okay. And then you talk about greedy algorithms and your dynamic programming. Yep. And again, greedy algorithms

51:25 are so simple. It's just do the simplest thing you can do. So you know, my example is, you are a thief. This is the classic knapsack problem. So you're a thief and a department store and you have a knapsack. And you're trying to figure out, what items Can I steal, to get the maximum value to steal the maximum value of items. And different items have different values, but there's only so much space in your knapsack? So the greedy approach says, pick the most expensive items that will fit in your knapsack, and put it in there and then seal the next most expensive item that will fit and keep going until you've filled your knapsack and it doesn't give you a perfect solution. But it gives you a good enough solution where it's good enough for me. Yes,

52:15 this optimization problem. Interesting. Yeah. And then recommendations that with K nearest neighbors,

52:21 Oh, yes. Again, really simple concepts. My example is, let's say you are Netflix user and Netflix is trying to recommend movies to you. And they know that you have I love the matrix, for example. So they know that I've rated the matrix, five stars on Netflix and a bunch of other movies. So they look for other users that have similar ratings on those movies. So they might say like, you rated the matrix five stars. It looks like Keanu Reeves rated the matrix five stars also. And then you rated at an 100 lens element Dalmatians five stars, and Keanu Reeves rated 101 Dalmatians five stars also. So it seems like the two of you have a common tastes and movies. We're just gonna look at what other movies Keanu likes that you haven't seen. And we'll just recommend them to you, because you probably will like them also. Yeah, that's cool.

53:19 That's a pretty simple recommendation engine. But I recall a few years ago that Netflix had like a challenge to the community to build the best recommendation engine, and they had like a million dollar prize or something big like that. Right? Do you remember? Do you? Yes,

53:34 I did. And I think I think they used to use k nearest neighbors. I'm not 100% sure. But I think they use k nearest neighbors before that price came out. So that's how, you know, it worked for them for so long. And I think the current version uses a modified version of carriers. Yeah, I

53:54 don't see how you get around something like this been at least part of the solution. Right.

53:58 Yeah. Awesome.

53:59 I think that's, that's quite a good Introduction to Algorithms. And, you know, if you're out there listening, and you didn't have a formal computer science education, or like me, you kind of paid attention and you forgot, and these ideas were living at the edge of your memory. But you wouldn't mind a reminder. I think this is a really interesting way to learn it with this nice illustrations and simple stories. I really appreciate your book. I think we're probably out of time to have to leave it there. Let me ask you a couple of questions. Before I let you go, though, I always ask my guests. Sure. Yeah, there's now over 90,000 PPI packages out there distinct packages. I'm sure you come across some of the found interesting that maybe not everybody knows of anything come to mind. You want to recommend oh my gosh, I'm sure everyone knows about the one I'm going to recommend but it's called NumPy.

54:47 Oh, yeah. And it's not I'm starting to get more into machine learning. And it's so useful.

54:55 Yeah, it's I think the whole thing's like NumPy and sci fi and the whole Data Science story has really opened up a whole new avenue for Python to grow, right? It's not just a web development technology. It's also for for so much for science, and it's amazing what people are doing with it. editor, if you're going to write some Python code, what editor do you use?

55:17 I have to go over to them. All right, then.

55:20 Right on.

55:20 Okay, so

55:21 any final call to action? How do people find your book, things like that.

55:25 So if they just go to my website, it's audits.io. At I t.io. There's a link to my book. And there's blog posts there. Oh, you said that people can get the pictures and use them? Like for their classes, if they're a teacher or something? Yes, that's something not a lot of people know. But all the images from the book are available for free, online and high res. So if you're a teacher, and he want, you know, more images for related to algorithms, there's like 400 images from this book, and they're all on my GitHub. So it's github.com slash Egon chalet. And maybe you can add a link, so I don't have to spell that out. Yeah, I'll

56:11 definitely link to that. No problem. That'll be in the show notes. All right. Well, it's it's been great to talk to you and I definitely recommend your book to people. I think it's it's very approachable. So if this kind of thing is interesting to you, check it out.

56:23 Thank you so much.

56:24 You're welcome. Thanks for being on the show. Talk to you later.

56:26 Take care. This

56:28 has been another episode of talk Python to me. Today's guest has been audit Bhargava. And this episode has been sponsored by Capital One and Intel. Thank you both for supporting the show. Are you a data scientist or Python developer who loves data. If you're looking for a place to work on data science with truly big data that can affect millions of lives that head on over to jobs, capital one.com slash talk Python, and check out the wide range of jobs that Capital One is trying to fill right now. The Intel distribution for Python delivers the high performance Intel c libraries built right into Python key close to 100 times better performance for certain functions when using NumPy, sci fi and psychic learn. Check them out at talk Python, FM slash Intel. Or you are a colleague trying to learn Python. Have you tried books and videos that just left you bored by covering topics point by point, well check out my online course Python jumpstart by building 10 apps at talk python.fm slash course to experience a more engaging way to learn Python. And if you're looking for something a little more advanced, try my write pythonic code course at talk python.fm slash pythonic. And you can find the links from this episode at talk Python FM slash episodes slash show slash 82. Be sure to subscribe to the show open your favorite podcatcher and search for Python we should be right at the top. You can also find the iTunes feed at slash iTunes, Google Play feed at slash play in direct RSS feed at slash RSS on talk python.fm. Our theme music is developers developers by Cory Smith Goes by some mix. Corey just recently started selling his tracks on iTunes. So I recommend you check it out at talk python.fm slash music. You can browse his tracks he has for sale on iTunes and listen to the full length version of the theme song. This is your host Michael Kennedy. Thanks so much for listening. I really appreciate it. Let's mix. Let's get out of here. Dealing with

58:23 my boys.

58:26 Having been sleeping. I've been using lots of rest.

58:38 Developers, developers developers

Back to show page
Talk Python's Mastodon Michael Kennedy's Mastodon