
#11: PyImageSearch and Computer Vision Transcript

Recorded on Wednesday, May 20, 2015.

00:00 Does a computer see in color or black and white?

00:02 It's time to find out on episode 11 of Talk Python to Me with our guest, Adrian Rosebrock,

00:08 recorded Wednesday, May 20th, 2015.

00:12 Welcome to Talk Python to Me.

00:42 A weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:47 This is your host, Michael Kennedy.

00:48 Follow me on Twitter where I'm @mkennedy and keep up with the show and listen to past episodes

00:54 at talkpythontome.com.

00:55 This episode, we'll be talking with Adrian Rosebrock about computer vision, OpenCV, and PyImageSearch.

01:03 Hello, everyone.

01:05 I have a bunch of cool news and announcements for you this week.

01:08 First, this show on PyImageSearch is a listener-suggested show.

01:13 Thank you to J.I. Lorenzetti for reaching out to me and suggesting this topic.

01:18 You can find his contact details in the show notes.

01:21 As always, I'm also excited to be able to tell you that this episode is brought to you by CodeShip.

01:27 CodeShip is a platform for continuous integration and continuous delivery as a service.

01:33 Please take a moment and check them out at codeship.com or follow them on Twitter where they're @codeship.

01:38 Did you know that most of our shows come with full transcripts and a cool little search filter feature?

01:43 If you're looking for something you hear in an episode, just click the full transcript button on the episode page

01:49 and search for it.

01:49 Also, I want to say thank you to everyone who has been participating in the conversation on Twitter

01:55 where we're @TalkPython.

01:57 It's a great feeling to see all the feedback and thoughts every week when we release a new show.

02:01 But if you have something more nuanced to say that doesn't fit in 140 characters

02:06 or you want it to be more permanent than Twitter, every episode page has a Disqus comment section at the bottom.

02:13 I encourage you to post your thoughts there.

02:15 This week, I ran across a really awesome GitHub project called Python-Patterns.

02:20 You can find it at github.com/faif/python-patterns.

02:26 It's a collection of really crisp design patterns implemented in a Pythonic manner.

02:31 For example, you'll find patterns such as the adapter, builder, chain, decorator, facade,

02:37 and flyweight patterns, just to name a few.

02:39 It's really extensive and pretty cool.

02:41 I think you'll learn something if you check it out.

02:42 Finally, I put together a cool YouTube playlist.

02:45 This is a series of nine lectures from Dr. Philip Guo, a professor at the University of Rochester, New York.

02:53 Find him on Twitter where he's @pgbovine.

02:58 The video series is entitled CPython Internals, a 10-hour code walk through the Python interpreter source code.

03:05 You can find it at bit.ly/cpythonwalk, all lowercase, no spaces.

03:12 Also, you'll find all these links in the show notes.

03:16 Now, let's get to the interview with Adrian.

03:18 Let me introduce Adrian.

03:23 Adrian Rosebrock is an author and blogger at pyimagesearch.com.

03:28 He has a PhD in computer science with a focus on computer vision and machine learning

03:32 and has been studying computer vision his entire adult life.

03:36 He has consulted for the National Cancer Institute to develop methods to predict breast cancer risks

03:41 using breast histology images and authored a book, Practical Python and OpenCV,

03:48 on utilizing Python and OpenCV to build real-world computer vision applications.

03:52 Adrian, welcome to the show.

03:56 Oh, thank you.

03:57 It's great to be here.

03:58 I'm very excited about computer vision and sort of merging the real world with computer science,

04:04 with robotics.

04:05 And I think there's just some really neat stuff going on.

04:07 And you're doing a very cool part in that.

04:10 Oh, thank you.

04:11 So we're going to talk about PyImageSearch.

04:13 We're going to talk about OpenCV and some of the challenges and even the future of these types of technologies.

04:18 But before we get there, you know, everyone's interested in how people got started in programming and Python.

04:22 What's your story?

04:23 I started programming when I was in high school.

04:27 I started out with the basics of HTML, JavaScript, CSS, did some basic programming.

04:33 And, you know, I'm probably getting a lot of hate mail about this.

04:36 But when I first started learning how to program, I did not like the Python programming language that much.

04:41 And this was around the early version 2 of Python.

04:45 I didn't like the syntax.

04:47 I didn't like the white space.

04:49 And for a long time, I was really, really put off by Python.

04:52 And that was a huge mistake on my part.

04:55 I don't know what was wrong with me back then.

04:57 I guess it was just high school ignorance or something.

04:59 But by the time I got to college, I started working in Python a lot more.

05:04 And that's especially true in the scientific area.

05:07 You see all these incredible packages in Python like NumPy and SciPy that just integrate with computer vision and machine learning.

05:15 And all other types of libraries.

05:17 And more and more people were transitioning over from languages like MATLAB to languages like Python.

05:25 And that's so cool.

05:26 And it really wasn't until college that I got into Python.

05:30 And I remember this one girl.

05:32 She was in my machine learning class.

05:34 And she had a sticker on the back of her laptop that said, Python will save the world.

05:38 I don't know how, but it will.

05:40 And that resonated with me.

05:42 I'm like, that sticker's true.

05:44 That is absolutely true.

05:46 It's such a great language.

05:48 So unfortunately, I did not have the best first experience with Python.

05:52 It took me four or five years later to actually come around.

05:56 But now that I'm here, I love it.

05:58 And I can't imagine programming in any other language.

06:01 It's almost a freeing feeling, a relaxing zen when you're coding in Python.

06:07 Yeah, that's a funny story.

06:09 It really is a wonderful language.

06:11 I also took a while to get there.

06:13 But looking back, I would have enjoyed being there sooner.

06:16 I went from MATLAB to C++ on Silicon Graphics machines.

06:21 So I had a bit of a torturous introduction.

06:23 But it was all good.

06:24 CodeShip is a hosted, continuous delivery service focused on speed, security, and customizability.

06:41 You can set up continuous integration in a matter of seconds and automatically deploy when your tests have passed.

06:46 CodeShip supports your GitHub and BitBucket projects.

06:50 You can get started with CodeShip's free plan today.

06:52 Should you decide to go with a premium plan, Talk Python listeners can save 20% off any plan for the next three months by using the code TALKPYTHON.

07:01 All caps, no spaces.

07:03 Check them out at CodeShip.com.

07:05 And tell them thanks for sponsoring the show on Twitter where they're @codeship.

07:10 So you're focused on computer vision and image processing.

07:24 Where did that story begin?

07:25 That story also started in high school.

07:29 Originally, I had this idea that I wanted to go work for Adobe.

07:33 And I wanted to work on developing Photoshop and Illustrator.

07:37 I love the idea of being able to write code that could analyze an image.

07:42 And for whatever reason, that just like really captured my imagination.

07:47 I could see these algorithms running in Photoshop.

07:49 And I was like, you know, what's really going on behind the scenes?

07:52 Like, how are they manipulating these images?

07:54 What does this code look like?

07:56 So for the longest time, I really wanted to develop these graphic editing applications.

08:02 But I didn't have the math experience.

08:07 This may surprise some people, given that I have a PhD in computer science.

08:11 But up until late high school, I did not do well in mathematics courses.

08:17 I got C's in algebra and geometry.

08:22 And it really wasn't until I kind of really put my back against the wall.

08:27 And I said, you know what?

08:28 I got to learn calculus and statistics.

08:30 So I did a self-study in AP Calc.

08:33 And I took AP statistics.

08:34 And I did well with those.

08:36 I'm like, man, math is fun now.

08:38 Like, I understand this.

08:40 So I got to college.

08:41 And I only took one computer vision course, because the school I went to didn't really

08:47 have a computer vision focus.

08:49 They had a wonderful machine learning focus, but not really a computer vision focus.

08:53 And what I found out was that, you know, you don't need a mathematical background to get

09:00 started in computer vision.

09:01 And I think this is true in a lot of areas of computer science, whether or not people want

09:05 to admit it.

09:06 A lot of people talk themselves out of getting started and stuff, especially challenging things,

09:11 because they're just scared of it.

09:13 They don't want to fail.

09:14 And that's the cool thing about Python.

09:17 Like, you almost don't have to worry about the code.

09:20 You get to focus on learning a new skill.

09:23 And for me, that was computer vision.

09:25 And that was the OpenCV library, an open source library that makes working with computer vision

09:30 a lot easier.

09:31 So again, it really wasn't until college that I really started to get into it.

09:36 Getting interested.

09:37 Or not necessarily getting interested.

09:38 More so being able to take action on what I wanted to do.

09:42 Right.

09:43 Maybe, you know, it felt a little unattainable.

09:45 Like, I'm going to go be an engineer, but I don't know math.

09:48 And so there's no way I can do this.

09:49 But once you kind of got over that hump, then it was no big deal, right?

09:53 Right.

09:53 Yeah.

09:54 That's very freeing.

09:55 So you mentioned OpenCV and your project is PyImageSearch.

10:00 What's the relationship there?

10:01 So OpenCV, again, it's a computer vision library that makes working with images a lot easier.

10:08 You know, it abstracts the code that loads an image off of disk or does edge detection or

10:14 thresholding or any other simple image processing function like that.

10:19 It allows you to actually build complicated computer vision programs.

10:23 You can do things like tracking objects and images or video streams, for example.

10:28 Detecting faces.

10:29 Recognizing whose face it is.

10:31 And OpenCV really facilitates this process.

10:35 And OpenCV really is like the de facto library for computer vision and image processing.

10:41 And you have bindings for it in countless languages.

10:44 The library itself is written in C and C++, but you can get bindings and access it in Java

10:52 and any of the .NET frameworks in Python.

10:57 So again, while you can access it in many programming languages, I have this love for Python now.

11:03 And when I took the course, the computer vision course in college, I realized, man, like, people

11:10 are spending a lot of time writing their class projects in C and C++.

11:14 Why are they doing that?

11:15 Like, you're fighting over these weird compile time errors.

11:18 And, you know, you're not really learning anything.

11:21 And that's kind of the tenet behind PyImageSearch.

11:24 It's a blog that I run dedicated to teaching computer vision, image processing, and OpenCV using

11:30 the Python programming language.

11:31 That's great.

11:32 And I think that, you know, using Python seems like the perfect choice.

11:36 You're sort of orchestrating these high-level functions that are calling down into C++, doing

11:41 high-performance stuff, and then giving you the answer.

11:43 And that seems like the right way to be using Python.

11:46 So what's the actual package I use?

11:48 If I were to say pip install something, what do I type to get started?

11:51 So unfortunately, OpenCV is not pip installable.

11:56 I wish it was, but it is not.

11:58 And it is not the easiest package to get installed on your system.

12:04 If you're using Ubuntu or any Debian-based operating system, you technically can do an apt-get install.

12:14 But that's going to pull down a previous version of OpenCV.

12:18 You're going to run into a lot of problems with the Python bindings, and it's not a very

12:22 good experience.

12:22 So what you actually have to do is compile it from source, download the code from their

12:28 GitHub or the SourceForge account, and manually compile it and install it.

12:32 And in fact, that's really the only way to do it if you're interested in using virtual environments,

12:37 which most Python developers are interested in for sequestering their packages.

12:45 Yeah, absolutely.

12:45 Okay.

12:46 So I go and I download that.

12:47 And then what packages are in there that I would work with?

12:51 Is that cv2?

12:53 Is that the one I would import?

12:55 Yep.

12:55 So if you were to open up your favorite editor, you would just type in import cv2, and it'll

13:01 give you access to all of your OpenCV bindings.

13:04 Okay, great.

13:04 And now I looked at some samples on your blog about how I might go and grab like an image

13:11 from a camera hooked to an Arduino.

13:14 Maybe we could talk a little bit about the type of hardware that you need and the spectrum

13:20 of devices you can interact with and that kind of stuff before we get into the more theoretical

13:24 bits.

13:24 Sure.

13:25 So that's kind of the cool thing about OpenCV is that it's meant to be run in real time.

13:31 So you can easily process video files, raw video streams without too much of a problem,

13:38 again, depending on the complexity of your algorithm.

13:41 And OpenCV is meant to run on a variety of different devices.

13:46 I personally develop applications on my MacBook, but I also own a Raspberry Pi and a camera module

13:54 for the Raspberry Pi.

13:55 And using OpenCV, I can access the Raspberry Pi video stream and then actually build like

14:02 a home surveillance system using nothing but OpenCV and a Raspberry Pi.

14:08 I built this one project where I had a Raspberry Pi camera mounted on my kitchen cabinets looking

14:14 over the front door of my apartment.

14:17 And it would detect motion, such as when you're opening the door and somebody's walking inside.

14:22 So once it detected motion, it would snap a photo of whoever was walking inside, try and identify

14:27 their face, and then it would take that screenshot or the screen capture and then upload it to my

14:33 personal Dropbox.
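The episode does not say how the motion detection itself was implemented. One common approach is frame differencing: compare the current frame against a reference frame and flag motion when enough pixels change. The sketch below shows that idea in plain Python on tiny made-up grayscale frames; the function name and both thresholds are hypothetical, and a real system would read frames from the camera instead.

```python
def motion_detected(previous, current, pixel_delta=25, min_changed=3):
    """Return True when enough pixels differ between two grayscale frames.

    Both thresholds are illustrative assumptions, not values from the show.
    """
    changed = 0
    for prev_row, cur_row in zip(previous, current):
        for before, after in zip(prev_row, cur_row):
            if abs(before - after) > pixel_delta:
                changed += 1
    return changed >= min_changed


# A 4x4 "empty room" frame, then the same frame with a bright 2x2 region
# (someone walking in) and a copy with only slight sensor noise.
empty_room = [[10, 10, 10, 10] for _ in range(4)]

door_open = [row[:] for row in empty_room]
for y in range(1, 3):
    for x in range(1, 3):
        door_open[y][x] = 200

noisy = [row[:] for row in empty_room]
noisy[0][0] = 14

someone_entered = motion_detected(empty_room, door_open)
false_alarm = motion_detected(empty_room, noisy)
print(someone_entered, false_alarm)
```

Requiring several changed pixels rather than one is what keeps small sensor noise from triggering a photo.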

14:34 So I had like this real-time home surveillance system.

14:37 That was really, really cool to develop.

14:39 And again, like this is using simple hardware.

14:43 The Raspberry Pi is not a powerful machine, but you could still build some really cool computer

14:48 vision applications with it.

14:49 Yeah.

14:50 And it's cheap too, right?

14:51 Yeah.

14:52 The Pi itself is, I think, $35 and probably another $20 for the camera module.

14:58 Yeah.

14:59 That's really easy to get started.

15:01 So very cool.

15:02 Very cool.

15:03 How does computer vision work?

15:05 I mean, I have a little bit of a background in trying to identify things and images.

15:11 I worked at this place called eye tracking, E-Y-E, tracking, not I, the letter I, tracking.

15:18 And we did a lot of stuff with image recognition and detecting eyes.

15:22 And I know enough to know that it seems really hard, but how does it work?

15:27 So computer vision as a field is really just encompassing methods on acquiring, processing,

15:34 analyzing, and just understanding and interpreting the contents of an image.

15:38 For humans, this is really, really easy.

15:41 We see a picture of a cat, and we know, like, oh, that's a cat.

15:43 And we see a picture of a dog, and we obviously know that's a dog.

15:46 But a computer, it doesn't have a clue.

15:48 It just sees a bunch of pixels, just a big matrix of pixels.

15:53 And the challenging part, as you suggested, is writing code and creating these algorithms

15:58 that can understand the contents of an image.

16:00 You can't open up your Python source file and then write if statements that say,

16:06 if this pixel equals, you know, whatever RGB code, you know, then this is a cat, right?

16:12 If it equals this pixel value, then it's a dog.

16:15 Like, you can't do that.
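To make the "matrix of pixels" point concrete, here is a minimal sketch in plain Python with a made-up 2x2 image. It also shows why hard-coded pixel comparisons break down: the same picture, slightly brighter, no longer matches pixel for pixel. The image values and helper names are invented for illustration.

```python
# A tiny 2x2 "image": each pixel is an (R, G, B) tuple with values 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def shape(img):
    """Return (height, width, channels) of a nested-list image."""
    return len(img), len(img[0]), len(img[0][0])


def brighten(img, amount):
    """Add `amount` to every channel, clamping at 255."""
    return [[tuple(min(255, c + amount) for c in px) for px in row] for row in img]


dimensions = shape(image)
brighter = brighten(image, 10)

# Same scene to a human, but an exact pixel comparison now fails.
identical = brighter == image
print(dimensions, brighter[0][0], identical)
```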

16:16 So what happens is computer vision really leverages machine learning as well.

16:21 So we can take this data-driven approach and say, here's a ton of examples of a cat,

16:26 and here's a ton of examples of a dog.

16:28 Let's see how we can abstractly quantify and represent this huge image.

16:35 As just a small, what they call, feature vector.

16:37 It's a fancy academic way of saying a list of numbers.

16:40 I'm going to quantify this big 3,000 by 3,000 pixel image into a feature vector that's 128 numbers long.

16:49 And then I can compare them to each other.

16:50 I can rank them for similarity.

16:52 I can pass them to machine learning algorithms to actually classify them.
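As a rough illustration of a feature vector, the sketch below reduces a list of grayscale pixel values to a 4-bin normalized histogram and ranks two stored images by distance to a query. The histogram descriptor, the bin count, and the sample data are assumptions chosen to keep the example small; the episode only says that an image is quantified as a list of numbers (128 in the example given) that can then be compared.

```python
import math


def histogram_features(pixels, bins=4):
    """Quantify 0-255 grayscale values as a normalized histogram."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up pixel data: a mostly dark query and two stored images.
query = histogram_features([10, 20, 30, 200])
database = {
    "dark": histogram_features([5, 15, 25, 35]),
    "bright": histogram_features([220, 230, 240, 250]),
}

# Smaller distance means more similar, so the best match comes first.
ranked = sorted(database, key=lambda name: euclidean(query, database[name]))
print(query, ranked)
```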

16:57 So the field of computer vision is very large.

17:00 And again, it spans so many different areas of processing and analyzing images.

17:06 But if we're talking strictly about classifying an image and detecting objects in an image,

17:10 then we're most likely leveraging some machine learning at some point.

17:14 Okay, and cool.

17:15 And when you say machine learning, is that like neural networks or what's going on back there?

17:21 The machine learning algorithm you would use really depends on your application.

17:25 Deep learning has gotten so much attention over the past few years.

17:31 And deep learning has its roots in neural networks.

17:34 So we see a lot of that.

17:36 You also see very simple machine learning methods like support vector machines, logistic regression.

17:44 You see that a lot as well.

17:46 And these methods, while simple, they're actually – the bulk of the work is actually happening on describing the image itself, you know, quantifying it.

17:57 So if you have a really good quantification of an image, it's a lot easier for the machine learning algorithm to take that and perform the classification.

18:06 Right, sure.

18:07 And so how much of this exists in external libraries like Scikit-learn or OpenCV or something like this?

18:16 And how much of that is like I've got to create that system for myself when I'm getting started based on my application?

18:24 So OpenCV does include some machine learning components, but I really don't recommend that people use them just because they're a little finicky and they're not that fun to use.

18:35 And especially in the Python ecosystem, you have Scikit-learn.

18:38 So you should be defaulting to that.

18:42 And to give an example, I wrote my entire dissertation, gathered all the examples using OpenCV and Scikit-learn.

18:52 I took the results that OpenCV was giving me and I passed them on to the machine learning methods and Scikit-learn.
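The hand-off described here, image descriptors in and a class label out, can be sketched without any libraries. The example below deliberately swaps in a hand-written nearest-neighbor classifier so that it runs on the standard library alone; it is not the scikit-learn workflow mentioned in the episode, and the two-number feature vectors and labels are invented.

```python
import math

# Hypothetical training data: (feature vector, label) pairs that would
# normally come from describing many example images.
training = [
    ([0.9, 0.1], "cat"),
    ([0.8, 0.2], "cat"),
    ([0.1, 0.9], "dog"),
    ([0.2, 0.7], "dog"),
]


def classify(features, examples):
    """Return the label of the training example closest to `features`."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


first = classify([0.75, 0.3], training)
second = classify([0.15, 0.8], training)
print(first, second)
```

The classifier here is trivial on purpose: if the feature vectors separate the classes well, even a simple method can label new images.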

18:59 Right.

18:59 Oh, that sounds very useful.

19:01 I think a lot of the challenging aspects of getting started in something new like this, if you're not already involved in it, is just knowing what exists, what you can reuse, and what you have to write yourself.

19:12 So knowing that that's out there is really nice.

19:14 Yeah, for sure.

19:15 And some of these algorithms you definitely don't want to be implementing yourself.

19:20 No, I'm sure you don't.

19:22 Unless you're really, really into high-performance matrix multiplication and other types of processing that make your day, right?

19:29 Exactly.

19:31 I have some sort of mental models of how I might use computer vision.

19:34 And then you have the Hollywood models, right?

19:37 Like Minority Report and so on.

19:39 But what's the current state of the art?

19:41 Like where do you see computer vision really prominently being used in the world?

19:44 So computer vision is used in your everyday life, whether you realize it or not.

19:51 And it's kind of scary, but it's also kind of cool.

19:56 Back about a year ago, I was traveling back and forth between Maryland and Connecticut on the East Coast of the United States constantly for work-related activities.

20:07 And one day I was exhausted.

20:10 It was Friday.

20:12 I just really wanted to leave Maryland and get back up to Connecticut and sleep in my own bed and just pass out.

20:20 So I left work a little early.

20:22 And I started tearing up 95.

20:24 It was a beautiful summer day.

20:26 Sunlight streaming down.

20:28 Had my windows open and the wind blowing in.

20:33 And if you've ever driven on 95, specifically on the East Coast of the United States, you know there's always just like a ton of traffic or just lots of construction to ruin your drive.

20:42 And for whatever reason this day, there was no traffic.

20:46 There was no construction.

20:47 And I was just flying down the road.

20:49 And I made excellent time getting home.

20:51 However, two weeks later, I get my mail.

20:55 And I noticed that there is a speeding citation addressed to me.

21:00 Apparently, I had passed one of the speeding cameras that was mounted along the side of the road.

21:05 It detected that my car was traveling above the posted speed limit.

21:09 It snapped a photo of my license plate.

21:11 And then it applied what's called automatic license plate recognition where it takes the image, automatically analyzes it, finds my license plate, and then looks it up in a database and mails me a ticket.

21:24 I was like, man, I've written code to do this.

21:28 I know exactly how this worked.

21:30 So, like, it was the only time where I had a smile on my face as I was writing out the $40 check or whatever it was.

21:38 I'm like, you guys got me.

21:39 And I know how you did it.

21:41 Yeah, exactly.

21:42 Like, your love for image recognition is slightly turned against you.

21:47 Just briefly there.

21:48 Just briefly.

21:49 You said you worked on this thing called ID My Pill.

21:53 What's that?

21:54 So, ID My Pill is an iPhone application and an API that allows you to identify your prescription pills in the snap of your phone.

22:04 So, the general idea is that we are a little too faithful in pharmacists and our doctors.

22:13 And not to say there's anything against that, but mistakes do happen.

22:16 And people do get hurt.

22:19 They get sick.

22:20 And some of them do die every year due to taking the wrong medication.

22:23 So, the idea behind ID My Pill is to validate your prescription pills.

22:27 And it's also a way for pharmacies, for healthcare providers to facilitate better care for their patients.

22:35 So, you just take your pills, snap a photo of them, and then computer vision algorithms are used to automatically analyze and recognize the pill.

22:43 That way, you can validate that, yes, this is the correct pill.

22:46 This is what it says on the pill bottle.

22:48 And I know what I'm taking is correct.

22:52 That sounds really useful.

22:53 What do you think the chances are, like, something along those lines could be automated?

22:58 So, as the pharmacists are, like, actually filling the prescription, you know, the computer knows what they're filling and it sees what they're putting into bottles.

23:05 Could it say, whoa, whoa, whoa, this does not look legit?

23:08 Yeah, I think it absolutely can be automated.

23:12 The current systems right now that pharmacies use, especially within hospitals, some of them are taking RFID chips.

23:20 So...

23:21 You know, when you take a pill bottle off the shelf, it's able to validate that you are taking the correct medication and filling it.

23:30 But again, that's not perfect.

23:32 And pills can get mixed up.

23:35 So, in a perfect world, what you end up doing is you would have that RFID mechanism in place.

23:40 And then, you know, you have a mounted camera looking down at their workstation, at their desk.

23:48 And then, you know, it validates the pills in real time.

23:51 And they get a nice little thumbs up on the screen or whatever heads up display that they have in front of them.

23:56 And they can continue filling the medication.

23:58 Yeah, that sounds really helpful.

24:00 So, I used a slightly less productive, contribute to society sort of computer vision last night.

24:06 I was out with some friends and my wife.

24:08 We were having some wine.

24:10 And there's this really cool iPhone app called Vivino.

24:13 I think I'm saying it right.

24:14 Yeah.

24:14 And you can just take a picture of a bottle, even if the background is all messy and there's people around and so on.

24:21 And it'll tell you what the ratings are, how much it should cost at, you know, sort of standard retail price.

24:27 And I was really impressed with that.

24:29 You know, there's a lot of interesting sort of consumer style uses as well, I think.

24:35 Oh, for sure.

24:36 Vivino is one of the major ones.

24:40 And another one that you can see computer vision used a lot in, and I don't think people really notice it, but they appreciate it, is within sports.

24:50 So, in America, if you're watching a football game, you'll notice that they have, like, a yellow line drawn across the field marking the, you know, the spot of the ball along with the first down marker.

25:04 And these lines are drawn using calibrated cameras.

25:08 So, computer vision is used to calibrate the cameras and then know where to actually draw that line on the broadcast of the game.

25:15 And then, similarly, you can use computer vision and machine learning in the back end and analyze games to determine, you know, what's the optimal strategy.

25:25 And this is actually done in Europe a lot in soccer games.

25:29 So, they'll detect how players are moving around, how the ball is passed back and forth.

25:34 And they can almost run these data mining algorithms to learn how you're going to beat the other team and try and learn their strategy.

25:41 It's pretty cool.

25:43 Yeah, that really is amazing.

25:45 I feel like these days watching sports on television is actually a better experience than going to them live a lot of times because of these types of things, right?

25:54 It's very clear.

25:55 Oh, look, they've got to go, like, a foot and a half forward and then they get the first down.

25:59 Otherwise, they're going to fail.

26:00 And, you know, you don't really quite get the same feel live, which is ironic.

26:04 Right.

26:04 It's almost that it used to be that you didn't get the full story unless you were there.

26:09 And now it's kind of flipped around.

26:10 Like, you get more than the full story if you're watching the game on TV.

26:14 You get all the detail that you could possibly want.

26:16 Yeah, that's right.

26:17 So, you know, that sort of leads into the whole story of augmented reality and stuff like that with, you know, the Microsoft HoloLens, Google Glass, and, you know, a bunch of iPhone apps and other mobile apps as well.

26:29 What kind of interesting stuff do you see out in that world?

26:31 I have not had a chance to play around with the HoloLens.

26:34 I have used the Google Glass and played around with that.

26:37 Most of the applications I see, again, this is just because of my work with ID My Pill, are medical related.

26:46 So, you'll see surgeons going into really, really long, you know, 10 plus hour surgeries that they perform these complex operations.

26:54 And they may need to look up some sort of reference material while they're doing this surgery.

27:00 And instead of having an assistant doing that for them, I mean, they could put on the Google Glass and have this information right there in front of them.

27:06 So, you see a lot of that.

27:08 And since the Glass has a camera, there's a lot of research focused on, especially within medicine, identifying various body parts as you're performing this surgery.

27:20 So, you can have this documented procedure pulled up in front of you as you're working.

27:24 There's no need to instruct the Google Glass to do it for you.

27:27 Yeah, that's pretty wild.

27:28 I can imagine calibrating some kind of augmented reality thing to the body that is there.

27:34 And you could almost see, like, the organs and stuff overlaid.

27:37 Yeah, absolutely.

27:38 Some really interesting new uses there.

27:41 Very cool.

27:43 What about things like the Google self-driving cars?

27:47 What role do image recognition and computer vision play there versus, say, GPS versus laser versus whatever else?

27:53 Do you know?

27:54 That remains to be seen.

27:56 I guess I am a little bit of a pessimist when it comes to computer vision being used for driving cars.

28:02 You maybe know too much about the little problems you run into, right?

28:06 Well, that's the thing is I don't believe computer vision itself will ever be adequate, again, by itself, for self-driving cars.

28:17 I think you need a lot more sensors.

28:20 I think you do need things like radar to help you out with that and help detect objects in front of you.

28:27 And that's a problem computer vision does face is when you take a photo of something, it is a 2D representation of a 3D world.

28:37 So, it makes it very hard to compute depth based off of that.

28:41 And now we have things like the Xbox 360 Kinect and stereo cameras where we can compute depth, which is making things like the self-driving cars more feasible.
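The episode does not go into the math, but the usual reason a stereo pair recovers depth is the textbook relation for two rectified pinhole cameras: depth = focal length × baseline / disparity, where disparity is how far a point shifts between the left and right images. A minimal sketch, with hypothetical camera numbers:

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in meters for a rectified stereo pair (pinhole camera model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 800 px focal length, cameras 0.25 m apart.
near = depth_from_disparity(800.0, 0.25, 40.0)  # large shift, close object
far = depth_from_disparity(800.0, 0.25, 8.0)    # small shift, distant object
print(near, far)
```

A single photo has no disparity to measure, which is the 2D-versus-3D limitation described above.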

28:51 But, again, for something like a self-driving car, it doesn't make sense to rely strictly on computer vision.

28:57 I think you want to incorporate as many sensors as you possibly can.

29:01 Yeah, I think that makes a lot of sense.

29:03 And it might make sense to actually have the road have things like RFID style stuff in it where the car can be sure it's between the lanes, at least on the major long part of the drives.

29:14 Oh, for sure.

29:15 And I think that's kind of the point I will say is just because you can use computer vision to solve something doesn't necessarily mean that you should.

29:25 There might be better solutions out there.

29:28 Yeah, sure.

29:29 You don't want to be the solution in search of a problem.

29:33 You want to be the solution to a problem.

29:37 Yeah, yeah.

29:39 There's a really good show I'd like to recommend to people, by the way, about this whole computers, driving cars, and that.

29:44 Nova did a show called The Great Robot Race.

29:48 And you can just Google The Great Robot Race or whatever.

29:51 And it's all about this DARPA competition that kind of preceded the whole Google self-driving cars and so on.

29:58 And it's like a two-hour really sort of technical documentary on, like, the problems these teams technically faced and stuff.

30:05 It was very cool.

30:06 So if you want to learn more, check that out.

30:07 Nice.

30:08 Yeah.

30:09 It's a really good sort of conceptual idea of what's going on.

30:11 But, you know, what do I do?

30:13 Like, what kind of code do I write as a Python developer to get started?

30:17 I mean, it's tough to talk about code on audio only.

30:21 So don't do too much.

30:22 But just, like, can you give me a sense of what kind of code I would write to maybe grab an image and ID something in it?

30:29 Yeah.

30:30 So let's take a fun example.

30:33 If you've ever used your smartphone to scan a document, you've basically taken, you've, like, set a piece of paper on your desk, held your phone above it, and snapped a photo.

30:44 And then the document's already scanned and stored as an image on your phone.

30:49 And you could email it to yourself or text it to someone.

30:53 What's really cool is that while those programs and those applications almost seem like magic, they're really, really simple to build.

31:02 So if I were to build a mobile document scanner, I would basically say, first up, capture the image.

31:08 So I'm going to read it from disk.

31:10 Or I'm going to read it from a camera sensor.

31:11 Nice.

31:12 I'm going to convert it to grayscale because color doesn't really matter.

31:18 I'm going to assume there's enough contrast between a document and a desk that I'll be able to detect it.

31:23 And I'm probably going to blur it, get rid of any type of high-frequency noise, just allowing me to focus more on the structural objects inside the image and less on the detail.

31:32 From there, I'll say, yeah, let's perform edge detection.

31:36 Let's find all the edges in the image.

31:39 And based on the outlines of the objects in the image, I'm going to take these, and I'm going to look for a rectangle.

31:47 And if you consider the geometry of a rectangle, it just has four points, four vertices.

31:53 So I'm going to loop over the largest regions in this image, and I'm going to find the largest one that has four vertices.

32:01 And I'm going to assume that's my document.

32:04 And then once I have the region of the image, I'm going to perform a perspective transform to give me this top-down, bird's-eye view of the document.

32:14 And from there, you've basically built yourself a document scanner.

32:19 You can apply OCR, optical character recognition to try and convert the text in the image to a string in Python.

32:29 Or you could just save the scanned document as a raw image.

32:33 That works, too.

32:34 It's really actually not a complicated system to build.
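
The front half of that pipeline (grayscale, blur, edge detection) can be sketched in plain Python. This is a minimal illustration, not the code Adrian describes: in practice a library such as OpenCV would handle each step in a single call. The function names, the 3x3 box blur, the gradient threshold of 40, and the synthetic 12x12 "page on a desk" scene are all assumptions made up for the example.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to luminance using the BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]


def box_blur(gray):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def edge_map(gray, threshold=40):
    """Flag pixels whose central-difference gradient magnitude exceeds threshold.

    The one-pixel image border is left unflagged.
    """
    h, w = len(gray), len(gray[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


# Hypothetical scene: a white page (columns and rows 3..8) on a dark desk.
desk, page = (30, 30, 30), (240, 240, 240)
scene = [[page if 3 <= x <= 8 and 3 <= y <= 8 else desk for x in range(12)]
         for y in range(12)]

edges = edge_map(box_blur(to_grayscale(scene)))
for row in edges:
    print("".join("#" if e else "." for e in row))
```

The printed map outlines the page and leaves both the desk and the page interior blank. That outline is what the later steps would consume: finding contours, keeping the largest one with four vertices, and warping it to a top-down view.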

32:38 How do you deal with the slight imperfections of reality, like a crumpled receipt or document or something like that?

32:48 So to handle, as you suggested, like these slight imperfections, I would suggest for the document scanner example to do what's called contour approximation.

33:00 So, again, if a region of an image can be defined as a set of vertices, and due to the crumpling of the piece of paper, maybe I find a region of an image that has eight vertices or 12 vertices.

33:14 It's not a perfect rectangle.

33:16 It's kind of jagged in some places.

33:18 Well, what I can actually do is I can approximate that contour.

33:21 And I could basically do line splitting and try and reduce the number of points to form that contour.

33:28 And that's actually a very critical step in building a mobile document scanner because you're not going to find perfect rectangles in the real world just due to noise capturing the photo.
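
The line-splitting idea described here is commonly implemented as the Ramer-Douglas-Peucker algorithm, which is also what OpenCV's `cv2.approxPolyDP` is documented to use. Below is a minimal pure-Python sketch of it, assuming the contour is a list of (x, y) points and that its first point is a genuine corner (the first point is always kept). The function names, the tolerance of 3 pixels, and the jagged test rectangle are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker on an open polyline (endpoints are always kept)."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # every interior point is close enough to the chord
    left = simplify(points[:index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def approximate_closed_contour(contour, epsilon):
    """Split the closed contour at the point farthest from its start,
    simplify both halves, and stitch them back together."""
    start = contour[0]
    far = max(range(len(contour)),
              key=lambda i: math.hypot(contour[i][0] - start[0],
                                       contour[i][1] - start[1]))
    first = simplify(contour[:far + 1], epsilon)
    second = simplify(contour[far:] + [start], epsilon)
    return first[:-1] + second[:-1]


# A 100x60 "crumpled" rectangle: ten vertices instead of four.
jagged = [(0, 0), (25, 1), (50, -1), (75, 1), (100, 0), (101, 30),
          (100, 60), (50, 61), (0, 60), (-1, 30)]
approx = approximate_closed_contour(jagged, epsilon=3)
print(approx)
```

With a tolerance of 3 pixels the ten jagged vertices collapse to the four corners, which is the "does it have four vertices?" check the scanner relies on.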

33:40 Even perspective, right?

33:42 Yeah, even perspective.

33:44 That can dramatically distort things.

33:45 Are there libraries out there that help you with that kind of stuff, or is that where you need to know some math?

33:49 You don't really need to know that much math at all for it.

33:52 It's all about gluing the right functions together at the right time.

33:56 And I think that's one of the harder points of learning computer vision and why I run the PyImageSearch blog.

34:01 I want to show people real-world solutions to problems using computer vision.

34:06 And when I do that, it's not really the actual code that matters.

34:12 It's learning why I'm applying certain functions at different times in the context of the end goal.

34:19 It's about taking these functions and gluing them together in a way that gives you a real solution.

34:23 Okay, that makes sense.

34:25 Sounds a little bit like the code variation of I open up Photoshop and there's a thousand things it does.

34:30 And individually, they're all simple, but I don't know how to go from a bad picture to a great picture with all of my little pieces, right?

34:38 It's kind of like that with the code libraries for image stuff, right?

34:43 For sure.

34:44 There's a couple of really cool apps that are coming to mind when you're talking about that.

34:48 One is called Word Lens.

34:50 Do you know about Word Lens?

34:51 Yeah, that's the one where you have your iPhone or your smartphone and you can hold it over a foreign language and it automatically translates it for you.

35:00 Is that right?

35:01 Yeah, that's right.

35:01 And it will basically replace a menu or a street sign with whatever language you want, English in my case, right?

35:08 That seems like a really cool use as well.

35:11 Another one that I liked is this thing called Peak.AR.

35:15 And if you're out hiking in the mountains, you can hold it up and it'll like highlight the various mountaintop names and how high they are and how far away they are using computer vision.

35:25 Very nice.

35:26 Yeah, that's pretty cool.

35:27 Let's talk about your blog and your book a little bit.

35:29 Like what kind of stuff do you have on there for people to go learn?

35:33 You have like a short course people can sign up for and I saw you also have like a video course that they could take as well.

35:39 Is that right?

35:39 So it's not a video course.

35:41 I have two small mini courses on the PyImageSearch blog.

35:46 One is like a 21-day crash course on building image search engines, which is my area of expertise within computer vision.

35:54 We're all familiar with going to Google and typing in our query and finding results.

35:59 And that's essentially what image search engines are.

36:03 Only instead of using text as our query, we're using images.

36:05 We're analyzing the image and then searching databases for images with similar visual content.

36:11 And again, this is without any knowledge of text associated with the image.
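
One simple way to search by visual content alone is to describe each image with a color histogram and rank the database by how far each histogram is from the query's. The sketch below is offered as an assumption about how such a search could work, not as the method taught in the course. The function names, the 4-bins-per-channel choice, the chi-squared distance, and the tiny made-up "images" (lists of RGB pixels) are all hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized 3D color histogram of a list of (r, g, b) pixels."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]


def chi2_distance(a, b, eps=1e-10):
    """Chi-squared distance between two histograms (0 means identical)."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


def search(query_pixels, index):
    """Return image names ordered from most to least similar to the query."""
    query = color_histogram(query_pixels)
    return sorted(index, key=lambda name: chi2_distance(query, index[name]))


# Index step: describe every image in the (made-up) dataset once, up front.
dataset = {
    "sunset": [(250, 120, 30)] * 90 + [(40, 40, 90)] * 10,
    "forest": [(30, 140, 40)] * 100,
    "ocean": [(20, 80, 200)] * 100,
}
index = {name: color_histogram(pixels) for name, pixels in dataset.items()}

# Query step: no text is involved, only the pixels of the query image.
query_image = [(240, 110, 20)] * 80 + [(50, 50, 100)] * 20
results = search(query_image, index)
print(results[0])
```

The mostly orange query lands closest to the "sunset" entry because the two share histogram bins, while the green and blue images share none.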

36:17 Right.

36:17 That's really cool.

36:18 You know, I think I first saw that with Google image search where I could take a picture of like the Sydney Bridge and it would say, hey, that's the Sydney Bridge.

36:25 But it seems like there might be more practical uses for that.

36:28 Oh, there definitely are.

36:32 I've used image search at the National Cancer Institute, actually.

36:38 I worked as a consultant there for about six months and we developed methods to automatically analyze breast histology images for cancer risk factors.

36:49 So we would actually develop algorithms that would go in and automatically analyze these cellular structures.

36:56 And then based off of that, be able to kind of predict that person's cancer risk factor five years from now, 10 years from now, 15 years from now.

37:04 But you could also apply image search to that and say, you know, here's an example of these cellular structures.

37:10 Go and find the other cellular structures in this image data set that look like that.

37:15 And then you can compare them, compare their cancer risk factors and look for correlations.

37:20 So image search is a wonderful, wonderful field.

37:25 I personally love it.

37:27 Then again, I'm biased.

37:28 So there's an incredible number of uses for it.

37:32 And it's not all consumer facing.

37:34 It's also, you know, on the back end, on the business facing, on government and all other sectors.

37:41 Yeah, that's cool.

37:42 One company I saw, I can't remember where I saw them, but they were basically starting a company that would help you identify parts of like machines and cars and stuff.

37:52 And you're like, I have a part.

37:53 I know it's broken.

37:54 I have no idea what it is.

37:55 Take a picture of it.

37:56 And they'll say, oh, this is, you know, widget X and it costs $32.

37:59 Here's how you order it.

38:00 Oh, cool.

38:01 Yeah.

38:01 So, you know, it doesn't help people quite as much, but still very, very cool.

38:05 Tell us about your book.

38:06 What's the title?

38:07 So I have a book called Practical Python and OpenCV.

38:12 And it's just a really quick, really quick kickstart guide to like get you through learning the basics of computer vision and image processing.

38:20 It starts off with very rudimentary things like here's how you load an image off of a disk.

38:25 And here's how you detect edges in an image.

38:28 And then it goes on to show you how to detect faces in images and video streams and how to track objects in images and how to recognize handwritten characters.

38:38 So it really takes you from assuming you have very basic programming experience and no prior experience in computer vision all the way to, hey, look, I'm solving some real world problems using computer vision.

38:51 And again, it's a quick read and you can learn a ton about computer vision really quickly.

38:57 And I would also love to offer a discount to Talk Python listeners if they want to go to pyimagesearch.com slash Talk Python.

39:06 You can get 20% off of any of the books that I offer on that page.

39:11 And I'll keep that open for the next two weeks.

39:14 That's awesome.

39:15 Thanks for doing that.

39:15 Oh, thank you.

39:16 Thank you so much for having this opportunity.

39:19 This is great.

39:20 Yeah, so people should check out your book.

39:22 That's really cool.

39:22 Is it used for like any university classes or is it more not so academic, a little more like industry focused?

39:29 It is much more practical and hands on.

39:31 So it's not used that much in college level classrooms.

39:36 However, I have seen it used in Amsterdam and a few universities in the United States as well.

39:43 Yeah, that's really cool.

39:44 It's got to make you feel good.

39:46 It certainly does.

39:47 I'm happy that I can contribute and give back to a field that I love so much.

39:52 Yeah, that's awesome.

39:53 A lot of people build web apps and web apps don't have attached cameras.

39:58 Well, I guess you could get one from the browser, right?

40:00 You could do something with that if you could capture the people's browser.

40:02 But what are some like web scenarios rather than say iPhone or Raspberry Pi type scenarios for computer vision?

40:09 Let's say that we're going to build, you know, like a Twitter or Facebook style application

40:14 where a user has a profile picture.

40:16 And we're going to build the functionality to upload a profile picture and then set it.

40:21 Well, there's a naive way of doing it where you just kind of upload a profile picture

40:26 and maybe you ask the user to crop the image or maybe you just automatically assume that the user's face or their profile is directly in the center.

40:35 So you just automatically resize the image and crop it.

40:38 Those are viable solutions and there's nothing wrong with them.

40:41 But we could also use computer vision to create a much more fluid user experience.

40:45 So the user can upload a profile picture.

40:48 And in the background, we're running these computer vision algorithms.

40:52 So we can analyze the image and find the face and automatically crop that face from the image and set it as the user's profile picture.

41:00 And again, the goal here is making a much more seamless experience.
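
Once a face detector has found the face, the automatic crop is simple arithmetic. The sketch below covers only that cropping step. It assumes a detector (for example an OpenCV Haar cascade loaded with `cv2.CascadeClassifier`) has already reported the face as an (x, y, width, height) box; the function name and the padding value are hypothetical.

```python
def square_crop_around_face(image_size, face_box, padding=0.5):
    """Return (left, top, right, bottom) of a square crop centred on the face.

    image_size is (width, height); face_box is (x, y, w, h) from a detector.
    padding is the extra margin on each side, as a fraction of the face size.
    The crop is shifted, not shrunk, when the face sits near an image edge.
    """
    img_w, img_h = image_size
    x, y, w, h = face_box
    side = int(max(w, h) * (1 + 2 * padding))
    side = min(side, img_w, img_h)  # the crop can never exceed the image
    cx, cy = x + w // 2, y + h // 2
    left = min(max(cx - side // 2, 0), img_w - side)
    top = min(max(cy - side // 2, 0), img_h - side)
    return left, top, left + side, top + side


# Face in the middle of an 800x600 upload.
print(square_crop_around_face((800, 600), (300, 200, 100, 100)))
# Face jammed into the top-left corner: the crop slides to stay in bounds.
print(square_crop_around_face((800, 600), (0, 0, 100, 100)))
```

Keeping the crop square and padded means the result can be resized straight to the avatar dimensions without asking the user to crop anything.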

41:03 That's really cool.

41:04 I think, you know, people's expectations of consumer focused and even B2B type apps to some degree, you know, things like Slack and so on,

41:13 definitely have gone up in the last three or four years.

41:17 And, you know, I blame the iPhone, right?

41:19 We can no longer build just ugly, crummy apps that users are forced to use, right?

41:23 They expect a cool experience.

41:24 And something like that is just, you know, really nice touch.

41:27 Yeah, for sure.

41:27 And from a business perspective, I mean, making a good impression on your users is crucial.

41:32 So you want them to have this really, really great experience when using your app, even if it is something as simple as a profile picture.

41:40 I mean, things like those do matter from a user experience level.

41:44 Right.

41:44 If somebody goes, oh, my gosh, you wouldn't believe what happened.

41:47 I uploaded this and it made the best profile picture ever, right?

41:50 That could possibly be something, a little bit of word of mouth marketing.

41:53 That'd be cool.

41:54 Yeah.

41:54 And then getting back to the Facebook example and the Twitter example, if you've uploaded a picture to Facebook recently,

42:01 you've probably noticed that it can not only detect your face or other people's faces in images,

42:06 but it can recognize whose face that belongs to.

42:09 So these images are being automatically tagged for you.

42:12 And maybe that's creepy.

42:13 Maybe that's not, you know, kind of beside the point.

42:16 But again, it's these computer vision algorithms running behind the scenes.

42:19 Right.

42:20 That's really fantastic.

42:21 Have you played with Google Auto Awesome from Google Plus?

42:24 No, I haven't.

42:25 Oh, my gosh.

42:26 There's amazing stuff that happens.

42:27 I don't do much with Google Plus.

42:29 I mean, there's some people on it and I check it every now and then and it's okay and so on.

42:33 But, you know, Twitter is really my place.

42:35 But for images, Google Plus is crazy.

42:38 So it'll do things like take a picture.

42:41 If you have like three pictures of a group and in one picture, there's somebody frowning and there's another picture.

42:48 Someone else has their eyes closed.

42:49 But in the other picture, the closed-eyes person's eyes were open.

42:52 The other, you know, if you could combine them and they would all like look good, right?

42:57 You could take out the bad faces and put the good faces in from the other pictures.

43:00 And it'll do that automatically for you.

43:01 Wow.

43:03 So it'll like take a set of group pictures and make the best possible like unified group picture of all the different faces.

43:09 That's pretty crazy.

43:10 That is.

43:11 And other things like that as well, like focusing on faces and so on.

43:16 So I'm sure there's some really cool computer vision stuff going on there.

43:19 Yeah, I'll definitely have to check that out.

43:21 Yeah, you can just, like, install Google Plus on your iPhone and just tell it to auto backup.

43:26 And then like you'll get these notifications every now and then.

43:28 You have these crazy pictures.

43:29 Nice.

43:30 Yeah.

43:30 So suppose I'm new.

43:32 I'm pretty new to computer vision.

43:34 How do I get started?

43:35 So if you want to get started, the first thing I would suggest doing is installing OpenCV.

43:42 I mean, that's a pretty important prerequisite.

43:44 Another library that you might want to look into that is pip installable is Scikit-image.

43:50 And this library, I love OpenCV, but Scikit-image releases updates a lot, lot faster.

43:59 It's a smaller library, but it's very good at what it does.

44:02 And in my opinion, it stays a little bit closer to what the state of the art is in computer vision.

44:08 So if an academic paper is published and they implement some new algorithm, Scikit-image is much more likely to have that implementation than OpenCV will, simply because it's a smaller community.

44:21 But it's a faster moving community.

44:22 So they can release updates faster.

44:24 Okay.

44:25 Yeah, that sounds really cool.

44:26 And do I need hardware?

44:27 Do I need a Raspberry Pi and a camera kit or can I just use my webcam?

44:31 You know, you can use pretty much whatever you want.

44:35 I recommend just starting out with your laptop or your desktop system.

44:39 There's no reason that you need any special hardware.

44:41 And realistically, you don't even need a webcam unless you plan on processing video, real-time video.

44:48 And in the meantime, if you do, you could still go ahead and get started.

44:51 You could just work with pre-recorded videos.

44:54 Maybe ones you recorded years ago, from files you have lying around on your system.

44:59 There's no reason why you can't get started.

45:02 And I think that's what's really cool about computer vision is, you know, back in high school, I had this impression that computer vision and image processing was just magic going on behind the scenes.

45:12 But it's really not, and it is possible to learn these algorithms and understand them without having a degree in computer science or a focus in machine learning or, you know, tons of understanding of high-level mathematics.

45:25 Yeah, very cool.

45:26 Especially with things like OpenCV where you don't have to understand, like, the video encoding format and all that kind of stuff, right?

45:32 Yeah, that's taken care of for you behind the scenes.

45:35 All right, well, maybe you could tell everyone a few of your favorite packages or libraries out there that you like to use.

45:42 You know, everyone has their own flavor and things they've discovered and problems they're solving.

45:47 So, you got any favorites?

45:48 Yeah, so I mentioned two of my favorites already.

45:51 The first one is Scikit-learn for machine learning.

45:54 The other is Scikit-image for computer vision.

45:57 OpenCV is great, but again, not pip installable.

46:02 If you do decide to play around with computer vision, I would suggest that you check out the imutils package that I created.

46:10 It's basically a set of basic functions for image manipulations, such as resizing, rotation, translations, that can take a little bit of code to do in OpenCV, and then it just reduces it to a single function call inside the imutils package.

46:25 Yeah, that's great.

46:26 I'll put that in the show notes.

46:27 Yeah, it's definitely worth taking a look at.

46:30 The other fun one that I like to play around with doesn't have a ton of real-world application, but it's called Color Transfer.

46:38 And the idea is that, let's say you went to the beach at 1 p.m. in the afternoon, and you took this really great picture, but you wanted to make it look like it was taken at sunset.

46:51 So you would have this beautiful orange glow of the sun over top of the ocean.

46:55 Maybe you could create that effect in Photoshop, or you could just have an example image of a sunset and then algorithmically take the color space from the sunset image and then transfer it to the image you took.

47:10 So if you want to play around with that, that's in the color_transfer package on PyPI.
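
The core idea, making one image take on another image's color statistics, can be sketched in a few lines. This is a simplified version under stated assumptions and is not the package's own code: it matches the per-channel mean and standard deviation directly on RGB values, whereas statistic-matching methods of this kind usually work in a perceptual color space such as L*a*b*. The function names and the three-pixel "images" are hypothetical.

```python
import statistics


def match_channel(style_values, photo_values):
    """Shift and scale photo_values so they take on the style's mean and spread."""
    style_mean = statistics.mean(style_values)
    style_std = statistics.pstdev(style_values)
    photo_mean = statistics.mean(photo_values)
    photo_std = statistics.pstdev(photo_values)
    scale = style_std / photo_std if photo_std else 1.0
    # Clamp to the valid 8-bit range after the transfer.
    return [min(255, max(0, round((v - photo_mean) * scale + style_mean)))
            for v in photo_values]


def transfer_color(style_pixels, photo_pixels):
    """Apply match_channel independently to the R, G and B channels."""
    style_channels = zip(*style_pixels)
    photo_channels = zip(*photo_pixels)
    matched = [match_channel(s, p) for s, p in zip(style_channels, photo_channels)]
    return list(zip(*matched))


# Made-up data: an orange sunset as the style, a blue midday beach as the photo.
sunset = [(250, 120, 30), (230, 100, 20), (240, 110, 40)]
beach = [(100, 150, 200), (120, 170, 220), (110, 160, 210)]

recolored = transfer_color(sunset, beach)
print(recolored)
```

The beach pixels keep their relative ordering (brighter stays brighter) but their averages move to the sunset's orange, which is the effect described above.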

47:15 Oh, that sounds really interesting.

47:16 Yeah, it's a lot of fun.

47:18 Not exactly the most real-world applicable package, but it's fun nonetheless.

47:23 Yeah, it sounds really fun.

47:24 Okay, any final shout-outs or things you want to tell the listeners?

47:30 Yeah, if you want to learn more about computer vision, definitely head over to pyimagesearch.com.

47:35 Check out the blog posts I have.

47:37 I have, at the time of this recording, at least 80 free blog posts that discuss computer vision and solving real-world problems.

47:45 I have two courses, mini courses that you can take and learn about computer vision.

47:49 And again, I have my book, Practical Python and OpenCV, that you can purchase and read through and learn computer vision and get up to speed really quickly.

47:59 And again, I'm offering that 20% off to all Talk Python podcast listeners at pyimagesearch.com slash Talk Python.

48:06 All right, that's really interesting.

48:09 I appreciate the offer.

48:10 And I think everyone out there should go and check out your blog.

48:14 You have some tutorials you can go look at, and they're really short.

48:18 And they both walk you through the code as well as maybe setting this up on a Raspberry Pi with a camera board and all those kinds of things.

48:24 So I found it really helpful.

48:25 And hopefully, if anyone wants to try it out, I encourage you to do so.

48:29 Cool.

48:29 Thank you so much for having me on the show.

48:31 This was a great experience.

48:32 Yeah, Adrian, it was great to talk to you.

48:34 And I'd love to bring a different perspective on the cool stuff you can do with Python.

48:39 So here's one more awesome thing that people can do if they want to try it, right?

48:44 Yep, for sure.

48:45 All right.

48:46 Thanks for being here.

48:46 Talk to you later.

48:47 See you.

48:48 Bye.

48:48 This has been another episode of Talk Python to Me.

48:52 Today's guest was Adrian Rosebrock.

48:55 And this episode has been sponsored by Codeship.

48:57 Please check them out at Codeship.com and thank them on Twitter via at Codeship.

49:03 Don't forget the discount code for listeners.

49:05 It's easy.

49:05 Talk Python.

49:06 All caps, no spaces.

49:09 Remember, you can find the links from the show at talkpythontome.com/episodes/show/11.

49:16 And if you want to support the show, please check out our Patreon campaign at patreon.com/mkennedy,

49:23 where you can contribute as little as $1 per episode.

49:26 Be sure to subscribe to the show.

49:28 Visit the website and choose subscribe in iTunes or grab the episode RSS feed and drop it into your favorite podcatcher.

49:36 You'll find both at the footer of every page.

49:38 This is your host, Michael Kennedy.

49:40 Thanks for listening.

49:41 Smix, take us out of here.

49:43 Stating with my voice.

49:46 There's no norm that I can feel within.

49:48 Haven't been sleeping.

49:49 I've been using lots of rest.

49:50 I'll pass the mic back to who rocked it best.

49:53 I'll pass the mic back to who rocked it best.
