#73: Machine learning at the new Microsoft Transcript
00:00 In this episode we catch up with David Crook, a developer evangelist at Microsoft. He is a co-organizer for the Fort Lauderdale Machine Learning User Group and is involved in many more user groups and meetups. You'll hear about some really cool projects where they are using Python and TensorFlow to work on simple things like growing more food to help feed the world.
00:00 This is Talk Python To Me, episode 73, recorded August 25th, 2016.
00:00 [music intro]
00:00 Welcome to Talk Python To Me, a weekly podcast on Python- the language, the libraries, the ecosystem and the personalities.
00:00 This is your host, Michael Kennedy, follow me on Twitter where I am at @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
00:00 This episode is brought to you by Hired and Capital One. They both have great offers for Python developers, so please check out what they are offering when you hear their spot; it helps support the show.
01:18 Michael: David, welcome to the show.
01:19 David: Hi Michael, it's good to be here.
01:20 Michael: Yeah. It's great to have you here. I'm super excited to talk about all the cool machine learning and Python stuff you guys have going on over there at Microsoft, and in your little area in particular. But before we get to all those stories, how did you get started in programming in Python?
01:34 David: So, Python in particular is an interesting one because I resisted it for a long time. Where did I start- my parents were like you are going to go to college and you are going to be a programmer, and I was like I am going to be an archaeologist, not a programmer. And, they kind of said, well, we are not going to pay for college if you don't do programming, so I said all right, I guess I'll do it. So I really got kicked off around college and I started video game club. I did a lot of XNA programming.
02:07 Michael: That's cool, so XNA just for people listening, XNA was like a high level game programming environment that you could write games for both Windows and Xbox in C# right, but it abstracted a lot of the game engine stuff, right?
02:20 David: Oh yeah, yeah and when you are just getting started into programming like you don't want to have to sit down and figure out how to do different slots with the GPU and all that kind of stuff, it was just a really good way to get into programming, on your Xbox, and be able to have something that is useful and like all your own. So you know like when you are doing a lot of game mods, you are kind of stuck with whatever you had in that environment, whereas with XNA, whatever you wanted to do, it just gave you a high level abstraction on it, so that was a really cool start.
02:55 Michael: Awesome, so you had a group in college, doing that?
02:58 David: Yeah. I didn't really know how to code that much, so I discovered that if I surrounded myself with a bunch of people who could, then I could hopefully learn from them. So I formed a little group and basically recruited a bunch of people to teach me, under the guise that I was teaching them.
03:17 Michael: How interesting. I think that's really common, really effective, I 100% agree with you that surrounding yourself with programmers is almost key to making you a programmer, you can do it just on your own, but it's really not the same experience. And I don't know very many other groups that are like that, like do lawyers say hey let's all get together after work and just talk about law, like passionately, you know? I mean, plumbers do not necessarily do that, right? Now, plumbers have a real craft and expertise, they might have apprenticeships, but this idea that we are so excited we just want to get together and make everybody better seems fairly unique to programming, and I think that's one of the reasons it's so awesome, right?
04:00 David: Yeah, you know, I think there is a couple of trades maybe, where they do that kind of thing, like my uncle, he is in construction and does industrial air conditioning, and I guess they have like a builders' guild but it's not nearly like what you see in a tech system where you can just hop on and be like I want to go to some really obscure group in my area and bam, you found it, like holy cow, that exists?!
04:26 Michael: Yeah, absolutely. And you actually have a big hand in those existing in Florida which we'll get to. But, why don't you tell us your story about moving on towards Python, your path?
04:36 David: Yeah, so I started, it was a lot of C# development, school was like we are going to teach you this Java thing, I graduated, worked for Microsoft a while and then started discovering this machine learning IoT space, it was a really neat place to get into, and I kept seeing this Python thing pop up over and over and I was like oh it's another language, I already know like seven, I don't want to know another one, let's see how long I can put it off for. So, I think I put learning Python off for about four years, until like recently I finally broke down and said, all right, that's the last library, it's the straw that broke the camel's back, I better just break down and learn this thing. And so I started picking it up about two or three months ago, just kind of slowly to just explore and see, and once I kind of got in I was like you know, a lot of this is really familiar, it's not really that different from anything else, syntactically it's a lot of the same stuff, but I get all these other libraries and capabilities along with it, so it's been a really nice transition, and a lot less painful than I thought it would be.
05:51 Michael: Yeah, I think there is a lot of myths about Python, that are out there and once you get into it it's a little different than maybe what the preconceived notions about it might be.
05:59 David: You know, I think some of the myths were like you know, I was challenged with a couple of them, one was like the performance thing, like oh Python is not performant enough or whatever, and I was like well, at the end of the day, my performance is mostly like in web, like stateless servers, or it's going to be from GPU compute, well it has great GPU libraries, and then ok, I don't know if Flask has the most performance but it's a stateless web server so I can always just throw more servers at the problem, so I'd rather just have something that does everything and reduce the number of skills I need and throw servers at it, than fight with trying to learn something else.
06:40 Michael: Yeah, that's a good point. I mean, the performance story around Python is not straightforward, I think it's pretty good in a lot of areas and there are certainly some areas where other languages, like you were coming from .NET for example, are better, but for example, I did a conference talk that was trying to compare working with a NoSQL schemaless database and asking whether you should use a dynamic language or a statically typed language, and I chose C# and Python as the two comparison languages, and the Python web app was actually quicker end to end from the browser perspective than C#. So I think part of the story is the C# code surely ran faster because it's JIT compiled, right, to machine instructions, but the Python framework was doing less so that it actually was overall faster, which is part of that like interesting tradeoff, like you wouldn't think that it would be but there are just so many factors that it's hard to say, so yeah, that's cool. I think most of the time it's good, but you've got to just test it, right?
07:44 David: Yeah, and at that time it was just a shot in the dark, you know, you don't really know what it is doing under the hood anyways, so, you just cross your fingers and hope, even in the other worlds I end up like even in C# eventually, you are like ok, I need a west coast server and an east coast server, oh no, we've got a bunch of people coming from Texas, let's throw a central server in there, so you are doing it anyways, you might as well just pick what gets the job done the most with the least number of headaches, and then, be in Python because you get all the nice machine learning tools and you get web servers, and you get desktop, and the documentation is just absolutely phenomenal.
08:24 Michael: Yeah, it's a fantastic ecosystem, so before we move on to some sort of what you are up to nowadays, I did want to ask you since you are sort of fresh to Python, what was surprisingly easy or nice about the language, and what gave you challenges? You said a little bit, but like, give me the summary of your learning experience here? While it's so fresh in mind.
08:46 David: Yeah, so I've got a customer project that I am working on now, so this is like the first real, like we are going to do something production level with it, previously until about a couple of weeks ago, it was all just dipping my toes in, and now we are going full on production level. One of the easiest things was Flask. I come from an ASP.NET MVC with Web API background, and like Flask is basically just like that with like a different syntax, and I was like, that was phenomenally easy, like the authentication and token stuff, it was all very similar, the way that you do the headers and decorators, it was all very similar. There are some nice things, like Python doesn't really seem to care what kind of object you return, so from the same function I can return like some ambiguous object one and some other completely different ambiguous object, and that's really neat, I didn't think that was something that- in C# I don't know how you do that, usually you do a wrapper with like generic types and C# 6 made that easier, but-
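As a minimal sketch of the point about return types, here is a plain Python function that hands back two unrelated kinds of object from the same call. The `find_user` name and the in-memory data are hypothetical, chosen only for illustration; this is not code from David's project.

```python
def find_user(user_id):
    """Return a dict for a known user, or a (message, status) tuple otherwise."""
    users = {1: {"id": 1, "name": "Ada"}}  # hypothetical in-memory data
    if user_id in users:
        return users[user_id]          # a dict
    return ("user not found", 404)     # a completely different shape: a tuple


print(find_user(1))
print(find_user(2))
```

No wrapper type or generic declaration is needed; the caller simply inspects what comes back.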
09:55 Michael: Yeah, there is so much ceremony in the static languages, right, even the nice ones.
09:59 David: Yeah. So that was a really nice thing. Some of the challenges really is just kind of getting used to some of the tooling around Python so I spent like forever trying to figure out which editors I wanted to use and how packaging worked. It took me a while to just kind of just conceptualize packages because the way packages are in Python, they are very different, because it acts more like a script, it's not like a compiled language, so you don't have like different projects and libraries, you have packages which literally are just script files with initialization to it, and there is reduced ceremony in how that happens, so even with the reduced ceremony it's just not what I was used to, so coming to it from more of the statically typed and compiled language background it was like just a weird transition and it took a little bit of mental getting over.
10:55 Michael: Yeah, it's interesting, when you are learning stuff that- somewhat the challenges of learning things, programming techniques and so on, has to do with is the thing you are learning actually hard, and the other one just has to do with your preconceived notions of how things should be done. Right?
11:10 David: Yeah, I was trying to figure out how to stick extra libraries in, I was doing just all these extra abstractions that I just didn't need to do, I was like ok, we are going to do some dependency injection, so I can swap out my back end, because right now I am pointed to some local files but I want to swap that out with the actual database, right now they are on SQL but they probably are going to go to Mongo. So I needed to figure out how to easily swap that out and just- I think it ended up just being easier to not use something like dependency injection because the abstraction in Python isn't quite as necessary as it was before, I guess is what it seems to be boiling down to, I am still working through that problem specifically though.
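One way to read the remark about not needing a dependency injection framework is that any object with the right method can be passed in directly. The sketch below assumes two hypothetical back ends, `FileBackend` and `DatabaseBackend`, both faked with in-memory dicts; the names and the `load` method are illustrative, not taken from the project being discussed.

```python
class FileBackend:
    """Stand-in for records read from local files (faked with a dict here)."""

    def __init__(self, records):
        self._records = records

    def load(self, key):
        return self._records.get(key)


class DatabaseBackend:
    """Stand-in for a real database client (also faked with a dict)."""

    def __init__(self, rows):
        self._rows = rows

    def load(self, key):
        return self._rows.get(key)


def describe(backend, key):
    # Works with any object that has a load(key) method; no interface
    # declaration or injection container is involved.
    record = backend.load(key)
    return "missing" if record is None else f"{key}: {record}"


print(describe(FileBackend({"sensor-1": 7.2}), "sensor-1"))
print(describe(DatabaseBackend({"sensor-1": 7.4}), "sensor-2"))
```

Swapping the back end is then a matter of constructing a different object at the call site.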
11:54 Michael: Yeah, it's interesting if you look at the design patterns that were typically created, decorator, state, all these various design patterns, and you look at them for static languages, there is so much ceremony and formalization, and then you look at them in the Python world and they are usually a couple of [inaudible 12:11] oh that's all? It's nice but it's a mind shift for sure.
12:16 David: Yeah, there was one thing that I thought of that actually was more of a challenge, so I have a fairly hefty F# background, and I really like functional composition and pipelining, and there wasn't anything out of the box, especially with like data manipulation pipelines, I wasn't finding very much out of the box for that in Python, so there is the pandas package but it wasn't really set up so that I could kind of do functional composition easily, but using some of the decorators and kind of making up this infix thing, I am not really sure how exactly it works, I got it to somehow work, but it gives me functional composition, it's just instead of being out of the box I had to kind of write it and I had to fight with it a little bit to make it happen, and there is some magic behind the scenes that I don't fully understand but somehow I got it to work. Whereas like, R has the deep [inaudible 13:13] in [inaudible 13:14] packages and F# just has it out of the box. Python was a little bit of a fight but it works and now I get what I wanted.
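David does not show his infix helper, so the following is only one possible sketch of the idea, assuming a hypothetical `Pipe` class that doubles as a decorator. It relies on Python's reflected-operator hook `__ror__`, which is called when the left operand (a list here) does not know how to handle `|`.

```python
class Pipe:
    """Wrap a one-argument function so that `value | Pipe(f)` calls f(value)."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Invoked for `value | self` when `value` cannot handle `|` itself.
        return self.func(value)

    def __rshift__(self, other):
        # `a >> b` builds a new Pipe that runs a first, then b.
        return Pipe(lambda value: other.func(self.func(value)))


@Pipe
def double(xs):
    return [x * 2 for x in xs]


@Pipe
def keep_big(xs):
    return [x for x in xs if x > 4]


total = Pipe(sum)

result = [1, 2, 3, 4] | double | keep_big | total
print(result)

# The same steps composed once and reused as a single stage.
pipeline = double >> keep_big >> total
print([1, 2, 3, 4] | pipeline)
```

Each stage receives the output of the previous one, which is roughly what the F# `|>` operator provides out of the box.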
13:20 Michael: Yeah, ok, really nice. Really nice. You work at Microsoft, doing some interesting stuff and you said you kind of live inside the company but also outside in a little bit of a bubble. So maybe tell us what do you do there?
13:33 David: Yeah, so I am a developer evangelist at Microsoft, and the easiest way to think of what a developer evangelist is and does is that we spend about 50% of our time focusing on creating a healthy technology ecosystem in our area. Typically focused around what we like to do, so I do IoT and machine learning, so you'll see there is a very healthy IoT machine learning ecosystem in south Florida. And then the other 50% of the time is spent working on customer engagement, so finding some really cool stuff that folks are working on in the area and just helping them be successful with it, and it's interesting because we don't charge any money, we are just focused on ensuring that people are successful with our technology and getting insight from that and delivering it back to the product group. So it's very much a field position, and you know, that said, I don't live in Redmond, I live in south Florida, and there is a very different kind of view because I am not like at the Microsoft office 24/7, I am spending more time with people who are like outside of that world, you know, literally I think we are about three thousand miles away or something like that, you have to go diagonally across the entire United States-
14:52 Michael: That is about the maximum distance you can get in the continental United States I think.
14:55 David: Yeah, that was kind of a neat thing for me, I was as far away from HQ as I can get without leaving the country, so it's a very interesting perspective because you get to see a lot of the inner workings of Microsoft without necessarily being a part of those inner workings 24/7, so it gives you a very different perspective on kind of what the world outside of Microsoft is doing and you can feed that back into Microsoft so that we can better build tools and technologies that are more aligned with what folks are doing. So if you see like hey this technology, whatever it is, doesn't seem to work or meet exactly the expectations because really they are doing this, it gives you the perspective to provide that insight back.
15:42 Michael: Yeah, that is cool, and we can definitely talk more about the shift in perspective at Microsoft around these types of things in a little bit, but I think it's the short version for now is it's an interesting change and those things are way more possible than they used to be, right?
15:59 David: I am calling it the Satya decade, or era of Satya, it's very different, I get to have an android going.
16:08 Michael: Exactly, so Satya Nadella, the new CEO, has made a lot of changes in how that works, right?
16:13 David: Yeah, it's very interesting, it's like, he keeps using the term "customer obsessed" like he wants us to be obsessed with our customers and just focus on the customer 110% no matter what it is, and just make sure that they can be successful with what they want to do, using what they want to use on the Microsoft platform, and it's a very different mentality shift than what used to be in place.
16:42 Michael: Yeah, absolutely. So you said you work with IoT a little bit, what do you guys have going on there?
16:49 David: Yes, so this has been a really big space for us lately, we have this whole IoT suite, which is primarily cloud hosted services but we have some on premise offerings, but it's all about being able to pump as much data as you possibly can into the cloud and do meaningful things with it; so there is a variety of different projects that I've seen where we do this, everything from collecting telemetry from pipelines to creating intelligent teddy bears, to a water flow analysis project and intelligent skateboards, just all sorts of cool stuff.
17:30 Michael: Yeah, that sounds really awesome. And then, what does machine learning have to do with it?
17:34 David: A lot of that is around delivering value back through the data that you collect through IoT devices, at least my focus area is, so if you are collecting telemetry you have maybe pH and humidity, temperature, and you might have vibration, it's ok, you've collected all of this information, you are streaming it all up into the cloud, but what do you do with it, and machine learning will help you recognize various patterns and attributes in that sensory data so that while the data is actually in motion, in the real time data pipeline, you can actually execute machine learning algorithms against that in motion data, and surface interesting things to be aware of. So for example if you see a pattern that looks kind of like hey there is about to be a leak in your pipeline, you can actually recognize that pattern in the data pipeline, and then put a message on a data queue for like a worker role to pick up and send SMS text messages to the engineers and the executives saying, dude your pipe is about to explode, you need to go take care of this now.
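The flow described here (score readings while they stream, then drop a message on a queue for a worker to pick up) can be sketched in a few lines. This sketch swaps in a simple rolling z-score rule as a stand-in for a trained machine learning model, and uses Python's in-process `queue.Queue` in place of a cloud message queue; the function name, window size, threshold and sample readings are all assumptions made for illustration.

```python
from collections import deque
from queue import Queue
from statistics import mean, pstdev


def watch_stream(readings, alerts, window=5, threshold=3.0):
    """Flag readings that sit far outside the recent rolling window."""
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Stand-in for a real model: a plain z-score against the window.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                alerts.put(f"reading {index} looks abnormal: {value}")
        recent.append(value)


alerts = Queue()  # a worker would read from this and send the SMS
pressure = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0]  # made-up telemetry
watch_stream(pressure, alerts)
print(alerts.qsize(), "alert(s) queued")
```

A real pipeline would replace the z-score check with a model trained on past failures, which is the distinction Michael draws next.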
18:48 Michael: Yeah, that's really cool, so that's really different than, oh, this is just outside the parameters, but something instead like the last four times we saw the pipe explode, you know, it kind of did this, and the machine says, you know, that looks a lot like something I've seen before, and you didn't like it last time, go check it out.
18:48 [music]
18:48 This portion of Talk Python To Me is brought to you by Hired. Hired is the platform for top Python developer jobs. Create your profile and instantly get access to 3500 companies who will work to compete with you.
18:48 Take it from one of Hired's users who recently got a job and said, "I had my first offer on Thursday, after going live on Monday and I ended up getting eight offers in total. I've worked with recruiters in the past but they've always been pretty hit and miss, I've tried LinkedIn but I found Hired to be the best. I really like knowing the salary upfront, privacy was also a huge seller for me."
18:48 Sounds awesome, doesn't it? Wait until you hear about the signing bonus- everyone who accepts a job from Hired gets a $1000 signing bonus, and as Talk Python listeners it gets way sweeter. Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2000. Opportunity is knocking, visit hired.com/talkpythontome and answer the door.
18:48 [music]
20:17 Michael: That's cool, I've thought about that for software, right, like Netflix actually uses machine learning to monitor their servers, and their apps and production, but that seems to me like really, clear cut, like what it is you are processing but for pipes and all sorts of physical things that's really cool.
20:36 David: Yeah, I really wanted to spend some time like getting out into the physical world a bit, so that I could see it, like I really wanted to like- everything that we do as software engineers is virtual all the time, and I wanted to see is there a way to create a stronger relationship between the physical and the virtual, and kind of pass information between those two worlds in a meaningful way that delivers value to our customers, and IoT and machine learning together really is a powerful way to do that and it's neat, especially with like a lot of the stuff that is coming out like, I don't know if anyone watches the GPU conference, but Nvidia just came out with this robocar dev car, literally it's like a Formula One race car that is like a self driving car platform for like Formula One race cars.
21:27 Michael: Yeah, I just did hear that they just came out with the CPU- GPU type thing for cars, that's really cool.
21:34 David: Yeah. Seeing that kind of technology come out is absolutely amazing, so that will be an example of something that is really powerful that you can stick on any number of sophisticated devices and you will collect a lot of the information video feeds from that device that it's computing stream that up to the cloud, so you can optimize your algorithms, your self-driving car algorithms, your image recognition what have you, and then basically you get back a matrix of numbers, huge matrix using numbers and you just shove that back on to the device, and it's a really neat pipeline because it's just so cool to see that happening.
22:18 Michael: Yeah, it is really cool to see those dedicated things, I mean, GPUs used to be just for graphics, right, but now they are for so much more, you said you are doing some stuff with GPU wrappers, but it's been a little while since I've talked about this whole programming logic on GPUs and why you might care, so maybe give people like a quick summary, like why would people program GPUs over just like regular code, like what's the performance story there?
22:44 David: I am taking a look at the Microsoft COCO database as an example, so you can go look at this, I think it's mscoco.com, if you just like Bing Microsoft COCO database, you will find it, you can Google it too, I am sure you'll find it both ways. Anyways, it's an image recognition database that you can use and there is a challenge out there. Anyways, what's interesting about images, especially if you are talking like 1080 by 1920, literally to get the number of features or columns that you are going to need to compute against you multiply it, 1920 times 1080, that's the number of pixels, and if you have an RGB image you have three channels and now multiply that number by 3, you are [inaudible 23:27] in the 5 million representations of an image, so each one of those pixels is potentially important-
23:36 Michael: Right, how many frames a second, right?
23:38 David: Yeah, so if you are doing 60 frames a second, and you have one second, you just generated 60 times whatever that giant freaking number was, and that's one second worth of information. Now out of that, you will probably extract a few key images and compare against that, but you know, at the end of the day you are talking huge quantities of compute data, and when you think about a GPU versus a CPU, you are doing a single mathematical computation and on a CPU you have one, so you do one computation, maybe you have six CPUs and like 12 virtual processors, and maybe you can do 12. While on a GPU, I just built my own dev box, I've got the new [inaudible 24:24], each one of those has 3,100 processors, so on my dev box that I have, I have about 6,000 processors, so with each one of those I can actually do a numerical computation against each pixel, literally six thousand times more and faster than a CPU because I have 6,000 processors instead of one.
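Working through the arithmetic described above for a 1920 by 1080 RGB image at 60 frames a second gives roughly 6.2 million values per frame and about 373 million values per second. The variable names below are only illustrative.

```python
width, height, channels = 1920, 1080, 3

pixels_per_frame = width * height                 # 2,073,600 pixels
values_per_frame = pixels_per_frame * channels    # 6,220,800 numbers per image

frames_per_second = 60
values_per_second = values_per_frame * frames_per_second  # 373,248,000

print(f"{values_per_frame:,} values per frame")
print(f"{values_per_second:,} values per second of video")
```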
24:51 Michael: Yeah, that's insane when you think about the parallelism differences there, right?
24:55 David: Yeah, so when you are thinking GPU versus CPU it's how are you parallelizing your workload and what is it specific for, and the other things are like a GPU has its own internal memory, so each one of my GPUs has 12GB of internal memory, so I can load, because I've got two, I can load 24GB of image data onto those, and I have them set up with the standard SLI bridge so it's basically like a super awesome video game box, one of the benefits of deep learning is you can play awesome video games on your workstation. You load the data on and you transfer the model back and forth over the SLI bridge, and it allows you to compute so much faster because you have the onboard GPU memory which is typically GDDR5X, I don't even know, one with the 5 and an X in it, which is just much faster memory than DDR4, so it's a more performant way to handle large quantities of data that have excessive numbers of features, whereas you might use something like Spark if you have large quantities of data but maybe not as much parallelism in each computation task, like in an image, you have 1920 times 1080 times 3 number of computations per observation, whereas like in business data, you might have upwards of 200 computations, maybe 1,000 computations per observation.
26:33 Michael: Right, that makes sense. So, how do you program these GPUs, can you do that from Python?
26:39 David: You can, so there are two libraries that I was evaluating, there is one that is made by Microsoft and there is one that is made by Google, both are open sourced, the Microsoft one is CNTK or Computational Network Toolkit, I don't know if the Python wrappers are available yet, they should be some time, you have to check their git repo, I ended up not using that one, I went the Google route, I work for Microsoft and I get to use Google compute graphs, it's awesome, welcome to the Satya era, and so I am using TensorFlow, it works wonderfully on Azure, it works wonderfully on my box so I get to use Google technology on the Microsoft platform. So that's kind of how I get to use it with work, but the neat thing about TensorFlow is that you get to use Python and it has interactive and batch execution, so if you are just getting started with it and you are like I don't know how this thing works, well, open a console and start like having it in interactive mode and outputting some stuff and just see what the heck happens, and poke around with it, whereas like a lot of the other compute frameworks require you to go back to almost a compilation style where you generate your full graph, then you more or less compile it and then execute it, and then it shoots back really cryptic errors when something goes wrong.
28:03 Michael: I see, so like you would write it in let's just say Python and then compile it to like shader language or something like that you send over the GPU and then get the answer back?
28:12 David: Yeah, that's kind of how most of them work, and TensorFlow has that capability as well, the benefit to doing it that way is that you actually take full advantage of everything that it does behind the scenes for you, whereas when you run TensorFlow in interactive mode, you are just going to trash your performance so you might as well be doing it on a CPU. But the benefit is that you can experiment and learn in interactive mode and then just swap your flag back to batch mode and then you get all the performance goodness.
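The build-the-whole-graph-then-execute style mentioned here can be illustrated without any GPU library. The sketch below is a pure-Python toy, not the TensorFlow API: the `Node`, `constant` and `run` names are hypothetical and exist only to show that describing a computation and running it are two separate steps.

```python
class Node:
    """One step in a toy compute graph; nothing is calculated at build time."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def constant(value):
    return Node("const", value=value)


def run(node):
    """Walk the finished graph and compute its result."""
    if node.op == "const":
        return node.value
    left, right = (run(child) for child in node.inputs)
    if node.op == "add":
        return left + right
    return left * right


# Build step: this only records the structure (2 + 3) * 4.
graph = (constant(2) + constant(3)) * constant(4)
print(graph.op, graph.value)   # the top node is a "mul" with no value yet

# Execute step: the whole graph is evaluated in one go.
print(run(graph))
```

Because a framework sees the full graph before running it, it has the chance to optimize and schedule the work as a whole, which is the performance benefit of batch mode described above.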
28:42 Michael: Ok, that is really awesome, yeah, so if you are doing machine learning or just real computational stuff, right, TensorFlow sounds great.
28:50 David: Yeah. And, it supports multi GPU, multi node, just really cool stuff, and I am pretty excited, it's really expensive to buy GPUs but now I get these rentable GPUs, so Azure just went into private preview with N-series and AWS has some virtualized ones, and you can get as many nodes as you want, learn TensorFlow, and if you can't do it on your own box, just rent some GPUs, they are usually like 2 bucks an hour for a GPU box on the cloud. I can't remember if it's 2 bucks a minute or 2 bucks an hour.
29:20 Michael: I think it's 2 bucks an hour. That's pretty awesome.
29:27 David: Yeah, it really is helping democratize kind of scientific hyperscale computing, the cloud is just an absolutely phenomenal thing, like before, like my workstation, I am lucky to have had a lot of success in my career and my life, so I am combining like a $5,000 workstation for work. That's not necessarily possible for everybody, but in the cloud you go grab a GPU box for a few hours and have a script that turns it on and turns it off for you.
29:56 Michael: Yeah, that's awesome, even if you work at a company where they would buy you a $5,000 GPU box, if you have no idea if it's actually going to solve your problem, spending $10 to find out and then buying the $5,000 machine is really a nice possibility rather than leaping in and seeing if it is going to work, and maybe it won't.
30:15 David: Yeah, and yeah, I think there are other benefits to having the cloud GPUs as well, mostly that they are even faster than mine, so like I don't- I am not terribly familiar with what AWS's offering has, but with ours you get four of the latest ones that Nvidia has and that's the one that's two dollars an hour and you get direct access, like ok, so it's five grand for me to get like a box with two GPUs and they are not even the top of the line, whereas I can go and get the capital series and I get four of them and all the back end hardware, at $2 an hour, and it's just my mind is getting blown with what you can get and what price you can get it for.
30:55 Michael: We definitely live in a special time, don't we?
30:57 David: Yeah, I am feeling a little spoiled about that.
31:01 Michael: So you talked about your career and you said maybe it's a good time to touch on that. You said that you had a bit of a rough start in college and the contrast between college career and professional programmer career are really different; you had some great tips there.
31:16 David: Yeah, so I was pretty much a typical college student, going in I was like I don't want to do programming and my parents were like you are going to do programming and I was like ok, well I am going to be rebellious while I do programming, so I was pretty much throwing parties all the time and doing all that kind of stuff and I had a super low GPA, like it was my senior year, my GPA was like 2.4 and you needed a 2.6 to graduate and I was like crap, I've got three years behind me with a 2.4 and I need a minimum 2.6 so I can get out of here.
31:52 Michael: That is a little stressful.
31:54 David: Yeah, and man, I wish I had focused and taken advantage of the opportunity and seen it for what it was when I had it, but that's not what I did, so you know, I had to deal with that situation, so I started a bunch of groups, like you know the game dev groups, study groups, programming groups, I had about three different groups that I was managing and running to just hopefully surround myself with enough experts that I could pass, and I made straight As the final year, I was able to get I think like a 2.7 GPA which was enough to graduate college with my major, and it created a really difficult hiring experience, but luckily when I did the interview for Microsoft they didn't ask me what my GPA was, they wanted to ask what I did and what I liked doing, and luckily that last year of college I was like hacking on, like I had a motorcycle with like a performance module that you could like hack into and make your motorcycle go faster, so I hacked into that thing and like did all sorts of craziness to my motorcycle and made a bunch of video games, and it was like here is what I got, and Microsoft was more interested in what I could do and not what my GPA was, so I got super lucky that they never asked that question. And I was able to get a job.
33:10 Michael: That's really cool. You know, I mean, I've been in hiring roles at companies I've been at, and we almost didn't care what degree you had, if you had a PhD, it gave you a little more credibility than a college degree, which gave you a little bit more than no degree, but it really was what can you do what have you done, do you have proof of that. And, I think that's also pretty neat about programming.
33:34 David: Yeah, and actually Microsoft was the first company interview where it was like that, the issue I had was getting just into the first interview, you know, you submit your resume and they have like these processors that go and say 33:47 so you'd never even get the phone call, I think it actually was a developer evangelist who showed up at my school, I just talked their ear off and gave them my resume, I was like just read it, I don't care, just do something with it, make sure you call me, I don't care. And, that happened. There were a lot of the ones where I was applying online where I didn't make it past that automatic filter, you know, there is a lot of like- you know, you get however many students graduating out of college, you've got to process it through somehow autonomously so I think that's what they did.
34:24 Michael: Yeah, that does definitely make sense. Very cool. And since you got to Microsoft, you said your career has really accelerated, you had some process to make that happen- do you want to share it?
34:34 David: Yeah, so I think the big thing was I had this kind of oh shit moment in college and now I am out, I actually got married and it was like now I've got a family and failure was not really an option at that point. I had to come up with a method to kind of ensure I was successful and never encountered that on the edge of failure scenario ever again, that's like the scariest place to be, and so a lot of it came down to focusing and working really hard on specific areas, and building a community around those areas that I was actively engaged in. Which is kind of abstract but when I first joined Microsoft I came in as a consultant, not doing development operations and I didn't know the first thing about it so I went to a local user group which was a .NET user group and started talking about TFS and application lifecycle management, and source control, and best practices, and built a group up, we got to a few hundred there and it was once a month, sometimes twice a month, meeting with other professionals in the area to really zero in on this one specific thing.
35:57 The main thing I learned was that if you are to be successful, a lot of people can do different things very poorly, but very few can do one thing extremely, extremely well, so if you want to be the one who gets the phone call, you want to be the best of the best of the best. Of course, you are going to pass up opportunities but you are going to be the one who gets the most opportunities in that specific area and if you look at programming, there is so much opportunity out there that it is better to be the one that everybody thinks of- oh I need to do like this type of server, let's call David, so once I kind of understood that, focus on that very specific thing and be the best in like North Carolina at it, that's when it kind of clicked and I started seeing a lot of success in my career because servers would go down, as they always do, and they have to call somebody and it could be a critical scenario and so who do they call, do they call the guy who is like a web developer devops TFS open source big data guy, or do they call the person who is the best in class at that one type of server to actually solve it? I saw a significant increase in my billable hours being utilized once I focused, and a lot of that was enabled by focusing on the community and trying to see success with that community and growing it into myself.
37:24 Michael: That's really awesome, I totally agree with you that specialization is key, I mean you need to have enough broad skill that you can get along with general situations but there needs to be a thing that is your specialty. If you pull that off you can really, like you said, you get the invitations for the conference talk, you get the call when people need help when people google something they find you, it just, it unlocks opportunity and that doesn't need to be a broad specialty, it could be something fairly niche.
37:53 David: Yeah. Like if you are the best dude at charting or the best girl at charting, like who are they going to call for charting, how many charts do people need to draw, like millions of charts, there are charts being drawn all the time, so if you are just the best charter in the world, you'll be very successful. And I think you honed in on something that was really interesting, I call it the integration points, so it became more apparent that I had to learn my integration points as I started moving over to data and machine learning, which was that I could do machine learning and be the best at machine learning, but unless I knew how to surface those capabilities into a web server or onto a physical device, I was still kind of useless, I needed to know like my specialty and like three or four ways to expose my specialty to other people that were useful, so the microservices phenomenon has become a really impactful thing because now it's like machine learning, Flask, wrap it in an API, ship it up to the cloud, anybody can use it. That's integration point 1.
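As a rough illustration of the "machine learning, Flask, wrap it in an API" integration point David describes, here is a minimal sketch. It assumes Flask is installed; the `/predict` route, the `score` function, and its weights are hypothetical stand-ins for a real trained model, not anything discussed on the show.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def score(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, 0.25, 0.25]
    return sum(w * x for w, x in zip(weights, features))


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [2, 4, 4]}.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    valid = (
        isinstance(features, list)
        and len(features) == 3
        and all(isinstance(x, (int, float)) for x in features)
    )
    if not valid:
        return jsonify(error="expected 'features': a list of 3 numbers"), 400
    return jsonify(prediction=score(features))


if __name__ == "__main__":
    # Local development server only; a cloud deployment would sit behind
    # a production WSGI server instead.
    app.run(port=5000)
```

Any client that can send an HTTP POST can now use the model without knowing how it was built, which is the point of treating the API as the integration point.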
39:00 Michael: Yeah, absolutely. That's really good point, I had John Sonmez on episode 71, talking about specialization and niches and he often talks about T shaped knowledge, you got some broad base like you can set up a web server, you can create a basic web page, you can talk to database, but then you've got some vertical deep slice that is like the reason people call you, right, and I think that's a great way to think of careers.
39:23 David: Yeah, that was actually when I came in to Microsoft consulting services that was the first 39:29 here is what the T model is here is what your specialization is, here is everything that surrounds that specialization know it or die. I was like, ok.
39:40 Michael: Ok, I am listening.
39:40 [music]
39:40 CapitalOne has a special message for you- they need Python pros who love to work with data; put your Python experience to work at CapitalOne and help them use data to make life better for millions of customers. CapitalOne is employing the latest tools and approaches to do data analytics and data science from the ground up. Their smart, creative professionals love to explore new ways to interact with data, they are interested in figuring out novel, advanced Python techniques and even more interested in finding more people who will help them do that. When you join their state of the art Python community you will work with people you really like, people who might be listening to this podcast right now. Relentless innovation is their way of life, make it yours at CapitalOne. Visit jobs.capitalone.com/talkpython to learn more and apply today.
39:40 [music]
40:42 Michael: Awesome. One of the things you said that really helped you get into that position was surrounding yourself with people. And we talked briefly about some of the user groups that you've been part of, you've got a long history of creating and setting up and facilitating user groups, and you have some going now, let's talk about that a little bit. Give us the history first.
41:02 David: Yeah I started back in college like I said, I started a video game group and then when I got into Microsoft I started creating other community groups, I've always been in a field role at Microsoft so I have never really been able to surround myself with other Microsoft people so I had to come up with more creative ways to gain that knowledge and that expertise, and you know, all the different big companies have things like MVPs, which is Most Valuable Professionals, or some sort of honorary role for you are not part of the company but you are super awesome, that's why I give you a lot of special privileges-
41:35 Michael: Right, MongoDB has MongoMasters, Google has their equivalent, you guys have your MVPs, and so on, right.
41:42 David: Yeah, so you find those guys and you really just surround yourself with them. And then you invite other people in and you create a very open ecosystem, so I started by joining a very successful one in North Carolina, TRINUG, the Triangle .NET User Group, and the Research Triangle Park data analyst user group, two very large groups in North Carolina, and that really got me going and understanding how a professional group is run and operated, they've got memberships in excess of like 1500 per group now, of some of the best scientists in Research Triangle Park. So kind of after my tenure there for a couple of years learning how that is, I transitioned to this developer evangelism role where now one of my primary jobs was to lead, facilitate and grow groups of that nature. So then I moved down to Florida, and there are some really strong groups in Florida already, but maybe not specifically in the areas that I am going, at this point I have already transitioned to a lot of data science and I was doing a lot of F# and stuff and now I am trying to do Python and TensorFlow. There are like three or five groups here that I am really focusing on, there is the Fort Lauderdale Machine Learning User Group, Miami data scientists, there is FLA.NET, Gold Coast user group, Code for Fort Lauderdale, I am actually speaking there tonight, Code for Miami, just a variety of groups, and what's interesting is that each group has its own different specialization, so like Fort Lauderdale machine learning is all basically Python, TensorFlow, machine learning and they have a fair amount of R in there as well, FLA.NET is all like .NET and like Node.js Miami is all Javascript.
43:40 Michael: Yeah, excellent. And you said that you had a pretty interesting format or style for your Fort Lauderdale machine learning user group.
43:46 David: Yeah, we started, we just experimented with this a couple of weeks ago, I've done a couple of these before at other user groups and seen some success, but I call it an open forum and basically it's a semi structured user group, a lot of what I see at other user groups is very presentation lecture oriented, you go in, somebody talks at you for a while and then you network for 20 minutes or so and then you leave. But this is a very different take you know, everything that I've seen for how I get success and where I see other people get successful is that they have hands on experience, so we've created a semi structured event that happens every other week, where we have a three hour block where everybody shows up and we talk about high level what's the direction that you've seen over the past two weeks, anything interesting you want to talk about, we do about 20, 30 minutes of that, and then we spend about 2.5 hours, or 2 hours actually, hands on solving problems, so right now, we've broken out into groups to try to win Kaggle or Microsoft COCO's image recognition challenge, and what's neat is that you see a natural tendency where people share their knowledge and form into groups and are actively pursuing knowledge and actively using the tools and solving problems.
45:21 So, you can go into the presentation, how to do data pipelining or how to ensure that your GPU kernel is always awake. Or, you can say here is a giant database of images with labels, for the next three hours over the next three months, we are actually going to get an accuracy of 90%. And what was neat was that nobody has to even know how to do it, so like, when you have a lecture style group you know, you are forced into this thing where you have to have an expert, if you do an open forum nobody has to be an expert because everybody is learning, nobody even- you have to have a general idea where you want to go, but you don't actually need an expert because you are actively coding and you are going to surface these issues and you are going to naturally solve that. That said we do have experts around, because people do this for a living, so that's really beneficial and you see like we had a couple of Docker experts, we had some Python experts and some TensorFlow experts, and you know, some people knew TensorFlow and Python but didn't know how to use the GPUs and the cloud, so the Docker people come over like I can help you with that, can you show me a little bit about this TensorFlow stuff that you are doing, because the Docker experts don't know about TensorFlow. So you get this really cool cross collaboration of skill sets.
46:47 Michael: Yeah, that's really cool, and for those who are not familiar with Kaggle, Kaggle sets up periodic data science competitions, and they are probably the best known out there for that, and so I think it's fantastic that you guys are like the goal of this user group is to win Kaggle. Because, when people come back and they try to get a job or a raise or whatever, they say well, you know, what experience do you have? Well, I placed at this level in Kaggle and here is how we solved it, and it's super concrete rather than I sat through four talks on data science, I am really ready.
47:19 David: Yeah, and you know, like because, when you do interviews, it's like we were talking about at the beginning, what have you done, what can you do for me, where is the proof? Well, I am second place on Kaggle kind of speaks for itself, right, like here you go, how much money do you want?
47:36 Michael: Exactly, that's when people start calling you rather than when you call people, and of course, just like being a specialist, having people call you puts you in a much better position to get the job.
47:46 David: Yeah, and you know, the other neat thing to mention about this particular user group is we actually partner with a couple of local recruiting firms, so what is neat is that recruiting firms have jobs available and they will send the recruiters in, they actually provide food, they give those prizes, so like last time they ordered a bunch of like really fancy pizza and not like the typical nasty stuff, but like really good pizza and they had Amazon gift cards, and we had about 50 people show up, and at the end, they said ok, by the way we've got three jobs in Python, we've got two jobs in Java, a couple of .NET and like one data science position, come up and let us know if you are interested in any of these jobs and not only are you growing your skills, because we've placed these events in partnership with recruiting firms you get to learn your skills, apply your skills, and then hopefully go get a job with these same exact skills. And the recruiters know who you are, what you are doing, if you are showing up, and they will vouch for you.
48:53 Michael: Yeah. It's really an excellent combination there. So with all this community building, and stuff, you had some funny stories about the challenges of building community; one of them involved office hours, do you want to tell us that?
49:06 David: Yeah, so office hours I run office hours in downtown Miami every week on Fridays and you can get some really incredible people and things to show up at your door, and then conversely, you can get some really crazy people to show up too. So we've done a lot of stuff about like the goodness, so we can mention a few of the crazy ones, I am not going to mention any names or events or anything like that, but you know, when I post my office hours I post them up on meetup and it says machine learning office hours Microsoft developer evangelist will be present free no charge kind of thing, so one day this individual shows up just like I don't know, just to harass me about stuff, like Google I guess, this individual was I really like Google but like Microsoft kind of sucks and I'd really want to get a job with Google can you like get me a job with Google, and I was like, look, I work for Microsoft, I can't get you a job at Google, if you want to get a job I can give you some directions, she is like, great, but I don't want to like code or anything, can you like do this- I was like, oh my god, and I was sitting there in my head like staring at her, wondering how long I can let her talk, until just she disappears, and it must have been two hours, and it was an interesting experiment to see how long I could just like, sit there.
50:37 Yeah, nobody else showed up that day, so it wasn't like I had anything better to do but it was just an example of some of the craziness that you'll get and you'll get all sorts of different people showing up that are like, that we've had people come in and like start throwing things, but there is not too many of those, but when it happens, you are like, you can tell as soon as that person walks in the door, there is like I don't know, you can just like smell it or something, at this point and you just like clap your hands and rub them together and you are like all right, let's get this going. Yeah, it's an interesting experience, but then you have other things like, I mean like so many amazing innovations that have come through the door, just blowing your mind, there is this one individual who came in and he lost his job and he is an older fellow, I am not sure, he is probably in his sixties, but anyways, he lost his job and decided whatever, I am not going to let that get me down so I am going to start a company.
51:42 So, he starts this company around like executive financial planning, and just total ninja style, builds this incredible platform for every wild thing that you could ever imagine that you would want to do as an executive planning financials for your global organization, like he came from a really large company so he kind of knew how like money moved around and what you had to do because he was the programmer for those tools and he basically rebuilt this huge platform and you could just attach anything to it and it would do planning like I will give you some insight, some of the planning that it would do would be things like I have my primary workforce is in the United States, but my primary sales are in China, I need to pay my workforce but every quarter it seems like the Chinese currency keeps dropping which decreases the value of my stock but this much stuff I transfer my money in via these three ways this is how much money I lose. Then, I am going to get impacted with the bringing my money into the United States tax, well here is a more creative solution around that that reduces the impact on your ability to pay the US workers, ok well now we've generated some debt in the US, we still have the money in China because we didn't move it over that intelligently, how do we transfer the debt over to China via some back methods so that I can then pay off my debt in China as opposed to paying it off in United States, and it has this whole like predictive aspect of like everything you could ever imagine from like a perspective that just blows your mind.
53:31 Michael: How interesting. Yeah so that guy walks in and says look, I've built this help me. Very cool.
53:37 David: Yeah he walked down and was like, what can I do, I built this, I don't know what to do with it now, and I was like you know what, let me just shop this around, see what's going on, we might need to make it look a little prettier, you know, I was much more focused on the abilities as opposed to the prettiness, but it looks like it's going to go somewhere, it's just something of that magnitude is just absolutely phenomenal.
54:02 Michael: Yeah, that's really cool, to be able to help people like that. So, David, we are getting kind of short on time, there are a few things I do want to still touch on, so let me ask you about the state of Python and machine learning and stuff in Azure, because that's something you guys have actually been pushing pretty seriously lately.
54:18 David: You know, with Satya and kind of a lot of the folks in my position, we have other evangelists, a lot of us had started transitioning to Python, there is more than just me out there doing this and we are seeing that Python currently is the number five language in the world, and if you look at the top three, there are like C and C++, well those are basically the same, and then there is Java and C# so then you've got Python after that, so we're seeing a huge need for enabling Python on our platform so actually I think every single one of our platform as a service offerings on Azure is in fact Python enabled so my current workload is Python and Azure. So I am deploying web applications to Azure with Flask and there are things like Azure Functions which is analogous to AWS's Lambda and that's all Python enabled. Everything has a Python API and it seems to be working really well, and there are a lot of interesting things that you can do as well because everything has a Python API so if you want to deploy a series of servers and get the state of your servers and do really interesting distributed compute or even stateful stateless, it's a crazy concept, there is this thing called Service Fabric, it's a stateful stateless service that helps you do distributed computation in a stateless like distributed fabric kind of fashion.
55:55 Michael: I see, so like as a whole, it's stateful but any individual piece is stateless, something like that?
56:01 David: Exactly, and it distributes your compute across n number of nodes, and it's super cutting edge like nobody else has anything like this right now, and the first thing that came out was C#, next up is Python, it's like that's really cool to see what the priority seems to be, and you have to balance this with inside outside, right, like I don't actually get to see what's on the backlog, I just get to look at what is coming out and in what order the tools are coming out, but if you look at Azure it seems to be C# and then either Javascript or Python, it's going to be kind of in that order. And they are going back and forth between well do we support Javascript first or do we support Python first because some of the tools will come out with Javascript or Node first and then others will come out with Python first and it's a very interesting perspective to see that those are key priorities. And the cognitive services. I forgot about the cognitive services, there is just a whole host of APIs out there on Azure that are focused on data science that are basically prepackaged solutions, they are all Python enabled, and like Jupyter notebooks as a service and just tons of Python stuff.
57:23 Michael: That's really awesome. Yeah, the whole bot framework that you guys are coming out with also looks pretty interesting as well.
57:31 David: Oh yeah, I can't wait to kind of get my hands on that, like there is a big focus on natural language processing and 57:37 of that so like you'll see that, especially if you are using Python you get a lot of like ad hoc power that you normally get because you get things like scikit-learn and then you get to put that in things like Azure Machine Learning which is our democratized drag and drop machine learning toolkit, but if you are good with like Python scikit-learn, you use the bot framework and we have something called LUIS, the Language Understanding Intelligent Service, you can make combinations of these pre-packaged solutions, and then throw some semi custom to fully custom stuff into Azure Machine Learning as like a microservice, and the powerful offerings that you can develop as a consumer of Microsoft services, so like you would consume the Microsoft services to create value for a customer but you can create that value super fast, like I had to do a pre sales engagement where I had to develop a Slack bot that could respond to random queries about a specific customer's product offerings, I had 48 hours to build this thing from scratch and it also had to handle speech to text and text to speech, and it was out the door in less than 48 hours because of the ability to take advantage of those pre-packaged solutions and throw a little scikit-learn in there where needed.
59:08 Michael: Oh that's awesome. Yeah, very cool. So I think we have time for one more thing that I wanted to touch on, and you said you are working on a project with autonomous farming- can you talk about that?
59:21 David: Yeah. So I am not allowed to talk about it too much, because it's just now going under development, but what I am allowed to share is that it's another one of those tight timelines, you are working with folks who aren't necessarily experts in IoT and machine learning and you have very high value crops in like huge quantities and there will be things like the state of the environment will change throughout the night and you need to autonomously deal with that because you might have one farmer or one farmhand for every hundred acres or so, and if something terrible happens like the pH levels go completely out of balance, well, your farmer might be a hundred acres away and now he has to ride his bicycle or his golf cart all the way over there, and by that point if it's a super young plant you might be in trouble. So having the ability to create what I call command and control systems, you are able to kind of extract information out of the sensors that you have, figure out what the state of that system is, and then send messages not just down via text message to the farmer, but down to like an Arduino-like device, and actual motor controls, valves, etc., and it can actually impact and resolve the system and put it back into a stable state autonomously. And farming is one area that we're seeing as a very interesting place and again a lot of that gets enabled, you know, the tight timelines, and data consolidation, they have data from all over the place, you can just dump it all up into cloud based storage in Azure, build your models and then surface the actual generated algorithms either on device, edge devices or in the cloud depending on what you need to do.
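The command and control loop David outlines (read the sensors, work out the system state, then message both the farmer and the actuators) can be sketched roughly as below. This is an illustration only: the pH thresholds, the `Reading` and `Command` types, and the action names are all hypothetical, and nothing here reflects the actual project.

```python
from dataclasses import dataclass

# Hypothetical acceptable pH band, chosen only for illustration.
PH_LOW, PH_HIGH = 5.5, 7.0


@dataclass
class Reading:
    plot_id: str
    ph: float


@dataclass
class Command:
    target: str   # "valve" (actuator) or "farmer" (text message)
    plot_id: str
    action: str


def assess(reading):
    """Map a raw sensor reading onto a coarse system state."""
    if reading.ph < PH_LOW:
        return "too_acidic"
    if reading.ph > PH_HIGH:
        return "too_alkaline"
    return "stable"


def decide(reading):
    """Return the commands needed to push the plot back toward stable."""
    state = assess(reading)
    if state == "stable":
        return []
    action = "dose_base" if state == "too_acidic" else "dose_acid"
    return [
        # Correct the problem right away at the device...
        Command("valve", reading.plot_id, action),
        # ...and tell the farmer what happened.
        Command("farmer", reading.plot_id, f"pH {reading.ph} is {state}"),
    ]


if __name__ == "__main__":
    for cmd in decide(Reading("A1", 4.8)):
        print(cmd)
```

Keeping `assess` separate from `decide` means the state classification could later be swapped for a trained model without touching the code that talks to devices.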
01:01:31 Michael: That's really cool. That sounds like a fun project.
01:01:34 David: It's probably one of my favorites, if we touch base again probably around March, there will be a lot more details that I can share on probably about three or four projects that we have going on of similar neatness.
01:01:51 Michael: Nice. Awesome, it sounds very exciting. David, I think we have to leave it there. Let me hit you with a couple of questions before I let you go of course. There are over 80 thousand packages on PyPI you can grab and create autonomous farms from, or various other things right, which ones are your favorite or have you found really useful and want to tell people about?
01:02:13 David: pip install TensorFlow. I mean, for some of the large scale stuff that I am doing and some of the neat things that you can do with it, you can just pip install TensorFlow so I would say that's it.
01:02:26 Michael: Oh, that's awesome, yeah, very cool. And editor- what one do you use?
01:02:29 David: I actually have to split it, there are two editors that I use, I use PyCharm for a lot of the Flask development and for, what I'll call, productionizing my code, and then I actually use Continuum Analytics Spyder 3 or Spyder 2, the command is Spyder 3, because I find that for doing interactive visualizations and experimentation, I write a lot of my code in Spyder, and figure out my algorithms and then I copy paste it into PyCharm.
01:03:02 Michael: Oh that's awesome. Yeah, I can definitely second both of those, that's great. All right, you've got one final call to action for everybody?
01:03:08 David: I'd say check out the Python capabilities on Azure, like the power that the cloud can bring to the table at the cost that it brings is just mind boggling, so that is definitely something to take a look at.
01:03:24 Michael: Oh yeah. Very cool. All right, it's been a great conversation David, thanks for being on the show.
01:03:28 David: Thank you Michael.
01:03:30 Michael: Bye.
01:03:30 This has been another episode of Talk Python To Me.
01:03:30 Today's guest has been David Crook, and this episode has been sponsored by Hired and CapitalOne. Thank you both for supporting the show!
01:03:30 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get five or more offers with salary and equity presented right upfront, and a special listener signing bonus of $2,000.
01:03:30 Are you a data scientist or Python developer who loves data? If you are looking for a place to work on data science with truly big data, that can affect millions of lives then head on over to jobs.capitalone.com/talkpython and check out the wide range of jobs that CapitalOne is trying to fill right now.
01:03:30 Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. If you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.
01:03:30 You can find the links from the show at talkpython.fm/episodes/show/73
01:03:30 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.
01:03:30 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. Cory just recently started selling his music on iTunes so I recommend you check it out at talkpython.fm/music. You can browse his music there and listen to the full-length version of Developers Developers Developers.
01:03:30 This is your host, Michael Kennedy. Thanks for listening!
01:03:30 Smixx, take us out of here.