#135: Capturing human moments with AI and Python Transcript
00:00 Michael Kennedy: We all have smartphones these days, and we take them with us everywhere we go. How much could you infer about a person, their stage in life, their driving style, their work-life balance, based on just a phone's motion and GPS data? With the right mix of analytics and machine learning, it turns out you can learn a lot about a person. Are they a dog-owning workaholic or an early-rising parent of young children? This week you'll meet Vincent Spruyt, who is the Chief Data Scientist at Sentiance, a company building an SDK to answer these exact questions. You'll learn how they are using Python to make this happen and how they think this data could be used for the greater good. This is Talk Python to Me, Episode 135, recorded October 25, 2017. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Datadog and GoCD. Please check out what they're offering during their segments. It really helps support the show. Vincent, welcome to Talk Python.
01:24 Vincent Spruyt: Thank you.
01:25 Michael Kennedy: It's great to have you here. You guys have a super cool platform. You're doing some seriously deep learning, and A.I. and machine learning, and I think everyone's going to get a pretty cool look at what you guys are doing and how you're doing it. There's a bunch of cool algorithms going on here. But before we get into all that detail, let's talk about how you got into programming, Python, things like that.
01:45 Vincent Spruyt: Yeah, sure, cool. It started when I was about 14, and I started hacking around with some web design. Remember those days where everyone used those marquee banners? There was no CSS and stuff like that.
01:56 Michael Kennedy: The blink tag, yeah. That was wonderful.
01:57 Vincent Spruyt: Exactly.
01:58 Michael Kennedy: Those were good days.
01:57 Vincent Spruyt: Exactly. So over the years, I got more into network security and hacking. I started a company when I was around 18, and moved more to languages like Java and C++ throughout my Ph.D.
02:09 Michael Kennedy: What was the company you started when you were 18?
02:11 Vincent Spruyt: It was a network security company, so quite different from what I'm doing today. It was a lot about the, you know back then the Internet was more or less the wild west. Everything was wide open so, interesting times.
02:23 Michael Kennedy: Yeah, I remember back then Windows XP was the most popular operating system, and it had no firewall at all.
02:31 Vincent Spruyt: Yeah, yeah, exactly.
02:33 Michael Kennedy: It was right on the Internet. It was really bad. Yeah, cool, so okay. Those were interesting, the wild west days, for sure. And then you said you moved on to C++ and Java?
02:42 Vincent Spruyt: Yeah, so during my Ph.D., I was mostly working on computer vision and machine learning. It was heavily focused on real-time processing, so most of the work there was in C++. Then I joined Sentiance about two years ago to start up the data science team. Back then, Sentiance was quite small. I think we were at five people, and we had to choose a main programming language for the machine learning stuff, the data science part. Coming a bit from an academic background, I was looking for a language that had the ease of use that MATLAB or R has but, on the other hand, also allows us to go to production and build scalable systems. That's how we came up with Python, and since then, Python has been my go-to.
03:26 Michael Kennedy: Yeah, that's really great. Three years ago, I think it was probably on the border whether Python was going to be one of the really dominant machine learning languages. I mean, now it's really a clear choice, but three years ago, it was just starting to be a clear choice, right? What other things did you consider and why, in the end, did you choose Python?
03:46 Vincent Spruyt: The thing is that obviously you don't want to use MATLAB in a production system. So then, I guess the choice was whether or not we would want to use a programming language like Java for the data science work. And there has been some discussion around it because the data engineering team here at Sentiance does most of the stuff in Java, making sure everything is scalable. All the infrastructure stuff is Java. But, for us, we noticed that although we love Java, doing rapid prototyping, quickly coming up and testing your models is just much easier in Python. And, actually, although there's a lot of discussion on whether or not Python is a production friendly language, if you look at, for example, YouTube, they also use mainly Python for their whole platform. So it's quite powerful.
04:31 Michael Kennedy: Yeah, it's incredibly powerful. You look at some of the people or some of the companies doing really interesting things. YouTube is a great example. YouTube handles like a million requests per second. So that's a pretty insane level of web traffic right there. Yeah, there's some other really cool use cases like that as well. So, I guess the main takeaway is sort of the combination: it's quick and easy, but also you can go fully to production, like all the way to real, scalable levels of running in production, right?
05:04 Vincent Spruyt: Yeah, indeed. And, of course, the machine learning community these days is growing, and the amount of libraries and support you have for machine learning in Python is just huge compared to almost any other language.
05:16 Michael Kennedy: Yeah, absolutely. So what do you do at Sentiance now?
05:18 Vincent Spruyt: So I'm the Chief Data Scientist at Sentiance. Basically my job is mostly focused on the research part, on building the algorithms, together with a team, of course. Sentiance is an A.I. company first. We're currently about 50 people, and about 40 of them are actually technical, half of them data engineering and half of them data science. And with data science, we mostly mean machine learning, signal processing, building the actual algorithms.
05:45 Michael Kennedy: Okay, that sounds pretty awesome, like a pretty fun job to be part of, I'm sure. So I guess probably a lot of the listeners don't know about Sentiance. Maybe give us a high level idea of what you guys do. I mean, you have this specific SDK that people can plug into their mobile apps, and then you look at the behaviors and stuff. Tell us what the big idea is.
06:10 Vincent Spruyt: Well, the idea is that these days, I mean, everyone has a smartphone, to the extent that a smartphone is almost like an extension of your body. You continuously use it. And every smartphone is packed with sensors. You have accelerometer sensors measuring every small vibration of the phone. You have the gyroscope. Of course, you have the location subsystem. So what we do is we have an SDK that plugs into the app of our customers, which are companies. And once the SDK is in the app, we start logging all that sensor data, accelerometer, gyroscope, location. We send that to our backends, running on Amazon Cloud, and there we have a bunch of machine learning algorithms that extract behavioral intelligence from it. So we learn about your behavior. What is your home location? What is your work location? What do you do every day? Why do you do it? Can we predict your future behavior, etc.?
06:51 Michael Kennedy: Wow, okay, so I totally agree that it's a crazy world where we always have our phones with us. I would rather accidentally leave my keys at home and leave my house unlocked than leave my house without my phone. You know what I mean? So, if you can take this information about what people are doing just ambiently, and do more with it, that sounds pretty cool. So, you take all the sensor data, orientation, accelerometer, location, things like that, and you say you extract intelligence from it. So, I watched, there's a video on your home page. I'll link to it in the show notes, where you have these different layers, right? One layer comes in, and just takes in this ambient information. One tries to understand what you're doing, and then the other, what's the final one? To try to create these moments or something? Tell us about that.
07:43 Vincent Spruyt: We start with the rather low-level sensor data and, based on that, we do what we call event detection. We have a whole bunch of classifiers that take in the sensor data and, for example, try to classify your mode of transport. Based on the vibrations of the phone, you can figure out, is this person walking, biking, in a train, on a subway, in a car? Or, for example, given your current location, we want to figure out where you are. Are you visiting the bus stop that is five meters from your location? Or maybe, if you're there in the middle of the night for five hours in a row, you're not visiting the bus stop. You're actually in the bar that is like 50 meters further, right? So we end up with this whole timeline of human behavior, but of course, then you know what the user is doing, but not why he's doing it. So we feed this event timeline into our deep-learning-based prediction model, which can predict what you're going to do next, and that allows us to explain why you're doing something. Are you in a car because it's your commute, or because it's your shopping routine? Is this a business trip? Is it a leisure trip, etc.? Finally, in the third layer, we aggregate all that data, all those timelines, over weeks, so we can come up with more of a profile. Are you a shopaholic? A workaholic? Are you an aggressive driver? Do you have children? And that data we then expose back to our customers.
08:53 Michael Kennedy: That's pretty wild. It sounds really challenging because like you said, there could be a bar, and then right outside the bar could be, you could be sitting right in the front, you know? With maybe, on a beautiful day, the windows are open or something, and then right next to that is a bus stop and so determining whether you're trying to go on the bus or you're trying to relax after work, that sounds like a real interesting challenge.
09:15 Vincent Spruyt: Yeah, exactly. Even more, you can be in a bar because you work there. You can be in a bar because you just like to go for a drink. You can be there, because you live in the apartment on top of it. So, trying to figure out why a user is doing something is pretty cool.
09:28 Michael Kennedy: Yeah, that's definitely taking it to another level. So, what does your day-to-day look like? Do you do a lot of research? What kind of tools do you use to come up with some of these models? Do you write production code? What are you doing?
09:42 Vincent Spruyt: It's a mix of both research and actually writing production code. Most of the time, when we start on a project, there is a research phase. I mean, there's a lot of reading papers, experimenting, usually just in Jupyter notebooks. You create your models, you validate them. But then, at some point, indeed, we have to convert this code into something that's production ready, so then it actually boils down to cleaning up the code, creating a nice object-oriented framework, unit testing, regression testing, performance profiling, making sure that everything is scalable, and then encapsulating it into an API so that we can deploy it as a microservice, and finally getting that microservice running in a Docker image. And it's actually then that the data engineering team comes in. The data engineers take this Docker image and refine it. Basically, they build a base Docker image, so we just customize it a little bit for our specific projects, and they help us make sure that we can deploy it in a scalable manner.
10:35 Michael Kennedy: That's cool, and do you guys deploy it through your own system? Or, I forgot what AWS's container service is called, but do you deploy it to their container service, or do you have more control over where it lives and runs?
10:47 Vincent Spruyt: It's our own system. So we try, I mean, we love AWS, but we try not to be too dependent on their specific services, mainly because we need to be able to move to different cloud infrastructures if needed.
10:58 Michael Kennedy: Isn't that an interesting struggle? There are so many features, and AWS and Azure, those are the two that have a ridiculous number of things that the cloud can do. But the more that you put those hooks into your system, and those dependencies, the more you're locked in. I mean, people used to talk about lock-in with Windows or with Mac or with iOS or whatever, but the cloud lock-in is a whole other level if you go all in, right?
11:25 Vincent Spruyt: Yeah, exactly, and of course that's their business model. I mean, they try to get you to use those specific services.
11:32 Michael Kennedy: Yeah, that's cool though, okay. So going with your own container service, that makes a lot of sense. And for the tooling, for the R&D stuff, this is like Python notebooks? What are the notable packages and libraries you're using there?
11:46 Vincent Spruyt: It depends on the project. Obviously, we use a lot of the typical libraries, like scikit-learn for most of the modeling. When we talk about deep learning, it's usually TensorFlow, and some Keras. For performance and memory profiling, we create things like flame graphs. For unit testing, we use pytest or nose.
12:03 Michael Kennedy: Yeah, nice. Are you using GPU clusters in AWS or is this pure CPU based?
12:09 Vincent Spruyt: For deployment at inference time, it's usually CPU based. For research, I mean, for training the models, we indeed use GPU machines, because it can take a few weeks before a model is trained. So it really speeds things up.
12:21 Michael Kennedy: Yeah, and how fast is it if you use GPUs? Also, weeks, and if it's not GPUs, it's even worse?
12:26 Vincent Spruyt: Yeah, exactly. So for the last model I trained a few weeks ago, I started out using an AWS machine. It was just a CPU machine with 36 cores, and eventually I moved to a GPU machine, where we trained for about two weeks, and the training there, I mean, the loss went down about 10 times as fast as when I was using a CPU machine.
12:46 Michael Kennedy: Wow, that's really awesome. Yeah, and for performance and stuff, you basically just get that dialed in, I guess. There's two parts, right? There's training, and then there's answering the question, the inference bits. How do you balance those things? Obviously you're going to train on as much data as you need, but then what do you do for performance at that point? I mean, once you have the model built, and you're using TensorFlow or whatever, how much flexibility do you have to make it go faster?
13:13 Vincent Spruyt: Well, it's a good question, and usually it's really just trying to balance performance and cost, I guess. I mean, you can have a model with three million parameters that gains 1% in accuracy compared to a model with three hundred thousand parameters, right? While the latter, of course, is much cheaper to use in production. So a lot of the time, it's just balancing out those two.
13:36 Michael Kennedy: Okay, that's interesting. Do you sometimes build super-detailed models in the research phase and go, "I think we can take it down to a hundred thousand parameters," or whatever level, and then run that in production to get good-enough answers?
13:49 Vincent Spruyt: Yeah, we do, especially of course since some of our models started out on the cloud, and then at some point we realized we could actually deploy them on the mobile phone itself. So then it's very important to reduce the number of parameters as much as possible without losing too much accuracy.
14:05 Michael Kennedy: Yeah, that's really cool, and that's definitely a trend in the space to not have these tremendously powerful cloud infrastructures, but to push it to the edge, right? To the devices.
14:15 Vincent Spruyt: Yeah, indeed. Especially with Apple, the new iPhone has the A11 chip, which has a dedicated coprocessor for these kinds of things, and at the same time, Google, with the new Pixel phones, also has these separate chips specifically for image processing, computer vision. So more and more phones will have coprocessors that allow us to do edge computing without draining the battery too much.
14:38 Michael Kennedy: Yeah, that's more or less like running on GPUs, right? This specialized hardware is way more efficient and quick, so it's more reasonable to run on these wimpy devices.
14:47 Vincent Spruyt: Exactly.
14:48 Michael Kennedy: So you said that you guys are about 50 people, and if I remember the breakdown right, 20 or so data scientists and 20 or so on the sort of software, web side of things. What's the team structure? How do you guys work together, and things like that?
15:05 Vincent Spruyt: We have a model that is loosely based on the Spotify model. We work in a kind of matrix structure where, horizontally, we have a set of functional teams. There's the data science team, there's the data engineering team, there's a mobile SDK team, and then there's a solutions team. But we quickly realized, as those teams grow bigger and bigger, that it's very difficult to, well, you don't want to be isolated in your team. You want to work together with people from different backgrounds. So that's why, vertically, over those teams, we define cross-functional teams that we call squads. A cross-functional team has quite a specific focus. It's kind of a mini startup, and it consists of a few data scientists, two data engineers, two mobile guys. They build stuff from concept to actually bringing it into production.
15:49 Michael Kennedy: That sounds really like a cool way to work, actually. So there's some major feature or new library you guys want to build and you put together these cross functional teams to build it, huh?
16:00 Vincent Spruyt: Yeah, and usually those cross-functional teams, the squads, are long-lived. So it's not like they are created and then disbanded quickly, because of course we continuously try to improve our products. So we have the modsense squad, which focuses on everything around signal processing and deep learning directly on the sensor data. Then we have a lifestyle squad, which focuses more on the moments and the segments, where we use more NLP-related techniques. And of course we try to move people around. I mean, you don't stay in a single squad forever.
16:30 Michael Kennedy: Of course. Sounds really cool. So let's dig in to the three layers of your SDK, the event acquisition, the moments, and the segments you call them, right? So there's some pretty interesting algorithms and libraries that you're using. So the first level is this idea of events, and the basic question you're trying to answer is what is the user doing? So maybe we could talk about some of the algorithms and techniques you're using to determine are they driving? Are they walking? Are they at a bar? Whatever.
17:03 Vincent Spruyt: Yeah, cool. So transport mode detection itself is a cool problem. Both iOS and Android already have what they call motion activity, so they give you an idea already about transport mode, but it's quite limited. I think they support walking, biking, vehicle, and idle, something like that. Also, their accuracies are usually quite low. So indeed, we had to build our own model to get better accuracies, and especially to extend the number of transport modes we support, like bus and subway and running, and stuff like that.
17:31 Michael Kennedy: Sure, can you still leverage these motion chips at a lower level, and not just ask what they're doing, but say, give me the actual measurements that you were going to use to make that assessment?
17:43 Vincent Spruyt: Currently, we get 25 Hz accelerometer and gyroscope data from the phone, and based on that data, well, first of course there's some preprocessing, some signal processing. You have to interpolate samples, because they don't come in at a regular rate, let's say. You have to do some filtering to remove the high-frequency components that usually contain a lot of noise. And then after that, when you have a signal that is more or less clean, what we do is a lot of data augmentation. We add some noise, additive noise, multiplicative noise, and that is mainly because every phone has different noise characteristics, and we don't want our machine learning models to learn to recognize specific phones. So to undo those noise characteristics, we basically deliberately add noise to our data, so that the classifiers learn to generalize. And then we feed that sensor data in. Well, maybe it's interesting to have a look at the evolution. Today we use a neural net component, but we started out in a completely different way. In the beginning, we actually chopped our sensor stream into pieces of several seconds. For those pieces, our segments of sensor data, we did a lot of manual feature engineering, like some Fourier coefficients, frequency-domain features, time-domain features. Those were fed into a random forest back then, and the random forest outputs class probabilities.
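To make the preprocessing and augmentation steps concrete, here is a minimal sketch of that kind of pipeline. The sample rate, filter order, cutoff, and noise levels are illustrative assumptions, not Sentiance's actual values:

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt

def preprocess(timestamps, accel, rate_hz=25, cutoff_hz=8.0):
    """Resample irregular accelerometer samples onto a fixed-rate grid,
    then low-pass filter to suppress high-frequency noise."""
    t_uniform = np.arange(timestamps[0], timestamps[-1], 1.0 / rate_hz)
    resampled = interp1d(timestamps, accel, axis=0)(t_uniform)
    b, a = butter(4, cutoff_hz / (rate_hz / 2.0))  # 4th-order Butterworth low-pass
    return filtfilt(b, a, resampled, axis=0)

def augment(signal, additive_std=0.05, multiplicative_std=0.02, rng=None):
    """Deliberately add noise so a classifier can't learn the noise
    fingerprint of one specific phone model."""
    rng = rng or np.random.default_rng()
    gain = 1.0 + rng.normal(0.0, multiplicative_std, size=signal.shape)
    return signal * gain + rng.normal(0.0, additive_std, size=signal.shape)
```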
18:55 Michael Kennedy: Maybe quickly define what a random forest is for people.
18:58 Vincent Spruyt: A random forest is basically an ensemble of decision trees. One of the most simple classifiers is a decision tree, which is just like a binary search tree, but you say, okay, "If this feature is higher than a certain value, go to the left node. Otherwise, go to the right node," and you go through the whole tree until you have a decision on what the transport mode is. The problem is that a decision tree is not very powerful. It quickly overfits your data. So what you can do is just build a thousand decision trees, all a little bit different, all on different subsets of your data and your features, and then you end up with a random forest. So it's kind of averaging out all those predictions.
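In scikit-learn terms, the ensemble Vincent describes is only a few lines. The feature count and class names below are placeholders for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X: one row per few-second window of sensor data, with hand-engineered
# features (Fourier coefficients, time- and frequency-domain stats, etc.).
X = np.random.rand(5000, 40)                      # placeholder features
y = np.random.choice(["walk", "bike", "car", "train"], size=5000)

# Each tree sees a bootstrap sample of the rows and a random subset of
# the features, so averaging the trees combats overfitting.
forest = RandomForestClassifier(n_estimators=1000, max_features="sqrt")
forest.fit(X, y)
probabilities = forest.predict_proba(X[:10])      # class probabilities per window
```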
19:33 Michael Kennedy: Right, so kind of somewhat combats against the over-fitting problem you would run into?
19:37 Vincent Spruyt: Exactly. The thing is, of course, by chopping up the sensor stream into pieces of several seconds, you completely lose the temporal dependencies. It could be that one piece is correctly classified as car, and the next piece is maybe incorrectly classified as walking. So you still want to do some temporal smoothing. What we did back then is we fed that information, those segments, or actually the class probabilities, into a hidden Markov model, and a hidden Markov model is able to learn short-term temporal dependencies and kind of smooth out the end result. So that was our first version of the transport classifier. And then over the past three years, we went through several iterations. The random forest was replaced by boosted trees, actually XGBoost, which is used a lot these days, for example in the Kaggle competitions you read about. And now recently, we figured out that actually just using a convolutional neural net, with one-dimensional convolutions, because of course you don't have images, allows us to not only get an improved accuracy but also come up with much smaller models that more easily fit in memory, compared to these huge random forests.
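A small one-dimensional convolutional classifier over raw sensor windows might look like this in Keras. The window length, channel count, layer sizes, and class list are assumptions for the sketch, not Sentiance's production architecture:

```python
from tensorflow.keras import layers, models

WINDOW = 128   # samples per segment, e.g. ~5 seconds at 25 Hz
CHANNELS = 6   # 3-axis accelerometer + 3-axis gyroscope
N_CLASSES = 7  # walk, bike, run, car, bus, train, idle

model = models.Sequential([
    layers.Input(shape=(WINDOW, CHANNELS)),
    # 1-D convolutions slide filters over time rather than over image pixels.
    layers.Conv1D(32, kernel_size=9, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # far fewer parameters than a thousand-tree forest
```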
20:37 Michael Kennedy: Okay, yeah, that sounds really interesting. Thanks for sharing the evolution. I think that's pretty cool. So you've got all these events: user driving, user at work, user walking, user at a restaurant, things like that. And then you try to create what you guys call moments, which is why are they doing this? Why are they walking? Oh, they're walking to lunch. Things like that, right? So maybe talk about the analysis that you guys do there.
21:05 Vincent Spruyt: Well, similarly, there was an evolution on that level too. The main idea is that if you can predict what a user will be doing next, you can use that, indeed, to explain why he's doing what he's doing. If, for example, the user is predicted to go to work, then the fact that he's in a car means he's on his commute, while if he's predicted to go to a shop, the fact that he's in a car means he's probably in a shopping routine, right? So the first step is to teach a model to be able to predict your next event. There we started out with a Markov chain-like approach. A Markov chain basically just tries to learn transition probabilities. It learns to predict the probability of your next event being event A given your previous event, so it learns very short-term dependencies. We quickly saw, though, that those short-term dependencies were not able to model complex human behavior. It worked in simple cases, especially if you include features like time of day, simple cases like going to work and going home. But what if suddenly you wake up an hour later than normal and your whole day shifts a little bit? Then suddenly the Markov chain model completely kind of blacks out.
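A first-order Markov chain over events is simple to sketch: count transitions between consecutive events and normalize. The event names here are hypothetical:

```python
from collections import Counter, defaultdict

def fit_markov_chain(event_sequences):
    """Estimate P(next event | current event) from observed timelines."""
    counts = defaultdict(Counter)
    for seq in event_sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {cur: {event: n / sum(c.values()) for event, n in c.items()}
            for cur, c in counts.items()}

timelines = [["home", "commute", "work", "commute", "home"],
             ["home", "commute", "work", "shop", "home"]]
chain = fit_markov_chain(timelines)
print(chain["work"])  # {'commute': 0.5, 'shop': 0.5} -- only one step of context
```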
22:14 Michael Kennedy: Hey everyone, this is Michael. Let me tell you about Datadog. They're sponsoring this episode. Performance bottlenecks don't exist just in your application code. Modern applications are systems built upon systems, and Datadog lets you view the system as a whole. Let's say you have a Python web app running Flask. It's built upon MongoDB and hosted and scaled out on a set of Ubuntu servers running NGINX and uWSGI. Add Datadog and you can view and monitor, and even get alerts, across all of these systems. Datadog has a great getting-started tutorial that takes just a few moments, and if you complete it, they'll send you a sweet Datadog t-shirt for free. Don't hesitate, visit talkpython.fm/datadog and see what you've been missing. That's talkpython.fm/datadog.
22:55 Vincent Spruyt: What we use there today is, again, deep learning. We use an LSTM.
23:00 Michael Kennedy: What's an LSTM? Long--
23:00 Vincent Spruyt: Yeah.
23:01 Michael Kennedy: Short term memory?
23:03 Vincent Spruyt: Yeah, exactly, exactly. So, an LSTM is a recurrent neural network. When you think about deep learning, convolutional neural nets for example, they are deep because you have a lot of layers. LSTMs are recurrent neural networks. They are deep, not because you necessarily have a lot of layers, but because they learn deep in the time dimension. They learn a lot of temporal dependencies. As opposed to a Markov chain, where you only have a dependency on the previous event, an LSTM can depend on 20, 30, 50 events back in the past, right? It can say, okay, 50 events ago the user was in a car, and 20 events ago he was at a shop, and given all that behavior, the user is probably going to do this next.
23:44 Michael Kennedy: That's cool, yeah. The longer you go without shopping, the more likely you are to shop. Things like that, for groceries, right?
23:51 Vincent Spruyt: Yeah, indeed, indeed. And the cool thing is also that the Markov chain model, by nature, had to be trained specifically for each user, separately, on the user's data, while the LSTM we trained differently. We trained one global LSTM, feeding it thousands and thousands of different timelines of different users. The LSTM thereby learned about general human behavior, and it learned which events from the past it has to pay attention to in order to predict something in the future. And then, for a specific user that it has never seen in the training set before, we don't have to fine-tune or retrain the LSTM. We just feed the past three weeks of events into the LSTM, and the LSTM already learned during the training phase how it should use that past to predict the next event. So the nice thing is that you don't, I mean, if you have a million users on your platform, you cannot have a million deep learning models, right, that you have to retrain every second.
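As a sketch, a single global LSTM that maps a window of past events to a distribution over the next event could be set up like this in Keras; the event vocabulary size, history length, and layer widths are invented for the example:

```python
from tensorflow.keras import layers, models

N_EVENT_TYPES = 50   # size of the discrete event vocabulary (hypothetical)
HISTORY = 200        # how many past events the model sees

model = models.Sequential([
    layers.Input(shape=(HISTORY,), dtype="int32"),
    # Learn a dense embedding for each discrete event type.
    layers.Embedding(input_dim=N_EVENT_TYPES, output_dim=32),
    # The LSTM is "deep in time": its state can carry information from
    # events dozens of steps back, unlike a first-order Markov chain.
    layers.LSTM(128),
    layers.Dense(N_EVENT_TYPES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One global model: train it on thousands of users' timelines, then at
# inference time feed any user's recent history, no per-user retraining.
```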
24:43 Michael Kennedy: Yeah, of course. And how do you mark them, right? All of this stuff is happening without necessarily going, "Yes, I'm shopping now, yes, I'm doing this." And then eventually, it learns, okay, you're shopping so I know what that means, right? This is sort of all inference-based.
24:59 Vincent Spruyt: It's a combination. On the event level, the lowest level, we do have a lot of labeled data. We spent a lot of time with customers, and we even paid a lot of students to go out on the road, take trains and trams and buses, and label the data, and then, internally, we built some tools to clean up the data and make the labels more accurate. So we do have labeled data on that level. Of course, when we move up to moments and segments, it becomes very difficult to get your hands on labeled data, indeed. So there, we focus a lot on semi-supervised learning and things like transfer learning. For example, using a triplet loss function, we can learn this high-dimensional feature space in which two users with similar behavior are close to each other and two users with different behavior are far from each other. And then, in that feature space, you can build very simple classifiers using limited labeled data to actually come up with user segments. So that's kind of a transfer learning approach that allows us to cope with limited amounts of labeled data.
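The triplet loss pulls an anchor user toward a behaviorally similar (positive) user and pushes it away from a dissimilar (negative) one in the learned feature space. A minimal TensorFlow version, with the margin as an assumed hyperparameter:

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.5):
    """anchor, positive, negative: batches of user embeddings.
    Similar users end up close together; dissimilar users end up
    at least `margin` farther apart."""
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=1)
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))
```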
25:56 Michael Kennedy: Very novel. Okay, that sounds like it's working out really well. I've definitely been part of projects where it's like alright, we're going to hire 100 students to do this for an hour. You know, sometimes, that's just what you got to do, right?
26:08 Vincent Spruyt: Yeah exactly, that's how it got started.
26:12 Michael Kennedy: Yep, but you can't pay a million students. Well, not much anyway. So alright, so that's moments, and you have your LSTM deep learning model there. And then the final, the real end goal, well, I guess moments is already probably an end goal, they're at this store because... But also, you want to classify people into groups, right? What type of driver are they? Do they work a lot? Are they parents? Are they teachers? Why are they at the school? Are they at the school because they're teaching there, because they're a student, because they're a parent dropping off a kid? Things like that, right? So tell us about the algorithms and stuff in segments.
26:46 Vincent Spruyt: That's a bit what I was talking about earlier, so it's the feature space thing. The thing with segments is that, of course, some segments can be business rules. I mean, you're a workaholic if you work more than a specific number of hours, right?
26:57 Michael Kennedy: Right.
26:57 Vincent Spruyt: Some segments like, are you a parent for example, that is less obvious. Being a parent definitely influences your behavior. I became a parent six months ago and I'm a completely different person. How do you capture that behavior? I mean, you cannot put it in a business rule, right?
27:13 Michael Kennedy: Yeah, so tell us how you can determine if someone's a parent for example, that's pretty interesting.
27:18 Vincent Spruyt: So what we did there is we used deep learning to analyze, to compare actually, the behavior of different people and to learn a feature representation, like a feature vector consisting of 50 floating-point numbers, where each dimension, each floating-point number, encodes a different characteristic of the person. Maybe the first number encodes your demographics. Maybe the second number encodes how many times you do sports. The difference with traditional machine learning is that, in this case, we didn't manually define the semantic meaning of each of those 50 numbers. Instead, we let our neural network figure out by itself which dimensions it should learn to capture human behavior. And once you have that, you can actually take this timeline of events and encode the whole event timeline into 50 floating-point numbers. And then you have a rather small feature space with only 50 features, on which you can easily build even linear classifiers, very simple classifiers, using limited amounts of labeled data, people for which we know they are parents, for example. It generalizes extremely well, because your feature space is so expressive and because the feature space was learned using unsupervised learning. So we can use all the data we've gathered in the past to learn the representation.
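Once each timeline is encoded as 50 floats, the downstream classifier really can be that simple. A sketch with placeholder embeddings and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# embeddings: one 50-dimensional vector per user, produced by the
# representation model; is_parent: a small hand-labeled subset.
embeddings = np.random.rand(200, 50)           # placeholder data
is_parent = np.random.randint(0, 2, size=200)  # placeholder labels

# A linear model suffices because the feature space is already expressive.
clf = LogisticRegression().fit(embeddings, is_parent)
print(clf.predict_proba(embeddings[:5])[:, 1])  # P(parent) per user
```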
28:35 Michael Kennedy: Okay, so you have these 50 features or dimensions that are sort of grouping people together. How did you determine that this grouping means someone's a parent? Did you find some people you knew were parents and say, oh, they also have this feature, that must mean they're a parent? Or how did you assign values to that?
28:53 Vincent Spruyt: Yeah, exactly. So the feature space just allows us to do user similarity modeling. And then, indeed, we do still need labeled data, just not a whole lot of it. We can ask 100 users to install our demo app, walk around with the SDK for a few weeks, and tell us whether or not they are a parent. And then, in this feature space, if we look at those people, well, other parents will be very close to them. That's how we can build a classifier to detect parents.
29:22 Michael Kennedy: That's pretty awesome. Do you feel like there are pieces that are missing? Like, there's dimensions of human behavior that are not captured?
29:31 Vincent Spruyt: Probably. Up till now, it has worked. It's also all very new to us, because we started out here, too, with mostly business rules on top of our event sequences. So most of the machine learning in the past was on the bottom layer, the event layer. It's only recently that we're also doing this unsupervised, semi-supervised learning on the segments and the moments. But, yeah, probably. The difficulty indeed, if you use representation learning, is that it's very difficult to control which dimensions the deep learning model thinks are important to capture human behavior. So I can imagine that not everything is captured there. But in the end you can easily solve it by fixing some of the bottom layers of the pre-trained network, and then tuning it a little bit more on a smaller set of labeled data, fine-tuning the upper layers, and that way it still is able to learn those things.
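Freezing the bottom of a pre-trained network and fine-tuning only the top is a few lines in Keras. The architecture below is a hypothetical stand-in for whatever representation model was already trained:

```python
from tensorflow.keras import layers, models

# Hypothetical stand-in for a network pre-trained on unlabeled timelines.
pretrained = models.Sequential([
    layers.Input(shape=(50,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),
])

# Fix the bottom layers so their learned features are preserved, then
# fine-tune only the top layer on the small labeled set.
for layer in pretrained.layers[:-1]:
    layer.trainable = False
pretrained.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# pretrained.fit(x_small_labeled, y_small_labeled, epochs=5)
```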
30:20 Michael Kennedy: Yeah, okay, very interesting. The dependency that you're talking about here, it sounds like it could be really tricky. Like, suppose you guys redesign your transportation mode detection, and it turns out some of the time you thought people were walking, they're actually just in traffic, but really slow traffic, or something like this, right? Instead of taking a walk down I-5, the interstate highway, every day, they're actually just driving in really bad traffic. That probably has knock-on effects for moments, which has knock-on effects for segments. So, if this training of networks potentially takes weeks, how bad is it if you change the bottom layer?
30:59 Vincent Spruyt: That's indeed a very real problem we've encountered, especially now that we are more and more using representation learning to learn these features on the bottom layer. Indeed, if you retrain one of the models, the resulting feature space could have a completely different meaning, which means that all the consuming models that follow in the pipeline would have to be retrained, and of course you don't want that. We solve this, I guess, in different ways. On the one hand, there's a decision you have to make between deploying a trained model as a microservice that is then consumed by other models in the pipeline, versus actually just taking the pre-trained model and fine-tuning it inside a model that consumes that information. If you do it the first way, if you put it in a microservice, then indeed, if you retrain the first model, you have to retrain the second. But if you do it the other way, if you eventually just combine both machine learning models into one model, then of course you don't have this dependency. So in this sense, we actually try to just use pre-trained models, fine-tune them, and embed them into the next model as much as possible, and only go to a microservice if there is a good reason for it. If your model, for some reason, requires, for example, huge amounts of memory or a large number of SQL queries to a database or something like that, then that is a good reason to actually put it in a microservice. That's one way we try to solve it. Another way, and that's something we're still working on, we don't have it today, is that we are trying to create a model that basically learns a locally linear mapping from your previous feature space to the new feature space after retraining, or actually the other way around. So if you retrain a model but you put a mapping layer after it, then that mapping layer can actually make sure that, even if the model needs to be retrained, the new feature space is mapped to the same semantics as the old feature space.
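That mapping idea can be prototyped with plain linear regression: learn a projection from the retrained model's feature space back onto the old one, so downstream consumers keep seeing the old semantics. Purely a sketch of the concept, with random placeholder data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# For the same set of users, compute embeddings under both model versions.
old_features = np.random.rand(1000, 50)  # placeholder: old feature space
new_features = np.random.rand(1000, 50)  # placeholder: after retraining

# Learn new -> old, so models consuming the old semantics need no retraining.
mapping = LinearRegression().fit(new_features, old_features)
compatible = mapping.predict(new_features)  # what downstream models are served
```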
32:44 Michael Kennedy: I see, so the inputs to the next level model basically are literally transformed to look the same as they would've before.
32:52 Vincent Spruyt: Exactly, exactly.
32:53 Michael Kennedy: Okay, yeah, that sounds like a pretty interesting set of challenges and some good solutions. But yeah, it definitely seems like that's something that's always going to be a bit of a tension.
33:04 Vincent Spruyt: Yeah, exactly. And we're also, I mean we're continuously trying to figure it out ourselves. There is also of course versioning, because if you deploy a new model there is at least some period in which you're going to have to run both the old model and the new model in parallel, because not all the consumers will be updated at the same time. It's getting complicated quickly, but luckily we have an awesome data engineering team there to help us solve all that.
33:27 Michael Kennedy: Yeah, that's cool. So one of the things I was wondering as I was looking through all this, there's a lot of statistics and statistical inference in understanding these models. So somebody who works on your team as a data scientist, what's their general skillset? Like how much programmer versus how much statistician versus some other skill I'm not thinking of?
33:48 Vincent Spruyt: It's a bit mixed. In general, we say that everyone at Sentiance is a software engineer. So that means that every data scientist has to have good software engineering skills, not just some scripting experience or something. You have to be a software engineer, that's for sure. And then, for most people, we put a lot of emphasis on the machine learning background. Most people in the data science team either have a PhD in machine learning or computer vision or something, or they have a background in physics or mathematics. They need an analytical mindset, let's say. And then, finally, there is signal processing, which is kind of a specific field. People coming from robotics or from speech recognition or image processing often have a good signal processing background. Yeah, it's quite challenging to find people that combine all three of them.
34:35 Michael Kennedy: Yeah, I'm sure. Definitely sounds fun in terms of projects you could work on. You guys don't build apps, right? You basically provide this SDK or this API to customers who themselves build apps, right?
34:49 Vincent Spruyt: Well, actually, we just hired a designer, but we ourselves are not the best at creating very fancy apps or something. We are really a tech company. And indeed, we have an API through which we expose all this information back to our customers, but the customer still needs a tech team. They need data scientists or developers to be able to do stuff with it.
35:11 Michael Kennedy: This portion of Talk Python To Me was brought to you by GoCD. GoCD is an on premise, open source, continuous delivery tool to help you get better visibility into and control of your team's deployments. With GoCD's comprehensive pipeline modeling, you can model complex workflows from multiple teams with ease. And GoCD's value stream map lets you track changes from commit to deploy at a glance. Say goodbye to deployment panic and hello to consistent, predictable deliveries. We all know that continuous integration is super important to the code quality of your applications. Choose the open source, local CI server GoCD. Learn more at talkpython.fm/gocd. That's talkpython.fm/gocd. All this sounds so cool and powerful and useful, but at the same time, it also feels like it could be a little bit invasive into people's lives and into their privacy. So, what's the story around trying to strike that balance?
36:10 Vincent Spruyt: That's a question we get a lot, and indeed it is a balance we have to maintain. There's a lot of information you can extract from sensor data. Even your personality and your mood is something that's on our roadmap, something we're looking into, not something we have today completely. But your personality influences how you behave, and how you behave influences the motion of your phone. So there's a lot of stuff you can do with it. And indeed, then privacy becomes an important question. For us, on the one hand, there is the GDPR, the recent European privacy legislation. If you look at the GDPR, Sentiance is a data processor, not a data owner, which actually means that, compared to, let's say, Facebook or Google, we never claim that the data is ours. The data is still owned by the customer, which means we cannot combine the data with other data, we cannot sell the data, and the data is siloed. That's one thing. On the other hand, and probably much more important, is that we explicitly force our customers to ask consent from their users. So it cannot be that they use our SDK and put something in a small privacy statement hidden in the app or something. Our customers really need to be very upfront with their end users, tell them what kind of data they gather and why they do it. And as long as those customers provide enough value to the end users, that works. It won't work for, let's say, advertising solutions. Nobody wants to give consent to gather all this data to get better advertisements. But it does work for, let's say, health and lifestyle coaching. If we can help you live a healthier life, if we can contextualize your heart problems, or maybe even, for insurance, if we can model your driving behavior and by doing so reduce the amount of money you have to pay to the insurance company, well, that is enough added value for most users to actually give that consent.
37:55 Michael Kennedy: Yeah, that's a good point. Yeah, I guess it's all about the trade off for the benefit, right? Like you said, no one is going to go, "I would love to see better banner ads in my Candy Crush app," or whatever.
38:09 Vincent Spruyt: Yeah, indeed. The tagline of Sentiance, or at least what our CEO often says, is, "We want to make sure that AI improves people's lives." And it might sound a bit cheesy, but imagine indeed a world where you don't have to adapt to all your surroundings, but instead your phone knows who you are, knows what you feel, knows what you want, and the whole world adapts to you, not to spam you or manipulate you, but just to make your life easier and healthier and improve your quality of life.
38:36 Michael Kennedy: That's the promise, right?
38:37 Vincent Spruyt: Exactly.
38:39 Michael Kennedy: So you talked about being an SDK. Can you give us some examples of some of the apps that are using you guys as a service?
38:45 Vincent Spruyt: Yeah, sure. There are different components, of course, in what we do. One of the components I quickly mentioned before is driving behavior, where we model in a detailed manner how aggressively you drive, how you take your turns, what your driver DNA is. And that is currently being used by, I'm not allowed to name them, but let's say the biggest ride-hailing company in the world, to actually model the safety of their drivers. So, not the passengers, but the drivers themselves, so that they can coach them and make sure that the riders are safe when they take the cab. Another example is one of the biggest brand loyalty companies in the UK. They have a huge user base of users that installed their apps, because they want to get the latest coupons and that kind of stuff. And they use our SDK to personalize their communication with their users, to make sure that the user is not spammed with information they don't care about, but instead it's a very personalized communication, and that increases engagement.
39:44 Michael Kennedy: Right, so maybe if you could tell them like this person is a parent versus this person is a workaholic, or if they're both, they might treat them differently, right?
39:54 Vincent Spruyt: Yeah, exactly, exactly. I mean, if you know that someone is sportive, you can propose more interesting stuff than if you know, okay, this person never does sports, and indeed he's a workaholic or something like that. And I guess maybe the most interesting use cases, to me at least, are in health and insurance. In health, for example, we work with Samsung, who is also one of our main investors, on detection of heart arrhythmia. So, problems where you have heart fibrillations and you want to contextualize them. You want to know why it happens, when it happens, and you want to expose that to your doctor, so your doctor can say, "Okay, we see that if you work late, and I see that you're a workaholic in general, and if you eat a lot of fast food that week, that is a time when you usually have your heart problems." So that's one of the use cases.
40:35 Michael Kennedy: How does it know? Do you have to like, do you have a different device that detects the arrhythmia and like flags it in time and then you can overlay it on your timeline or something like that?
40:44 Vincent Spruyt: Yeah, exactly, exactly.
40:45 Michael Kennedy: Okay. Yeah, and you said you're also working with another company doing something similar?
40:48 Vincent Spruyt: Yeah, so in the health space we work with some smaller companies, also from Europe, from Belgium and the Netherlands more specifically. There is FibriCheck, for example. FibriCheck has a mobile app where you can put your finger on your camera, and the app will use the camera and the flashlight to extract your heart rhythm from the blood flow in your fingertip. And they also use our SDK to contextualize that, to predict when you're probably going to have heart problems and why it's happening, and to expose this to a doctor. And then there is another example, MedUP. MedUP is a company in the Netherlands. They have an app for care adherence, medication adherence. A lot of people have to take a lot of pills, and a lot of people actually forget to take their pills, and it's a huge problem. So what they did is they developed an app that reminds users to take their pills on time. But of course, if you just get such a reminder, you know, an alarm on your phone right before you have to go to work, or maybe even when you're in the car, then you just snooze the alarm or dismiss it and forget about it altogether. So they use our SDK to tailor those alarms, to make them context aware and remind users at the right time.
41:56 Michael Kennedy: Right, like if you're driving it makes no sense to remind you so wait till you get to work.
42:00 Vincent Spruyt: Exactly.
42:01 Michael Kennedy: Or wait till you return home if it knows you're coming home or something like that.
42:03 Vincent Spruyt: Yeah, indeed. Or if we predict you will probably be leaving for work in 10 minutes, then this is the time to remind you and don't wait 10 minutes.
42:10 Michael Kennedy: Yeah, that'd be even better. Cool, so those all sound really interesting. We talked a lot about your architecture already, actually, but there's a few things that we haven't touched on that I think are worth covering. One of the things you guys use is something called devpi, that's sort of an alternative local PyPI. Tell people what devpi is and how it's helping you guys.
42:29 Vincent Spruyt: The problem we had with Python and PyPI as a package server is that you quickly end up in kind of a dependency hell. You develop your project, put it in a repo, and you have a setup.py to easily install it. And in the requirements you list, let's say, NumPy version X, but you also list package Y as a dependency, and package Y actually depends on NumPy version Z. So you have this whole conflicting set of dependencies, which quickly becomes very difficult to manage. And how we actually did versioning of our own packages, our repositories, is that in the past we specified a version attribute in the setup.py and used git tags on our git repositories, and those tags also contain a version number. That kind of allowed us to pull the correct version and try to get everything installed as it should be. But then you have to make sure that you maintain the setup.py, don't forget to increase the version number there, make sure the tags are in sync, and it really becomes messy quickly. So how we solve this is, indeed, we use devpi these days. We have our own package server. Our Jenkins server, so Jenkins basically is our build server, everything gets built into packages automatically there. Jenkins builds wheels from our internal repositories, builds those wheels both for Mac for the developers and for Linux for actual production, and stores them with a version number. And then if we pip install something, first our devpi is consulted. It fetches the package with the correct version and correct dependencies, and only if it cannot find it there does it go further to PyPI.
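For anyone who wants to try a similar setup, the standard devpi workflow looks roughly like this; the index names are placeholders, and the exact flags may differ between devpi versions:

```console
$ pip install devpi-server devpi-client
$ devpi-server --start                       # run a local package server
$ devpi use http://localhost:3141            # point the client at it
$ devpi login root --password=''             # default root user, empty password
$ devpi index -c dev bases=root/pypi         # an index that falls back to PyPI
$ devpi use root/dev
$ devpi upload                               # build and upload the current package
$ pip install -i http://localhost:3141/root/dev/+simple/ mypackage
```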
44:08 Michael Kennedy: Yeah, that's really cool, to be able to control it like that. Do you distribute your own packages for use in other projects within your own devpi?
44:18 Vincent Spruyt: Everything we build is contained in a functional repo, let's say, with an API. And then we always have a microservice wrapper repo that just uses that functional repo as a dependency. So the functional part is always built by Jenkins, put into devpi, and can then be used by different other projects as a dependency.
44:39 Michael Kennedy: Yeah okay, that sounds like a really good setup you guys have going there. Another thing that you talked about is pragmatic use of deep learning. Tell us, what do you mean by pragmatic use? Like what are some of your recommendations?
44:53 Vincent Spruyt: Deep learning is cool, and especially if you hire new people and they hear that we do deep learning, and we use it a lot actually, they're eager to also start using it for the problems they start working on. But of course, we have to be pragmatic, indeed, in the sense that a lot of problems just don't need deep learning. For example, like I talked about earlier, detecting what is your home location and what is your work location, you can solve that without deep learning. You just do some feature engineering, gather a little training data, and train a linear support vector machine on top of it or something. It's important, I think, to use deep learning if it really solves your problem, if it makes your product better, but indeed, don't just follow the hype.
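As an example of that non-deep-learning route, home/work classification on a handful of engineered features could be as simple as this; the features and the tiny training set are invented for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC

# One row per (user, place): fraction of night hours spent there, fraction
# of weekday office hours spent there, visit regularity. All hypothetical.
X = np.array([[0.90, 0.05, 0.95],   # mostly nights       -> home
              [0.02, 0.80, 0.70],   # mostly office hours -> work
              [0.85, 0.10, 0.90],
              [0.05, 0.75, 0.65]])
y = np.array(["home", "work", "home", "work"])

clf = LinearSVC().fit(X, y)
print(clf.predict([[0.80, 0.10, 0.90]]))  # -> ['home']
```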
45:33 Michael Kennedy: Don't just do it because it's a buzz word?
45:34 Vincent Spruyt: Yeah, exactly.
45:36 Michael Kennedy: Or VC money because of it.
45:37 Vincent Spruyt: Indeed.
45:38 Michael Kennedy: Yeah, that's really cool. Sometimes standard algorithms, and effectively just if statements, are really all you need.
45:47 Vincent Spruyt: It is true that these days, with the VCs and even with customers, for some reason it sometimes almost sounds embarrassing if you have to tell them that for a part of your product you use traditional machine learning. It's like, why don't you use deep learning? But it's a matter of cost, it's a matter of accuracy, and also maintainability. If you have a very simple ... Actually, I think it goes even further than the simple example I gave. If you can solve a problem with a simple business rule, then that's the way you should go.
46:15 Michael Kennedy: Right, absolutely. Let's take just a moment and step up to a sort of higher level, not referring to anything you guys are doing specifically, to your product or your mission, but just in general. There are some people, like Elon Musk, who I'm a big admirer of in general, and others saying, "We should be really worried about AI and machine learning," and other people saying, "Nah, this'll make things lots better." And you've given us some definite examples where it is going to be better for people, right, like with health, for example. But where do you land in this debate, and do we live in a simulation?
46:51 Vincent Spruyt: First, the extreme cases that you sometimes read about, AI taking over the world and stuff like that, well, there's an interesting quote there. I think it's from Andrew Ng, one of the big deep learning guys, who used to be head of AI at Baidu and, I think, today is still head of AI at Stanford. He said at some point that the fact we're talking about AI taking over the world is a little bit like talking about overpopulation on Mars. It might happen at some point, it probably will, but there's still no clear pathway, right? So that's one thing. Of course, it is true that AI, or machine learning, which I like more as a term actually, is becoming very powerful, and in that sense, like any powerful tool, it can be used for good and for bad. So I do agree that we need politics, we need legislation, to be ready for this. We need to make sure that governments are limited in what they can do, that they cannot force you to install an app that tracks your every move and then controls you or controls your future or something like that. So I do agree with Elon Musk on that point, that it's time for government officials to take this seriously and to work on the legal aspects.
48:02 Michael Kennedy: Sure, I totally agree. There are other interesting knock-ons, like, I think the EU is working on this: when we get to things like driverless cars, if the driverless car is in an accident and it turns out the driverless car was at fault, who is responsible, and how do you address that? If it's pure deep learning, totally unsupervised learning, that made the car drive, how do you even know why it crashed?
48:28 Vincent Spruyt: Yeah, indeed, and that's also a good example of the difference between the technology maybe being ready soon and the world being ready for it. Because even if we had completely, very good self-driving cars today, then still, exactly because of the reason you mentioned, we wouldn't be able to use them on the road. So first a whole transformation in the mobility sector and the insurance sector has to happen, so that cars are actually seen as a service, where you insure a service. It's a different mindset.
48:58 Michael Kennedy: Yeah, and I think we're going to have to get used to pushing the benefits in an aggregated way instead of in terms of a specific individual's responsibility. For example, yeah, the self-driving car did something really bad and it crashed into some people on the sidewalk, but if you look at it as a whole, half a million fewer people were killed in car accidents this year. So this is a horrible news story, and it's really bad, but taken as a whole, self-driving cars are doing better for people, right? That's like a theory I'm imagining, right? But I can see the world struggling with those sorts of ethical trade-offs.
49:29 Vincent Spruyt: Yeah, indeed. You know, it's a little bit, I know maybe it's not a very good comparison, but if you think about the Industrial Revolution, there were a lot of people that were so scared about all the millions of jobs that would be lost if cars were not made by hand anymore but machines were used. But in the end, when we look back, I do think that most people agree that the Industrial Revolution made our lives healthier, we live longer, it made life easier, we're happier, and the same thing is going to happen with the AI revolution.
49:57 Michael Kennedy: Yeah, I think in the long term that that's true. Like I definitely wouldn't want to live pre-Industrial Revolution myself. I wouldn't trade my spot in it now. Alright, well Vincent, I think we're going to have to probably leave it there for our topics, but that was a super interesting look inside what you guys are doing with machine learning and things like that. So let's get to the two questions. First one, favorite Python editor. What do you open up if you're going to write some Python code?
50:24 Vincent Spruyt: PyCharm for sure. I love the JetBrains products in general, you know, DataGrip for database stuff, IntelliJ for Java, PyCharm for Python, yeah.
50:31 Michael Kennedy: Yeah, awesome, me as well. It's my favorite. Alright, and notable PyPI package?
50:36 Vincent Spruyt: I've been thinking about this for a long time. I think it would be python-flamegraph, just because it's a very cool way to do memory and performance profiling, create flame graphs, see which methods in your code are the bottlenecks, and optimize them.
50:48 Michael Kennedy: Yeah, I looked into that just a little bit and it looks like a very powerful way to quickly visualize where your performance problems are.
50:55 Vincent Spruyt: Yeah, exactly, exactly, and to actually dig deeper into the stack of calls and figure out what's happening.
51:01 Michael Kennedy: Yeah, that's cool. So I'll put a link to the GitHub repo for python-flamegraph, which has a bunch of nice pictures.
51:05 Vincent Spruyt: Awesome.
51:05 Michael Kennedy: Awesome, yeah, yeah. So alright, final call to action. People are excited about deep learning, they're excited about what you guys are doing, what do they do to get started?
51:14 Vincent Spruyt: We're expanding, we're continuously expanding. This year or coming year we should grow from 50 to 80 people, so we're always looking for passionate Python developers, machine learning guys.
51:23 Michael Kennedy: Can they just reach out to you on Twitter or something like that if they want to get more information, or how do they find out?
51:27 Vincent Spruyt: Yeah, yeah, definitely. Just ping me on Twitter or LinkedIn, or go to our website where there is a more official channel. Maybe you can refer to the podcast so we have an idea where you're coming from. And don't focus too much on the machine learning part either. If you're very good at Python, or very good at machine learning or signal processing, we should talk.
51:45 Michael Kennedy: Okay, awesome. And then you guys have an app. Even though you said you don't build apps, you have an app. What's the story with this app?
51:52 Vincent Spruyt: Yeah, it's a demo app. Because our product is quite technical, of course customers ask, "Okay, what does it look like? What can I do with it?" So we needed to build a small demo app. It's called Journeys. You can find it on our website or on the App Store or Play Store. And Journeys, basically, when you install it, after a while it will learn about your behavior. After one week or two weeks, you will see a whole set of segments pop up that we assign to you, your moments, your home and work detections. So yeah, check it out. Download it. It's pretty cool. And there is also a way to give feedback. So if something is wrong and you decide to give feedback, well, that helps us retrain our models.
52:24 Michael Kennedy: Beautiful. Alright, well thanks for sharing your story and what you guys are up to. It was great to chat with you.
52:27 Vincent Spruyt: Thanks a lot, Michael.
52:29 Michael Kennedy: This has been another episode of Talk Python To Me. Our guest has been Vincent Spruyt, and this episode has been brought to you by Datadog and GoCD. Datadog gives you visibility into the whole system running your code. Visit talkpython.fm/datadog and see what you've been missing. They'll even throw in a free t-shirt for doing the tutorial. GoCD is the on premise, open source, continuous delivery server. Want to improve your deployment workflow but keep your code and builds in house? Check out GoCD at talkpython.fm/gocd and take control over your process. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well, check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.