#91: Top 10 Data Science Stories of 2016 Transcript
00:00 It's been an amazing year for Python and data science.
00:02 It's time to look back at the major headlines and take stock in what we've done as a community.
00:07 I've teamed up with the Partially Derivative podcast, and we're running down the top 10 data science stories of 2016 in this joint episode.
00:15 This is Talk Python to Me, episode 91, recorded November 18, 2016.
00:21 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
00:52 This is your host, Michael Kennedy.
00:54 Follow me on Twitter, where I'm @mkennedy.
00:56 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
01:03 This episode has been sponsored by Rollbar and Continuum Analytics.
01:07 I want to say a special thank you to the folks at Continuum Analytics, you know, the Anaconda distribution people, for joining Talk Python as a sponsor.
01:15 Thank them both for supporting the show by checking out what they have to offer
01:19 during their segments.
01:20 Jonathan, welcome back to Talk Python.
01:23 Hey, thanks so much for having me.
01:24 I'm really excited to be here.
01:25 I'm so excited to have you back, because every time we do a show together,
01:29 I have a great time.
01:30 People seem to love it, and I know what we have on deck today.
01:33 People are going to love the show.
01:35 It's going to be really fun.
01:36 Yeah, I think so, too.
01:37 It's just, I love, I'm so glad, it was fun doing this last year, and I'm glad we're doing it again.
01:41 It's just kind of cool to have an opportunity to look back over the past 12 months,
01:45 and really, there's just so much news and so much data stuff that comes out over 12 months.
01:49 It's an industry that's moving so fast.
01:51 Just to have a chance to reflect a little bit is kind of cool.
01:53 I remembered some things that I'd forgotten about the past year.
01:56 Yeah, me too.
01:57 Like, for example, Tay.
01:58 I remember Tay.
02:00 And I just, just because we brought it up.
02:02 So we'll talk about Tay, the bot, later.
02:04 Of course.
02:06 Yeah, what have you been up to the last year?
02:08 Like, you're still doing Partially Derivative.
02:10 You got your business properly, your data science business going.
02:13 Yeah, yeah, absolutely.
02:14 So Partially Derivative, our podcast about data science, at least ostensibly about data science,
02:19 kind of about drinking, largely about screwing around, is still there.
02:23 So everybody who's interested in a little data nerdiness can go check that out.
02:26 Yeah, and then the business has been going well.
02:29 We're doing more projects, which is cool, kind of moving slowly and slowly away from the kind of startup mentality,
02:36 which I think has been healthier for everybody.
02:38 And been doing some really cool research projects, especially some natural language stuff.
02:41 So yeah, it's been a really good year.
02:43 How about you, man?
02:44 How's your year in review?
02:45 My year in review is amazing.
02:47 You know, I went independent in February.
02:51 I've been running this podcast and my courses as my primary business.
02:55 And it has just been like a dream come true.
02:58 It's amazing.
02:58 So it's been a fabulous year.
03:00 Yeah, I'm not surprised.
03:01 The Python coursework is really awesome.
03:03 I don't know how much you plug your own stuff on your show.
03:06 So I'll just do it for you, for all of your listeners.
03:08 If you're interested in learning Python and you haven't taken the courses yet,
03:12 you should go do it right now.
03:13 It's the best way to learn.
03:14 Thank you so much.
03:15 Yeah, it's really fun to do them.
03:17 All right.
03:18 So I think in our tradition, because it's happened one time, what we need to do at the end of the year is look through all of the interesting news stories
03:28 that have to do with big data, with data science, with machine learning, with Python,
03:34 and kind of do our take on them.
03:37 So let's start at something that's been in the news a lot this year, the White House.
03:43 Yeah.
03:43 So very interestingly, so, you know, we've been maybe 18 months, a couple years,
03:48 into the tenure of the country's, the U.S., our first chief data scientist.
03:53 So DJ Patil came into the White House.
03:56 We've had a CTO, a chief technical officer, in the White House for a little bit longer.
04:01 And that group of folks has been really out in front on the way that we think about data
04:07 and society, which has been a fascinating conversation.
04:10 I think there's been like little trickles of information about what it means to do machine
04:14 learning in an ethical way, how we avoid kind of algorithmic bias in our models and our
04:20 machine learning models.
04:21 They did this report about how, as a society, we should think about the impacts of artificial
04:27 intelligence, of machine learning, and big data to make sure that we're not taking some
04:31 of the bias that's inherent in our society and therefore is inherent in the data and inherent
04:36 in our models and is perpetuating some of that over time through technology.
04:39 It was really a cool position for an administration to take, you know, sometimes government's a
04:44 little bit behind technology or the technology industry.
04:46 But in this case, I felt like they were really out in front, like kind of driving the conversation.
04:50 So it was a cool story.
04:51 Yeah, it is a very cool story.
04:52 And of course, all the stories will be linked in the show notes.
04:55 You can probably just flip over in your podcast player and click them.
04:57 But what I find really interesting about this is we as technologists see technological progression
05:05 almost always as rainbows and unicorns, right?
05:09 It's like every new device that comes along that connects us or enables something, this is
05:17 uniformly good, right?
05:19 But that's not always the case, right?
05:21 And actually, we'll get into some interesting twists later in some of those specific areas,
05:25 how this can go wrong.
05:27 But basically, they said, look, there's so many opportunities for data science and technology
05:31 to do good for people.
05:33 But at the same time, these like, let's take data science, the algorithms coming out of the
05:39 machine learning could have a bias and not necessarily even a conscious bias, right?
05:44 Yeah, that's actually one of the most interesting things about it.
05:46 Because I think still, statistically speaking, and I think maybe even trending in this direction,
05:51 the technology community and the data science community is still largely male and largely white.
05:57 And so the interesting takeaway, I think, from a lot of these discussions about the way that bias is kind of infecting our technology
06:05 or may not necessarily be this steady march to progress the way that we view it is because people often don't see
06:12 or have a difficult time understanding the perspectives of people who aren't like them,
06:16 which is kind of an obvious statement.
06:18 But when we're encoding our worldview effectively into the technologies that we're developing,
06:24 then we may not see the consequences of that technology.
06:29 We're not intentionally encouraging racism or intentionally kind of encoding that institutional bias.
06:35 But it's inevitable that that's going to be a byproduct of a community that's relatively homogenous still.
06:41 And so I think it's just good that it's something that we're discussing.
06:45 I think the only way to get past that is to have more awareness of it, and then ideally for more diversity in the technology industry.
06:50 But that's sort of a separate and longer conversation.
06:53 But yeah, so I think, again, it's cool that such a high-profile group of people who are leaders in the technology community
07:01 took it upon themselves to initiate this conversation.
07:03 Yeah, absolutely.
07:04 So there's a report they released about this, and it's not all negative.
07:08 It seems pretty balanced.
07:09 Like, look, there's all these great things that we're going to be able to do,
07:11 but there's also these safety checks we need to make sure are in the system.
07:16 And at the end, they also put a little note that said they encourage you to follow along this summer and spring
07:21 where they're hosting a series of public workshops on artificial intelligence and machine learning.
07:25 Like, when did you think the White House would host workshops on artificial intelligence and machine learning?
07:31 Yeah, it really is.
07:33 It's a new world.
07:34 It's pretty exciting.
07:35 And I agree.
07:36 That's good to point out.
07:36 Like, we're having this – I feel like I've been framing this as if it was like a finger wagging or like an admonishment.
07:42 And it's really not.
07:43 It's actually – there's so much potential for these amazing technologies.
07:46 Let's just make sure we're doing it in a way that it includes the entire society and not just the single viewpoint.
07:53 Yeah, absolutely.
07:53 Absolutely.
07:54 All right.
07:55 The next one up is this research paper.
07:57 It's hard to call it paper.
07:59 Digital paper.
07:59 This research article called Social Bots Distort the 2016 U.S. Presidential Election Online Discussion.
08:07 And if that sounds like a sort of tough title to say as an academic thing, it's because this is an academic paper.
08:15 It's by Alessandro Bessi and Emilio Ferrara.
08:18 And these are two professors or postdocs.
08:21 I don't remember exactly their position.
08:23 But in Southern California, it's some local universities there.
08:26 And there's this place called firstmonday.org.
08:30 And it's a peer-reviewed journal on the internet.
08:33 And it's kind of a double meaning there.
08:36 So it's a peer-reviewed journal that you can get for free on the internet.
08:39 But it's a peer-reviewed journal about research on the internet.
08:42 So it's pretty cool they've got a bunch of stuff about, like, how Reddit behaves and other sorts of things that we would probably care about, purely academic research.
08:51 And it's super interesting what they found.
08:53 So these guys, they created this thing called Bot or Not, which is a machine learning framework.
09:00 And they basically set up a bunch of hashtags and a few keyword searches.
09:06 And they said, we're going to monitor the Twitter firehose for these things, right?
09:11 That's the real-time data flow coming out of Twitter for those particular things, which already is actually a challenge.
09:17 And they talk about the technology of, like, consuming that much data, which is pretty interesting.
09:22 It's written in Python.
09:23 And you can actually get the Bot or Not thing on GitHub.
09:26 And they say it has an accuracy of determining whether a thing, like a social thing, is a bot or is human at 95% or better.
09:35 Whoa.
09:36 That's pretty solid, right?
09:37 That's kind of amazing.
09:38 Yeah, that's a difficult distinction to make, I think, a lot of the time.
09:40 That's cool.
09:41 Yeah, they said they'd taken over 1,000 pieces of data to make that.
09:45 Dimensions, I guess?
09:46 To consider that, right?
09:48 It was interesting to see the way that they – dimensions or the features that they built in order to make that model predictive.
09:55 The kind of the behavioral things, the signatures that distinguish between a real-life person who just tweets a lot and a bot.
10:02 It's interesting because it's – there's a lot of things that they were able to sort of distill down.
10:06 But it is very interesting.
10:07 And we don't want to go too much into the details.
10:09 But they really write it up.
10:10 It's like a 30-page paper.
10:11 So that's cool.
10:12 And kind of like we were just discussing, they have a similar take for social media as we were talking about with big data.
10:19 And they say social media has been extensively praised for increasing democratic discussion, right?
10:24 You think of the Arab Spring, for example, and things like that.
10:28 But they say that you can also take this social media and use it for your own purposes, good or evil, right?
10:37 So you can exploit these social networks to change the public discussion, to change the perception of political entities, or even try to affect the outcome of political elections.
10:48 Yeah, and this is something – I mean, not to get us too off topic, but I have a little bit of a research background in understanding how highly motivated, like hyperactive users of a social media platform can basically form a group that's just large enough to seem just too big to really recognize.
11:05 But if they act together, it's like let's say you have 1,000 people that are just really hyperactive and tweeting the same thing.
11:11 Real people or bots.
11:12 Those tweets or the hashtags that they promote or the content that they circulate seems as if it's gaining really widespread organic traction.
11:20 And so you can effectively, like, force a narrative onto a social media.
11:24 You can, like, hijack the mechanics of a social network using some of these techniques.
11:28 And we're seeing it increasingly from groups that have some kind of ideological agenda.
11:33 Everything from terrorist groups all the way to political organizations all the way to maybe foreign states that start with R trying to influence the outcome of the U.S. elections.
11:44 Like, it's a really – it's both – like, from an academic or intellectual perspective, it's kind of fascinating.
11:50 But at the same time, also a little ominous.
11:54 It definitely is.
11:55 Like, I really love social media, and I think it is a positive thing generally.
11:59 But there are definitely examples, and this is one – I'm going to give you some stats here in a second.
12:03 But another real clear example, which is not something trying to influence this, but it's just, you know, speaking of algorithms and unintended consequences,
12:11 like Facebook and people living in bubbles and how they perceived the news this year and all those sorts of things are very interesting to study.
12:19 Yeah.
12:20 And in fact, at the time of this recording, it's not yet released, but probably by the time this airs, we will have published it.
12:26 We've actually done some research that shows – that we think shows that the – when people in a particular community on Facebook share URLs from these fake news domains or hyper-partisan domains more often,
12:40 it actually has a direct impact on the amount of bias that we see in their language.
12:44 So that kind of – that loop where the community gets more biased, the news sites get more hyper-partisan or more extreme, and then the community gets more biased again,
12:53 like that kind of feedback loop seems to be a real thing.
12:57 And it's kind of a – how do you pull people back from an environment where they're literally not living in the same reality that you are, which is kind of strange.
13:05 It's very strange.
13:06 You know, especially when it comes to things that are outside their personal experience, right?
13:09 So even though, like, we all have the same kind of jobs, we all, like, love our families and our kids, our day-to-day lives are mostly the same.
13:15 But we can still get kind of whipped up into a frenzy about these things that are kind of at arm's length from us that get talked a lot about – that get talked about in kind of political campaigns.
13:24 It'll be interesting to see how these networks start to combat it now that they're aware of it.
13:31 Yeah, yeah.
13:31 I'm looking forward to that.
13:32 I totally am.
13:33 I'm looking forward to your research as well.
13:35 That's cool.
13:36 I feel like if the Twilight Zone were still a thing, the movie from the 50s and 60s, you know, there's a couple of episodes I could get from our news here.
13:44 So are you ready for the conclusion?
13:45 Did these bots have an effect?
13:47 After all this research and a bunch of data analysis that they laid out, they said that bots are pervasively present and active in the online political discussion in the 2016 election.
13:58 And they estimate 400,000 bots engaged in the discussion responsible for 3.8 million tweets or one-fifth of the entire political presidential conversation.
14:10 Whoa.
14:10 That's a huge percentage of the – it's weird.
14:14 We think about it like, oh, this great forum for public discourse, but it's actually bots talking to each other.
14:18 Yes, exactly.
14:18 Like arguing amongst themselves.
14:20 Yeah.
14:21 It's probably true.
14:22 I bet they did fight with each other.
14:25 Actually, that was – I'm disappointed with myself because I won't remember the name.
14:28 But if people listening to this Google kind of social activist bot fighting with trolls or, you know, something along those lines, there were a couple really interesting stories about people who would use – who wrote bots, kind of more activist artist types who wrote Twitter bots.
14:42 And wrote them in a way that would start kind of banal online fights with people.
14:48 So they'd find kind of like far-right trolls and they would say things like, I think your opinion is wrong.
14:54 Like your argument isn't even valid.
14:56 And they would just like content-free argument.
14:58 But people would engage with them for hours.
15:00 Like they would just fight with this bot for hours at a time.
15:03 Oh, that's mean.
15:04 Yeah.
15:05 But I love it.
15:06 Yeah.
15:09 It is pretty fascinating.
15:10 I think – and something – I mean, you kind of mentioned it, but I feel like it's worth reminding all of the listeners because everybody – many of your listeners will be developers.
15:17 The code for this is released on GitHub.
15:20 And there's an API that you can ping.
15:22 So if you're doing any kind of research or if you're building an application that engages with Twitter, you can pretty easily check to see whether or not an account is a bot.
15:31 So if that's something that is useful to you, like I really credit the researchers for making this available to the general public.
15:38 That's a really cool service.
15:39 Yeah.
15:39 That's for sure.
15:40 It's cool.
15:41 It's on GitHub.
15:41 It's cool.
15:42 It's in Python and easily accessible to everyone listening here.
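For listeners who want to experiment with this themselves, here is a minimal sketch of how you might query a bot-detection web service for a Twitter handle. The endpoint URL and the "bot_score" field below are placeholders, not the researchers' actual API; check the Bot or Not repository on GitHub for the real client and endpoint details.

```python
# Hypothetical sketch: asking a bot-detection HTTP endpoint how bot-like an account looks.
# The URL and the response field are placeholders; consult the project's README for the
# real client library and API.
import requests

def bot_probability(screen_name, api_url="https://example.org/botcheck"):
    """Return a (hypothetical) 0-1 score for how bot-like the given account appears."""
    resp = requests.get(api_url, params={"screen_name": screen_name}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("bot_score")  # assume the reply carries a 0-1 score

if __name__ == "__main__":
    print("Estimated probability this account is a bot:", bot_probability("some_account"))
```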
15:44 So we have one more thing about the election and then I promise we'll move on, right?
15:49 What's the deal?
15:51 Absolutely.
15:52 I mean, you know, elections are a big deal in the years that they happen.
15:56 So, you know, fair.
15:57 Well, and I think this one is especially interesting because it broke a lot of norms and the prediction just failed across the board in so many ways, in the media, in the pollsters, and so on.
16:08 And so I think it's a looking back to see what do we need to fix?
16:11 What do we need to change?
16:12 So what's this next item about?
16:13 Yeah.
16:14 Well, it's about just that.
16:16 So as anybody who was a poll watcher during the election cycle, which I think is most of the population these days.
16:23 Because of the bots, you couldn't escape it.
16:25 Yeah, exactly.
16:27 Every time the polls changed, the bots wanted to talk about it.
16:29 The polls were rigged.
16:30 No, the polls show my candidate in the lead.
16:32 Horse race, horse race, horse race.
16:34 And what's interesting is that the vast majority of those polls and predictions were wrong, like dead wrong.
16:40 And so there's been this kind of like, I don't know, huge reconciling in the data science community and kind of anybody who does this predictive forecasting on a number of things.
16:53 Like there's the technical things that went wrong as in what's wrong with our models.
16:57 There's what's wrong with our data, what's wrong with the polling process and on and on, like kind of into the weeds about what technically did we fail to get right in this process.
17:06 And then there's these larger questions about why do we even do this?
17:10 Like who actually benefits from that kind of checking 538 every day and seeing like, oh, Clinton's up 78% this week and oh, Trump's up 1% this week, you know, and like and just watching that change over the course of an election.
17:23 And meanwhile, we as voters don't hear very much about the policies of the individual politicians.
17:29 Are we actually informed more?
17:31 So anyways, we actually did a whole episode about it. We did two, as a matter of fact.
17:35 So we did one episode with a bunch of data scientists and people that are kind of on the inside of political campaigns and understand a lot about how campaigns poll people and how they make their predictions.
17:46 We had people like Natalie Jackson, who is responsible for the Huffington Post model that was wrong, but so was everybody else's.
17:53 And then we actually did a second episode with Mona Chalabi, who's now the director of data journalism for the Guardian for the US Guardian.
18:01 But she used to work at 538 and has just some really, I think, really smart things to say about whether or not these polls help our public discourse, like whether or not data journalism actually should be involved in this kind of horse racing or, you know, this kind of horse race prediction.
18:15 So and how much like human bias and interpreting the output of these models really impacted how they were published and how they were communicated to the public, because there's this weird thing happening based on the outcome of this election where data science is saying, hey, we made this huge mistake.
18:30 And people are starting to fall back on their previous positions that I think felt comfortable for everybody, which was, well, I don't know how that thing works.
18:38 I don't really trust the data.
18:39 I'm going to go with how I feel.
18:41 Here's an example where all of you guys thought you were right and you were clearly wrong, which reinforces my initial position that data is fallible.
18:47 Therefore, I'm not going to trust it.
18:49 And I think that might do real harm, not just to data journalism, but potentially to society in general.
18:57 That lack of faith in data is, I think, misplaced.
19:01 There really should be a lack of faith in the way that humans interpret the output of their models.
19:07 Like, ultimately, there was a lot of human bias that was injected into that process that was pretty clearly important.
19:12 Yeah, absolutely.
19:13 And anyway, so obviously, I have a lot to say about this.
19:15 I'm like, I'm like rambling on and on and on.
19:17 But it's a huge story.
19:18 Well, that's awesome.
19:20 I'm really looking forward to checking out those episodes.
19:22 And of course, I'll link them in the show notes.
19:23 There's a couple of things that come to mind when I hear what you're saying.
19:27 One is it feels a little bit like we've gotten ourself into this tight reinforcing loop that's super hard to get out of.
19:33 And it's a little bit like quantum mechanics, where if you observe a thing, you change it.
19:39 You know what I mean?
19:40 Like, you try to measure the polls, but you tell people the polls so often.
19:43 And then the various news groups and whatnot are like reinforcing their angle, but also the polls and opinion.
19:49 And it's just like, is that the opinion or is that people's reaction to people's perceived opinion?
19:54 I mean, it's just where do you detangle these things?
19:57 Yeah, that's a really good point.
19:58 Because in the media, we end up with these kind of meta narratives about the election, like Trump's a ridiculous candidate and Hillary Clinton is the inevitable president.
20:07 We're just kind of waiting this thing out.
20:08 I mean, that's an oversimplification.
20:10 But in a lot of the media, that's what it was.
20:11 And so I think it's interesting to see even people that are supposed to be objective, like journalists are always supposed to be objective, but data journalists especially, like believe in the numbers.
20:20 That's the, you know, that should be the mantra.
20:22 And yet I think every time that their data or their analysis or their technique showed that actually Trump might have a pretty decent chance of winning if you account for some of the inconsistencies in the polling, I think they all went, oh, that actually indicates a problem with my model.
20:37 I should tweak it to make sure that it gives the results that are more correct.
20:41 And whoops.
20:42 It turns out that maybe that wasn't the right conclusion to draw when the models didn't perform as we were expecting.
20:50 So it's been really fascinating.
20:52 Well, fascinating.
20:54 And perhaps, I don't know, it'll be interesting to see whether or not we continue to engage in this kind of entertainment or, you know, I guess that's basically what it is, you know, like watching the score change from quarter to quarter.
21:05 Exactly.
21:06 Yeah.
21:07 I think, you know, on one hand, you could make statements about humans and whether or not they'll just start to adjust.
21:13 But there's such a commercial interest in the news.
21:17 I'm thinking of like cable news, especially to just like continually cover this.
21:21 So I'm not encouraged that we'll stop.
21:25 That's true.
21:26 Given that, maybe we should just get better at it before the next election.
21:29 Exactly.
21:30 Exactly.
21:31 This portion of Talk Python to me has been brought to you by Rollbar.
21:48 One of the frustrating things about being a developer is dealing with errors, relying on users to report errors, digging through log files, trying to debug issues, or a million alerts just flooding your inbox and ruining your day.
22:00 With Rollbar's full stack error monitoring, you'll get the context, insights, and control that you need to find and fix bugs faster.
22:07 It's easy to install.
22:08 You can start tracking production errors and deployments in eight minutes or even less.
22:13 Rollbar works with all the major languages and frameworks, including the Python ones, such as Django, Flask, Pyramid, as well as Ruby, JavaScript, Node, iOS, and Android.
22:22 You could integrate Rollbar into your existing workflow, send error alerts to Slack or HipChat, or even automatically create issues in Jira, Pivotal Tracker, and a whole bunch more.
22:31 Rollbar has put together a special offer for Talk Python to me listeners.
22:35 Visit Rollbar.com slash Talk Python to me, sign up, and get the bootstrap plan free for 90 days.
22:40 That's 300,000 errors tracked all for free.
22:43 But hey, just between you and me, I really hope you don't encounter that many errors.
22:47 Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch, and more.
22:53 Give Rollbar a try today.
22:55 Go to Rollbar.com slash Talk Python to me.
22:57 Another big theme this year and last year, but especially this year, has been encryption, right?
23:11 Yeah, absolutely.
23:11 Maybe especially based on the election outcome.
23:13 It'll become more of a concern.
23:14 Actually, people going to places like ProtonMail.
23:18 ProtonMail is awesome.
23:18 It's like a super encrypted PGP type thing out of the guys at CERN.
23:22 But it's kind of like Gmail, but with PGP and Switzerland.
23:26 Anyway, things like that have been going up.
23:28 Or like Signal, right?
23:29 Like encrypted messaging.
23:30 Yeah, for sure.
23:31 And as well as the whole iPhone, Apple thing at the beginning of the year.
23:35 I think that was 2016, right?
23:37 Should they unlock it?
23:38 Should they be made to unlock it?
23:40 And so on.
23:40 Oh, yeah.
23:41 After that guy.
23:42 San Bernardino.
23:43 Yeah, that's right.
23:44 Yeah, yeah.
23:44 So Google decided to take this idea of encryption and use it for a really interesting AI experiment.
23:53 So this is from the Google Brain team.
23:55 And their slogan is, make machines intelligent, improve people's lives.
23:59 So what they did is they started with three basic vanilla networks.
24:03 Okay?
24:03 And they named one Alice, one Bob, and one Eve.
24:08 And you know where this is going, right?
24:09 And all they did, all they did was said, Alice, the outcome they wanted to measure was, Alice
24:15 has to send a secure message to Bob.
24:17 Bob has to decrypt it.
24:18 And Eve has to try to break the decryption.
24:21 And the only advantage Alice and Bob have is they have a shared key that they can use for
24:27 encrypting.
24:28 But they didn't even tell the thing that you need to encrypt stuff or mention any sort of
24:33 algorithm.
24:33 All they did was give it a loss function that said, if Eve decrypts this, you lose.
24:38 If Eve does not decrypt this, you win.
24:40 And they just let it run.
24:42 Yeah.
24:45 It's kind of cool.
24:46 Kind of cool.
24:46 Yeah.
24:47 But I wonder how they want to know.
24:47 Yeah.
24:47 So what they did is they did 25,000 trials of it.
24:50 You know, how they go through a bunch of iterations each time to like train it up and teach it,
24:56 you know, and let it try to basically invent encryption.
24:59 Okay.
25:00 So they said what they did is they more or less created this adversarial generative network.
25:04 Okay.
25:05 And then they just let it go.
25:07 So the data that the networks were given.
25:09 So Alice was given the key in plain text as the first layer of the neural network.
25:13 Bob was given the key in cipher text as the input.
25:17 And Eve only got the cipher text.
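To make that setup a bit more concrete, here is a rough Python sketch of the loss structure described above; it is not Google's actual TensorFlow code. The alice, bob, and eve callables stand in for the three networks (assumed to be defined elsewhere), messages are vectors of -1/1 bits, and the exact form of the penalty on Eve is an assumption based on the paper's description.

```python
# Rough sketch of the adversarial objective: Bob should recover the plaintext,
# and Eve should do no better than random guessing. Not the original code; the
# alice/bob/eve callables stand in for neural networks trained elsewhere.
import numpy as np

N = 16  # message length in bits

def bit_error(a, b):
    """Average number of bits that differ between two -1/1 valued vectors."""
    return np.sum(np.abs(a - b)) / 2.0

def eve_loss(plaintext, ciphertext, eve):
    # Eve only sees the ciphertext and tries to reconstruct the plaintext.
    return bit_error(plaintext, eve(ciphertext))

def alice_bob_loss(plaintext, key, alice, bob, eve):
    ciphertext = alice(plaintext, key)                      # Alice sees plaintext + key
    bob_error = bit_error(plaintext, bob(ciphertext, key))  # Bob sees ciphertext + key
    eve_error = bit_error(plaintext, eve(ciphertext))       # Eve sees ciphertext only
    # Reward Bob for decrypting correctly, and penalize any setup where Eve
    # does better than random guessing (random guessing gets N/2 bits wrong).
    return bob_error + ((N / 2.0 - eve_error) ** 2) / (N / 2.0) ** 2

if __name__ == "__main__":
    # Smoke test with dummy "networks" that just pass the message through.
    plaintext = np.random.choice([-1.0, 1.0], size=N)
    key = np.random.choice([-1.0, 1.0], size=N)
    identity = lambda msg, key=None: msg
    print(alice_bob_loss(plaintext, key, identity, identity, identity))
```

The interesting part is that nothing in a loss like this says "use encryption"; Alice and Bob only get penalized when Eve starts winning, and whatever scheme emerges is whatever the training process stumbles into.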
25:19 So what do you think happened?
25:20 Well, I know what happened.
25:23 So I won't ruin the punchline, but it is a really fascinating result watching artificial
25:28 intelligence battle each other.
25:30 So let me ask a different question because, yeah, you do know it.
25:32 Would you have expected this outcome?
25:35 You know, for a problem that implies a kind of creativity, I wouldn't have actually.
25:41 I'm not sure that I would have expected.
25:43 I mean, maybe some version of this, but the outcome is what the AI that ultimately succeeded
25:50 in encrypting its messaging actually, like the way in which it solved that problem, I think
25:55 was, was what was actually startling.
25:56 Yeah.
25:56 It's very startling because it, they were not told about encryption.
25:59 They were just told Eve is not supposed to know this.
26:01 And I mean, it's not like a, it's not really smart, right?
26:04 It's an, it's a, an artificial intelligence.
26:06 It doesn't know hardly anything.
26:08 And so the results, most of the time, Alice and Bob did manage to evolve a system that
26:13 they could communicate with few errors and only a couple of times, well, relatively small
26:19 percent.
26:20 Let's say it looks like three or four percent.
26:23 I don't know if you factor that out, area under the curve and whatnot.
26:26 But a few percent of the time Eve showed an improvement over random guessing and actually
26:32 figured it out.
26:33 But as soon as Eve did, Alice and Bob just ramped it up and crushed her.
26:38 She couldn't keep up anymore.
26:40 So it's funny.
26:41 It goes for like 6,000 iterations where Eve is basically losing and Alice and Bob are winning,
26:46 but then it switches.
26:48 And Eve kind of figures out for a minute and then they're just like, nope, we're going
26:52 to change this.
26:52 And then you're done.
26:53 And what's interesting is that when Alice and Bob were ultimately successful, it's not as if
27:00 they chose from like a buffet of kind of cryptology techniques or, you know, techniques for encryption
27:08 or whatever, and then ultimately stumbled upon the one that was the most secure.
27:12 They invented a new way to go about encrypting their messages.
27:17 They invented a new kind of encryption in order to accomplish this goal, which is, that's
27:22 to me the part that is startling, that they actually created something new.
27:26 And of course, you know, the jokes on the internet were abound.
27:29 Like, you know, basically two AIs have figured out how to talk to each other in a way that nobody
27:34 else can understand.
27:34 There's no problem here.
27:36 Yeah, of course.
27:37 There's no problem here.
27:38 This will be fine.
27:40 Wait, what's that thing in the sky?
27:41 Yeah, so.
27:42 Yeah.
27:46 So this is how the Terminator starts.
27:48 No, I don't actually think that.
27:49 But I think this is super interesting.
27:51 And I really do think the creativity aspect is what's so amazing.
27:54 And I wonder if you really, I mean, we're talking about other encryption techniques, you know,
28:01 PGP, ProtonMail, Signal, and so on.
28:01 What if you really wanted to communicate secretly?
28:04 You just get some super trained up AIs and with you and whoever you're trying to communicate
28:09 with.
28:09 And you just use whatever that thing does, right?
28:11 Like, it's unknown.
28:12 You don't even know what it does.
28:14 Yeah, well, I mean, and to be totally frank with the audience, I think when it comes to
28:17 this type, these types of deep learning techniques, like nobody knows what they do anyway.
28:20 I mean, we know what they do mechanically, but nobody's quite sure.
28:24 Nobody's proven why they're able to be as effective as they are.
28:27 So we're kind of already in that territory where we're inventing things that are more complex
28:31 than our brains can model or understand.
28:34 Okay.
28:35 And when you have those things that can generate themselves, I don't know, it's kind of interesting
28:41 to imagine this future world where we don't actually rely on an encryption technique that
28:46 we understand.
28:46 We just have some AI that we think are smarter than everybody else's, and we just let them
28:51 encrypt it however they see fit, pass the message.
28:53 And then ultimately, any adversaries will be developing intelligence to try and break our
28:58 encryption.
28:58 And they'll just be kind of fighting it out in a world that we don't really understand.
29:02 And hopefully our messages are, you know, secure.
29:04 Did you just read from the back of like a William Gibson novel or?
29:08 No, I'm just kidding.
29:08 Doesn't it?
29:09 Right?
29:09 Right?
29:10 I mean, it does.
29:11 It sounds like we're, at least in some kind of, some of those like, those kind of seminal
29:15 like 80s and 90s sci-fi authors, like this kind of far future that they predicted, at
29:20 least certain aspects of it are starting to become a reality.
29:23 The smarter they are, the more the algorithms can teach themselves.
29:26 Yeah.
29:26 It's super cool.
29:27 I think it's, it's an uncertain future, but it's very interesting.
29:32 It's very interesting.
29:32 So the next item is actually about deep learning as well, right?
29:35 Yeah.
29:35 Yeah.
29:35 I think just to continue on the conversation about deep learning, this was really the year
29:40 that I think it came into its own.
29:42 I feel like to give a quick overview for people who aren't familiar with either machine learning
29:48 in general or this particular technique, basically it's a neural network and a neural network
29:53 is kind of like a, well, let's not really worry about what it is.
29:56 In theory, you're trying to, there's like neurons in a neural network and you kind of
30:01 find a path through the neurons that allows your model to make a decision kind of in the
30:05 same way that your brain works.
30:06 Like you kind of light up a sequence of neurons in a very complicated pattern.
30:10 And that sequence ultimately represents some kind of unique outcome.
30:14 And in this case, it might be like, I don't know, tell me whether that person in the photograph
30:18 is wearing a red t-shirt or a blue t-shirt, or tell me whether it's a man or a woman.
30:22 And learning the kind of subtle patterns in the image that allow you to make that determination
30:28 are the kind of lighting up of some sequence of neurons in a neural network.
30:33 And deep learning is basically when you have many, many, many, many, many, many layers
30:37 of your neural network.
30:38 So much so that it's kind of difficult to understand what's happening in the middle.
30:43 Like there's an input layer.
30:44 We kind of know what goes into the neural network.
30:46 There's an output layer where they tell us what happened.
30:48 And then whatever happens in the middle, we kind of speculate about and make charts and
30:53 kind of infer.
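For anyone who wants to see what "many layers" looks like in code, here is a minimal sketch assuming a Keras-style setup; the layer sizes, the 784-value input, and the random training data are invented purely for illustration, and the final sigmoid plays the role of the red-shirt-versus-blue-shirt decision.

```python
# Minimal sketch of a small "deep" network: an input layer, a few hidden layers
# (the part in the middle we mostly speculate about), and an output layer that
# makes a yes/no call. Shapes and data are made up for illustration only.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation="relu", input_dim=784))  # input: a flattened image
model.add(Dense(128, activation="relu"))                  # hidden layers
model.add(Dense(128, activation="relu"))
model.add(Dense(1, activation="sigmoid"))                 # output: e.g. red vs. blue shirt
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Fake data just to show the training call; real work would use actual images.
X = np.random.rand(100, 784)
y = np.random.randint(0, 2, size=(100, 1))
model.fit(X, y, epochs=2, batch_size=32)
```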
30:54 Yeah, it feels a lot like the MRI sort of analysis.
30:57 Well, creativity happens in this part of the brain.
31:00 And when you're thinking about math, this part of the brain lights up.
31:03 But like, that's the extent of our understanding to a lot of these, right?
31:06 And this sounds a little like that.
31:08 Yeah, yeah, it's exactly like that.
31:09 But the gains from adopting these techniques have been really, really exciting.
31:14 And I think over the next five years, we'll start to see how these technologies impact
31:19 the products that we use.
31:20 For the most part, I think the gains have been largely academic.
31:24 There haven't been a lot of consumer applications.
31:26 But the kind of things that neural networks, or deep learning, have been tried
31:30 on: like, a guy used a neural network that consumed all of the text from the first seven
31:36 Harry Potter novels.
31:37 And then it tried to write new ones.
31:39 They were not good.
31:40 They were quite bad, actually.
31:41 But they were kind of hysterical.
31:43 But plausible, like the language that the model used in order to generate these
31:47 new novels was structurally correct.
31:50 Even if it didn't make any sense, if you know anything about the books.
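That "structurally correct but nonsensical" output is typical of the character-level language models people were playing with in 2016. Here is a minimal sketch of that general approach, along the lines of the standard Keras text-generation example; it is not the actual project's code, and the corpus file is a placeholder you would supply yourself.

```python
# Minimal character-level language model sketch (not the original project):
# train an LSTM to predict the next character from the previous 40, which you
# can then sample from to "write" new text. The corpus path is a placeholder.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

text = open("corpus.txt").read().lower()  # e.g. the novels you want to imitate
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

seq_len, step = 40, 3
windows, next_chars = [], []
for i in range(0, len(text) - seq_len, step):
    windows.append(text[i:i + seq_len])
    next_chars.append(text[i + seq_len])

# One-hot encode each 40-character window and the character that follows it.
X = np.zeros((len(windows), seq_len, len(chars)), dtype=bool)
y = np.zeros((len(windows), len(chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, c in enumerate(window):
        X[i, t, char_to_idx[c]] = True
    y[i, char_to_idx[next_chars[i]]] = True

# One LSTM layer plus a softmax over the alphabet: predict the next character.
model = Sequential()
model.add(LSTM(128, input_shape=(seq_len, len(chars))))
model.add(Dense(len(chars), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=128, epochs=10)
```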
31:52 Yeah, that's really interesting.
31:53 You know, I would love to see a slight variation on that.
31:56 If you could abstract away a little bit more and not go straight down to the text, but just
32:01 to the plot building blocks.
32:03 There's Harry and there's Hermione.
32:05 And Harry has this feeling.
32:08 He did these actions.
32:09 And then just go, OK, reorder that and have a writer put actual meaningful words to that
32:14 like outcome.
32:15 That would be cool.
32:16 That would be super cool.
32:16 Yeah.
32:17 Because I think a lot of what these networks are still missing is like this idea of kind
32:23 of context.
32:24 Yeah.
32:24 Like like Google did a similar thing where they fed a neural network a bunch of romance
32:29 novels.
32:29 Although to its credit, it produced some poetry and the poetry read like a real poem, like the
32:35 kind of thing that the romantically inclined among us might have written in high school.
32:40 Kind of sappy, a little saccharine, sometimes unnecessarily dark.
32:45 But yeah, you know, it's super, super interesting.
32:47 But yeah, that does seem like the next evolution of it.
32:51 Like we're kind of understanding language at a really fundamental level.
32:54 But then how we build on the building blocks inside language
33:00 to form larger concepts and ideas that maybe map over the course of hundreds of pages
33:06 because they're that complex.
33:07 That fortunately still seems to have escaped deep learning models.
33:11 But when they figure that out, just imagine like we talked about all this election stuff.
33:15 Could you imagine like a neural network crafting the story of an election over and then deploying
33:20 thousands of bots communicating with each other in an encryption that we can't understand?
33:23 Like that's when it happens, man.
33:25 Yeah, it's all coming together.
33:26 It's all coming together.
33:27 I hope it's a benevolent AI.
33:31 OK.
33:31 Yeah, but but but there's not all it's not all it's not all potential benevolence and doom, right?
33:35 There's actually some really exciting applications of data science, for example.
33:38 Yeah.
33:39 So, for example, the next thing I want to talk about is actually data sciences, data scientists,
33:44 mathematicians, programmers doing good for the world.
33:48 So one of the big challenges for humans still remains to be cancer, right?
33:54 And one of the more common types is breast cancer.
33:58 So there's this group that put together something called a dream challenge, the digital mammography
34:03 dream challenge.
34:04 All right.
34:05 So the idea is the current state of the world is out of every thousand women screened, only
34:11 five will actually have breast cancer, but 100 will be called back for further testing.
34:15 And so it's not just, well, it's like another doctor visit.
34:19 It's like you're told, hey, we found something in your scan.
34:22 You need to come back.
34:23 So there's all the concern and worry.
34:24 You probably come back a week later.
34:26 There's maybe a biopsy.
34:27 Like you wait for the results.
34:28 It's it's like really disrupting.
34:30 Right.
34:30 And expensive.
34:31 So this group, a bunch of different groups came together and they're putting out a million
34:37 dollar prize for anybody who can build a model that improves upon this and does better than
34:43 the other people trying to do the same.
34:45 So what I think is really interesting is the data and how you get access to the data.
34:51 So fundamentally, what you'll do is you'll submit some sort of artificial intelligence machine
34:55 learning type thing to process this data.
34:59 And if if you can say, here's a bunch of images of scans and, you know, traditionally,
35:06 there's been a certain amount of data available, but this is actually taken to an entirely new
35:11 level.
35:12 So you take this, this data, these scans, and you look at the pictures and you have
35:16 to say, no, this actually is not cancer.
35:18 Yes, this is cancer.
35:19 And then they have the actual outcomes verified by biopsies.
35:24 So you're given that as an input, but here's the deal.
35:27 Normally the problem with doing medical research is you've got to anonymize the data.
35:33 You've got to get permission to share the data and so on.
35:35 So they don't share the data with you.
35:37 Right.
35:38 So the question is, how do you actually process this?
35:41 How do you teach them or seeing anything?
35:42 Right.
35:43 Well, what they do is they give you like 500 pictures or something like that.
35:47 So you can test.
35:48 Right.
35:48 And they give you the outcomes.
35:49 This one was cancer.
35:50 This one wasn't cancer.
35:51 So you could kind of get it sort of working.
35:52 And then they've set up this mechanism in the cloud and AWS using Docker.
35:58 So what you do is you build your model into a Docker image using TensorFlow and a bunch of
36:05 different capabilities that are available to you.
36:08 You build your untrained model into a Docker image.
36:12 You submit the Docker image to some cloud computing system running AWS and they train it on actual
36:19 data and they teach it.
36:20 Yes, this was cancer.
36:21 No, that was cancer.
36:22 Here's your prediction.
36:22 Right, wrong.
36:23 And so on.
36:24 But you have no internet access.
36:26 You can get like the logs or you can't actually ever see the data.
36:29 And then they run your model, now trained on the real data that you never got to see, against an even bigger set of data, which they can use because nobody outside ever actually has access to it.
36:40 So there's about 20 terabytes of data, something like 640,000 images, that you're going to run your model against to predict cancer.
36:50 And then you'll be judged on your work against that.
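To make that workflow a little more concrete, here is a hypothetical sketch of the kind of inference script you would bake into such a Docker image. The mount points, file names, and CSV format below are placeholders rather than the challenge's actual submission conventions, and the model call is a stand-in for whatever TensorFlow model your container trained during the organizers' training phase.

```python
# Hypothetical sketch of an inference script packaged inside the Docker image.
# The /inferenceData and /output paths, the .dcm extension, and the CSV format
# are placeholders; the real challenge defines its own submission conventions.
import csv
import glob
import os

def predict_malignancy(image_path):
    """Stand-in for the trained model: return a 0-1 confidence that the scan shows cancer."""
    # A real submission would load the model trained during the cloud training
    # phase (e.g. a TensorFlow graph saved to disk) and run it on this image.
    return 0.5

def main(image_dir="/inferenceData", out_path="/output/predictions.csv"):
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "confidence"])
        for path in sorted(glob.glob(os.path.join(image_dir, "*.dcm"))):
            writer.writerow([os.path.basename(path), predict_malignancy(path)])

if __name__ == "__main__":
    main()
```

The point of the design is that everything your code needs comes in through mounted directories and everything it produces goes back out the same way, with no internet access in between.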
36:53 I find that really fascinating.
36:54 So this idea that you basically build a model on your own, like just kind of speculate on what will or wouldn't work and then hand it over to be trained and tested on data that you never see.
37:08 And then you just kind of know whether or not it worked.
37:09 And then I guess tweak accordingly.
37:12 I mean, it's a really awkward process, but at the same time, it's also a really novel solution to I think anybody who's ever worked with or been close to working with medical data.
37:21 There's a lot.
37:23 There's a huge need for this kind of work.
37:25 But most of the people who do machine learning research don't have access to the data because they're not employed by the medical institution that has ownership of it and sort of has been given permission to use it and access it as they see fit.
37:39 And so you almost always run into a wall right around that point in the conversation where it's like, OK, cool.
37:46 We'll just, you know, give us as much data as you have.
37:48 We'll go play around and we'll make a model and then we'll tell you how it goes and then we'll come together and blah, blah, blah.
37:53 Like that's kind of a normal data science model building process where you say, give me whatever data you can and then we'll use that to figure it out.
38:00 And so to come up with this technique, this kind of like double blind or triple blind or ultimately this kind of like, you know, blind trust, I guess, for using training and then using a model is kind of a novel solution.
38:15 I think even if it's awkward, it's like it's a good first step to just get this kind of thing on the road.
38:20 Right.
38:50 one of N problems.
38:51 I think there's so many interesting applications given that deep learning can now detect these really subtle patterns, these really subtle distinctions from one image to the next,
39:01 much better than a human being could.
39:02 Yeah, absolutely.
39:03 I think it just has a ton of potential.
39:04 So I'm glad that even if it's a little bit awkward that they're just pushing this forward, like let's just make it happen.
39:08 However we can do it legally.
39:10 Right.
39:10 It absolutely is working within the bounds of, you know, the privacy guidelines and so on.
39:15 But it's, it's really interesting.
39:16 And this is a framework.
39:18 I believe this group is building out for future dream challenges, not just this one, right?
39:22 This is like the first of many of these types of things.
39:26 Let me take just a moment and tell you about a new sponsor of the show.
39:29 This portion of Talk Python is brought to you by AnacondaCon.
39:32 AnacondaCon 2017 is the inaugural conference for Anaconda users, as well as foundational contributors and thought leaders in the open data science movement.
39:40 AnacondaCon brings together innovators in the enterprise open source community for educational, informative, and thought-provoking sessions to ensure attendees walk away with knowledge and connections they need to move their open data science initiatives forward.
39:53 AnacondaCon will take place February 7th to 9th, 2017 in Austin, Texas.
39:59 Attendees can expect to hear how customers and peers are using the Anaconda platform to supercharge the business impact of their data science work.
40:07 In addition, attendees will have the opportunity to network with their peers in the open data science movement.
40:12 To learn more, register for the event, or make sponsorship inquiries, please visit talkpython.fm/acon.
40:19 That's talkpython.fm/acon, acon.
40:23 So the other interesting thing about this is the hardware that you get to use.
40:28 Because if you're going to process 20 terabytes of images and then apply machine learning to each one, that's going to be non-trivial, right?
40:35 So they give you some hardware to work on.
40:37 And in fact, your Docker image gets to run on servers powered by NVIDIA, Tesla, K80, GPUs.
40:44 Which I think GPUs in machine learning is really interesting already.
40:47 But just to give you some stats here, your machine gets to run on a server with 24 cores, one of these GPUs, and 200 gigs of RAM.
40:57 And the GPUs are insane.
40:59 Like they have almost 5,000 CUDA cores, 24 gigabytes of memory with 480 gigabytes per second transfer rate, and 8.7 teraflops of single precision computation power.
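As a rough sanity check on that single-precision figure: assuming the K80's published boost clock of roughly 875 MHz and two floating-point operations per CUDA core per cycle (a fused multiply-add), the arithmetic lands right around the quoted number.

```python
# Back-of-the-envelope check of the quoted single-precision throughput,
# assuming the K80's ~875 MHz boost clock and 2 FLOPs per core per cycle.
cuda_cores = 4992
boost_clock_hz = 875e6
flops_per_core_per_cycle = 2
print(cuda_cores * boost_clock_hz * flops_per_core_per_cycle / 1e12)  # ~8.7 TFLOPS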
41:14 Yeah, the stats on – they're just – it's mind-blowing.
41:17 It's mind-blowing.
41:17 And because I think that's something that sometimes gets lost in the discussion about deep learning is like the amount of calculations that take place in a deep learning, deep neural network are truly mind-boggling.
41:28 I mean, training your kind of typical machine learning model might take somewhere between minutes to hours if it's complex or being trained on a lot of data.
41:37 Deep learning models take days or weeks to train or months if you're doing it at like Google scale.
41:43 I mean, the computations just take forever.
41:47 And the reduction in computation time running on the GPU is phenomenal, like many orders of magnitude faster.
41:53 And so the increasingly powerful hardware is really, I think, the untold story of how much it's accelerating the capacity of this type of machine learning.
42:02 Yeah, absolutely.
42:03 And I suspect these types of things where there's a million-dollar prize and the hardware to actually take a shot at it is quite interesting.
42:11 Yeah, and it's expensive.
42:12 I think that like at the moment, the most – this like high-end hardware that we're talking about, you know, it'd cost you $1,000 a day to run a single instance on AWS.
42:22 But that's only going to come down.
42:23 And just like we saw before with the kind of revolution of service-oriented architectures or kind of microservices where it was like kind of the idea of being like, ah, screw it, spin up a new instance.
42:32 That's right.
42:33 And like we lived in a world where we would spin up and kill instances all the time and never think about it for much more sophisticated and scalable and complex applications that lived on the web.
42:43 It's only a matter of time before we have the same kind of mentality with these highly performant instances that are backed by GPUs.
42:53 And I think that that'll – we're only just at the very beginning of that story, I think.
42:57 Yeah, I totally agree.
42:58 I think it's amazing.
42:59 Like in 10 years, we'll be doing this on our watch.
43:00 But speaking of things you don't want on your watch, Microsoft made a bot, and I don't want anywhere near my watch.
43:07 Yeah, I'm not sure I want Microsoft's bot anywhere near my watch or my child.
43:13 What are you saying the bot's a bad influence?
43:15 I think it was a bad influence on all of us, on humanity perhaps.
43:19 Actually, but the funny thing is that it was more like humanity was a bad influence on the bot.
43:22 So we're talking about Tay, of course, Microsoft's Tay.
43:25 So for those who missed this story, it was kind of a brief moment unless you're like a kind of hyperactive media consumer like I am.
43:34 So Microsoft developed a chatbot and released it on Twitter.
43:37 And the way that it worked is that the chatbot Tay would learn how to communicate based on how people communicated with it.
43:45 So you could talk to Tay on Twitter and then Tay would kind of learn how it should respond given the context of what you asked.
43:53 And it should learn how to construct language in a way that was consistent with the norms of this particular channel, which was Twitter.
44:00 And it did a remarkably good job at that.
44:02 When it responded to people, it largely responded in a way that made sense given what they asked.
44:06 And it largely responded in a way that felt like a tweet.
44:10 You know, like it started using weird abbreviations.
44:12 It would say like, you know, see you, the letter C and the letter U to mean see you later.
44:16 You know, things like that.
44:17 And so in a lot of ways, it was a remarkable accomplishment.
44:20 And I should point out that when Microsoft tested the same thing with a Japanese audience, the bot learned to be a sort of genial participant in normal conversation.
44:30 But when they released the bot in English to a largely American audience, it learned very quickly to be a horrible racist.
44:41 Oh, God.
44:42 It was like, it's funny, but not, I mean, it was funny at the time, a little bit less funny now that we know more about like the alt-right.
44:48 But at the time, basically, you know, the kind of Reddit 4chan crowd thought it would be funny as a prank to teach Tay that the way that human beings communicated with each other was to talk about like whenever it was asked, Tay was asked about what it thought about Mexicans.
45:04 It would respond and say, Mexico is going to build the wall and it's going to pay for it.
45:07 Or, you know, it would ask what its thoughts were about Jewish people and it would like apologize for the Holocaust.
45:13 Like truly, truly offensive.
45:15 That is offensive.
45:16 Wow.
45:17 Like breathtakingly offensive.
45:19 And that's only kind of, I guess, is it even funny?
45:23 I mean, there's an aspect of like the scale of the prank that's kind of funny or that like making a big corporation look stupid.
45:29 Like I can see how it's funny in like a juvenile way.
45:31 Anyway, it was just really interesting commentary on both like the sophistication of these technologies.
45:36 Like anybody who's done any kind of natural language stuff knows that like has experienced, I think, how challenging it is to work with the language that people publish on Twitter because it's not really normal language.
45:47 Like there's like a Twitter speak that's unique just to this weird little niche corner of the Internet.
45:52 I guess it's kind of a big corner of the Internet.
45:53 But you know what I mean?
45:54 People speak differently on Twitter than they do anywhere else.
45:57 And so for a machine to learn that is really cool.
46:00 At the same time, it does speak a little bit to Internet culture that the first thing that people decided to do instead of like, like, again, like a Japanese audience, they treated it like kind of like a pet, like a fun friend.
46:10 And, of course, it was immediately exploited to be kind of a horrible racist, misogynist, you know, like a Gamergate participant basically.
46:18 I think it's really cool how well it did.
46:21 But I think it's unfortunate that it was turned to evil.
46:24 Oh, well.
46:26 Yeah.
46:28 So that was Tay.
46:29 That was Tay.
46:29 Go check it out if you're a machine learning researcher and a language computational linguist.
46:34 Fascinating case study.
46:35 And then also if you're interested in Internet culture and have a strong stomach, it's good for that, too.
46:40 Just remember, it's a bot.
46:43 It was made to be evil by the people.
46:45 It wasn't designed that way.
46:47 OK, that's a good point.
46:47 And it's not like it learned to be evil.
46:49 This is the only like people, of course, made jokes like, of course, you know, you release an AI on the Internet.
46:54 And, of course, you know, within like four hours, it's a Nazi.
46:57 And you're like, this does not bode well for the future of artificial intelligence.
47:00 That's not really what's happening.
47:02 It's not like bots want to kill all human beings.
47:05 The AIs are not coming for us.
47:07 Not yet.
47:08 Not yet.
47:09 But when they do, maybe we can turn them to our will.
47:12 OK, so the next one has nothing to do with bots.
47:15 In fact, this is an academics-meets-open-source-meets-business story.
47:21 So William Stein is the guy that created this thing called Sage Math.
47:26 Do you know Sage Math?
47:27 I don't actually.
47:28 Yeah, I was kind of surprised when I saw this.
47:30 I'm interested to hear more about this.
47:31 Sage Math is a really interesting project.
47:33 It's direct competitor to MATLAB, Mathematica, MAGMA, some of these large commercial computational science sort of platforms that are not open.
47:45 Right.
47:46 Like you if you want to do machine learning on MATLAB, you probably got to buy some pack that's like two thousand dollars for every person who uses it and so on.
47:53 Right.
47:54 So it's really hard to share your work because you've got to have that like that extension back.
47:57 So this guy, he's came out of Harvard, got his PhD there, I believe, and was at UCSD where he actually decided like everything I do in my computing life is open source, except for the one thing that I care most about where I do my math research, which is closed source.
48:15 So that's it.
48:16 I'm going to make make a competitor.
48:18 And so fast forward 10 years or something like that.
48:21 We have Sage Math, which is a really serious competitor to things like MATLAB and Mathematica.
48:26 There's some interesting stuff that came out of it, like Cython, the compiled high performance version of Python came out of that project in some interesting ways.
48:35 So this year he announced that he's decided that running a successful open source project of that scale in an academic setting doesn't make sense if he really wants it to succeed.
48:49 So he would say, you know, look: he built up a great group of people at the University of Washington, where he is these days.
48:55 But he would train these people to become great programmers and work on this project.
48:58 And then they would immediately get hired away by Google or some other place saying, oh, you know data science, you know this computational stuff?
49:05 We got a spot for you.
49:07 And they would be off.
49:08 So he decided to leave academia, leave his tenure-track job, and start a company called SageMath Cloud, which is like a cloud-hosted version of SageMath.
49:16 You can do all sorts of data science stuff there: run Python notebooks, the whole Python scientific stack, R, and sort of share it across your classes.
49:26 And I just think it's interesting to see this high profile professor leaving academics to venture out in the world to start a business based on open source.
49:37 Yeah, I think that that's actually an interesting trend across the machine learning community, where sort of prior to this AI spring or whatever we're calling it, where pretty much everybody wants some kind of the need for machine learning and machine learning expertise is really high.
49:54 This kind of work did come out of academia and the research labs associated with computer science departments at universities was where we expected a lot of this to come from.
50:03 But now most of the large institutions, Microsoft Research, Google Research, IBM, most of the really huge technology companies, are effectively doing pure research.
50:15 And so: pure research, but not at academic salaries.
50:19 So, you know, you've earned your PhD, maybe done a couple of years of teaching machine learning at a high-profile university, and it's kind of tough to turn down a couple hundred thousand dollars, $250,000 a year, to go work with huge resources at your disposal and some of the smartest people in the world.
50:36 And universities are aware of this.
50:38 I think that a lot of universities are really trying to rethink their relationship with their professors for just this reason, because they don't want to lose them completely to the private sector, but at the same time they recognize that they'll never have the resources of the private sector.
50:52 So you're seeing more people start to take a year off and kind of ping back and forth between some of these research institutions and a university.
50:59 It's kind of a new world, and I'm not sure that there is anything wrong with it.
51:03 I mean, as somebody who benefits a lot from this research, I think the way that the private sector is furthering this industry is really exciting.
51:13 Actually, there's a lot of great things that are coming out.
51:15 I see this as a positive news item.
51:17 I'm super excited for William.
51:18 I hope he succeeds in doing this because I think it's really great.
51:21 I love to see open source projects that become sustainable businesses, or that have sustainable businesses making the core project better.
51:28 If you want to learn more about SageMath, I actually interviewed William on episode 59.
51:33 Another one: on episode 81 of Talk Python, I interviewed Jake VanderPlas, and he's at the University of Washington as well, I believe.
51:41 But there's no relationship between these two stories, other than that, there at the University of Washington,
51:45 they've started this thing called the eScience Institute, which seems to be a good balance, maybe a modernization, where people do sort of industry stuff but also academic computational stuff.
51:57 I think if this story, the story of people leaving academia to go do private stuff, was told in the 90s, it might be a big negative, right?
52:06 This guy went and started this private company where his smarts are bundled up in this commercial thing and hidden under IP.
52:12 But there's so much open source that is coming out of this, even in the private space, although there's some kind of commercial component to it.
52:20 You know, a lot of the stuff like SageMath, for example, is open source.
52:24 So it's not like it's being lost to the world because it's going behind some corporate wall.
52:29 Yeah, I think that that's a really good point.
52:30 Like, I think that's true.
52:32 This is mostly an open source story.
52:34 Anaconda, which is now huge in the Python community, is built by a company called Continuum Analytics here in Austin.
52:39 TensorFlow, which has now become sort of the de facto platform for building neural networks and deep learning models, came out of Google.
52:48 And on and on and on.
52:49 Like, I think – and SageMath is another great example of that.
52:52 And it's cool to see it focused on an area of research that is not necessarily computer science-y, you know?
53:00 Yeah.
53:00 Like, actually focusing on the kind of pure math aspects of it is a really valuable contribution.
53:07 So I agree.
53:08 I think it's kind of a cool trajectory.
53:11 And I hope that the technology industry continues its commitment to open source because it really – I mean, not to sound hokey, but it really does benefit the world in a serious way.
53:19 It's definitely a better place to be.
53:20 I totally agree.
53:21 So I'll give them a quick plug.
53:23 Hopefully, you know, if you're a teacher or a professor out there, check out cloud.sagemath.com.
53:27 There's a lot of cool stuff you can do for your classes and so on.
53:30 All right.
53:31 So AIs are smart.
53:33 They can do lots of things.
53:34 But there are just games they're never going to be able to solve, like Go, right?
53:37 Well, one would think.
53:38 One would think.
53:40 But actually, it's cool.
53:41 We've talked a little bit about AIs being creative and kind of deep learning models actually coming up with kind of innovative approaches to solving a problem.
53:48 And I think that that's been a big story this year.
53:51 So, you know, we're kind of comfortable with the idea that machines beat us at games like chess, which as human beings we think are remarkably complex.
53:58 There's so much strategy.
53:59 There's so many potential moves.
54:00 And that's true.
54:02 I think a human being can really only hold – like grandmasters at chess can hold maybe eight permutations of the board in their head at any given time.
54:11 Like they can see kind of eight moves ahead, what they'll do, what their opponent will do, what they'll do.
54:14 And they can kind of keep that changing picture of the board in their head.
54:17 Of course, a computer has no such limitations.
54:19 They can play out almost – especially with the computational power that we have now.
54:23 They can play out endless strategies and endless permutations and find the one that gives them the most likely chance of winning.
54:28 We saw Watson basically kick everybody's butt at Jeopardy.
54:33 You know, consume all the trivia knowledge of the universe, understand language, and figure out how to beat us at that game.
54:40 I think that's super interesting because it's the natural language component of it.
54:44 It's not like clear rules.
54:45 This thing is on this square.
54:46 It could move to those three squares.
54:48 Yeah, yeah, absolutely.
54:49 And being able to connect what was being asked in the question to the kind of like deep graph of knowledge that Watson has at its disposal, right?
54:56 Kind of understanding the relationships between different contexts and blah, blah, blah, blah, blah.
55:00 Very cool.
55:01 The game that people thought would probably be inaccessible to computers or machines for kind of a long time is something called Go.
55:09 And Go is kind of, for those who aren't familiar with it, and I'm not, like I'm not a Go player, so I might be getting this wrong.
55:15 But it's basically like chess times 100, like really, really, really complicated chess because there's so many different ways that the board can change.
55:25 Like there's so many different strategies.
55:26 There's just, it's a lot more complex.
55:28 The rules of the game are more complex and the possible outcomes of the game are more complex.
55:34 And because it's so complex, it relies a little bit more on like, sure, some knowledge, but like strategy and intuition because it's a little bit difficult to understand the consequences of your move, like 10 moves down the line.
55:48 So you kind of got to feel your way through it based on your expertise a little bit more than you can with a game like chess that you can pretty much keep all in your head at one time.
55:55 And because of that, people thought, well, that's kind of a tough road to sell for an artificial intelligence.
56:01 But apparently not, because in March of this year, the world Go champion, Lee Sedol, basically had his butt handed to him by AlphaGo, developed by Google's DeepMind.
56:13 So it's a deep learning model.
56:15 It learned how to play the game of Go, and in a five-game match,
56:18 it cleaned the floor with him.
56:19 And not only that, but it used like some really unorthodox techniques.
56:24 Like it was basically an exceptionally creative and exceptionally intuitive player of the game of Go.
56:30 So it was kind of like the last stand for human beings in terms of beating computers at games.
56:36 And it wasn't really much of a contest.
56:38 Yeah.
56:38 So we don't want to go up against AI anymore, do we?
56:42 I think it's really interesting.
56:45 And again, I think the creative aspect of it is what's cool, right?
56:49 The intuition, right?
56:50 Those are the things we think computers can't do.
56:52 Sure, they can map the entire problem space.
56:55 And if they're fast enough, they could actually map out the potential ways in which you could not win if they followed this series of 100 steps or whatever.
57:03 But if you're not doing that, then this gets even more interesting.
57:06 Yeah.
57:06 Yeah, absolutely.
57:07 And just to kind of understand the difference in techniques: anybody who's taken some basic first steps in machine learning knows a popular exercise is to have people try to beat a game of tic-tac-toe.
57:19 Like, to successfully win at tic-tac-toe using an artificial intelligence, or to write a program that can do that.
57:26 And the strategy is basically learn every possible outcome of the game and then at any given moment pick the path of all possible paths that gives you the best chance of victory.
57:36 That's fairly straightforward.
57:37 But on a tic-tac-toe board, the number of possible permutations of the game is really, really small.
57:43 Nevertheless, if you haven't done that before, I mean, when I first did it, I found it to be very challenging.
57:47 It's a challenging problem to go and solve.
57:49 Extrapolating that to Go, I think, just really demonstrates the huge leaps we've made in this field over the past decade or so.
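To make that tic-tac-toe exercise concrete, here is a minimal sketch, not from the episode, of the exhaustive-search idea the conversation describes: a plain minimax that plays out every possible continuation of the game and, at each turn, picks the move with the best guaranteed outcome. The board representation and function names are purely illustrative.

```python
# Plain minimax for tic-tac-toe: enumerate every possible continuation
# and pick the move whose worst-case outcome is best.
# The board is a flat list of 9 cells holding 'X', 'O', or ' '.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player):
    """Return (score, move) for `player`: +1 forced win, 0 draw, -1 loss."""
    won = winner(board)
    if won is not None:
        # The previous move ended the game, so the side to move has lost.
        return (1, None) if won == player else (-1, None)
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0, None  # board full, nobody won: a draw
    opponent = 'O' if player == 'X' else 'X'
    best_score, best_move = -2, None
    for move in moves:
        board[move] = player
        # Whatever is best for the opponent is worst for us, hence the negation.
        score = -minimax(board, opponent)[0]
        board[move] = ' '
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move


if __name__ == '__main__':
    # Perfect play from an empty board is a draw, so the best score is 0.
    print(minimax([' '] * 9, 'X'))  # -> (0, 0): a drawing move
```

Go defeats this brute-force style of search simply because its game tree is astronomically larger, which is part of why the DeepMind result discussed above leaned on learned evaluation and much smarter search rather than pure enumeration.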
57:58 It's exciting.
57:59 It is.
58:00 Like what else – when released on the right problem, like what else can these models potentially figure out?
58:05 What solutions can they see that are just unavailable to us because we just don't have the computational capacity in our brains?
58:11 It's kind of exciting.
58:12 Yeah.
58:12 So what if we had way more self-driving cars and the game was to minimize traffic jams?
58:19 That would be lovely, right?
58:20 It would be.
58:20 And what if – what if – think about this, my friend.
58:23 What if the game was to simulate human existence?
58:27 What about that?
58:28 That's totally science fiction.
58:29 That's like red pill, blue pill sort of thing, right?
58:33 Well, one would think.
58:36 But given the advances over the past – if we look at the past 10 years of video games and artificial intelligence and virtual reality, one presumes, or at least Elon Musk presumes – and this is our last story – that if you extrapolate that into the not-too-distant future, surely we should be able to simulate the entirety of human existence and play it out.
58:58 And if we could do that, what's to say that we aren't the simulation of some future existence?
59:03 And given how many simulations we'd probably run, like any sufficiently advanced society might run billions of simulations, and given that, what are the odds that we're the base reality and not just one of these simulations playing out?
59:15 Pretty small, my friend.
59:16 Pretty small.
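To spell out the arithmetic behind that "what are the odds" step, as a back-of-the-envelope illustration rather than anything worked through on the show: if an advanced civilization runs N indistinguishable simulated histories alongside its one base reality, and you have no evidence about which one you occupy, a uniform guess gives

\[
P(\text{base reality}) = \frac{1}{N + 1},
\]

which heads toward zero as N climbs into the billions; that shrinking fraction is the entire force of the argument.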
59:17 Yeah.
59:17 Therefore – That is insane.
59:19 Therefore, we're in the matrix.
59:20 We are in the matrix.
59:21 You know, on one hand, like it seems – okay, that's a really interesting thought experiment.
59:26 You know, I mean, it reminds me of when I took philosophy in college, right?
59:30 And my professor told me about Zeno's dichotomy paradox, or whatever it was called, where in order to walk out of the classroom, you have to walk halfway to the door.
59:42 And then you've got to walk halfway still and halfway of that and half of that.
59:47 But that's actually an infinite series of steps.
59:49 So how will you ever walk out of the door?
59:51 I remember my mind being a little bit blown, like, how are we going to get out of there?
59:55 Like, I understand that I walk out of here.
59:58 But logically, like, how are you going to cross an infinite number of halves, right?
01:00:02 That's crazy.
01:00:02 But then, you know, of course, I went on to calculus and realized, well, you also cross each of those halves in proportionally less time.
01:00:08 So it's like a limit that approaches, well, one, no big deal.
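For the record, the one-line calculus version of that resolution is just a convergent geometric series: the halves cover a finite total distance, and at a constant walking speed they take a finite total time. For a room of length 1 crossed at speed v,

\[
\sum_{n=1}^{\infty} \frac{1}{2^{n}} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1,
\qquad
\sum_{n=1}^{\infty} \frac{1}{v\,2^{n}} = \frac{1}{v},
\]

so the infinitely many steps add up to one room length and one finite stretch of time, and you do walk out the door.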
01:00:11 And so when I hear this, on one hand, I feel like it's Zeno's paradox.
01:00:15 Like, you can set it up so you trick yourself to go, oh, oh my gosh, you're right.
01:00:19 It's impossible.
01:00:20 This is crazy.
01:00:20 And then there's just like a moment of clarity where it just unlocks.
01:00:23 Yeah, this is actually ridiculous.
01:00:26 But I have huge respect for Elon Musk on one hand.
01:00:31 I mean, more than almost anyone; he is like Edison times 10 or something.
01:00:35 I mean, he's amazing.
01:00:36 And I just heard that Google is now releasing structural-map, Street View-type stuff for places like Las Vegas, where you can literally map out towns in VR.
01:00:51 So, you know, like put 50 years or a thousand years on that and what happens, right?
01:00:56 Yeah.
01:00:56 Yeah, absolutely.
01:00:57 I mean, and the argument that seems to resonate with me most about this is kind of like, well, if that's true, who cares?
01:01:04 Yeah.
01:01:04 You know, it'd be one thing if it was actually the Matrix and we were all living in a manufactured reality to our detriment.
01:01:12 But if the idea is that like we're a simulation that's running itself, it's like, well, how is that really different than an actual reality?
01:01:18 Like, why is that any different than a quote unquote base reality?
01:01:22 Like it's, you know, that's kind of a biased definition of what reality is in the first place and on and on and on.
01:01:26 And then, okay, fine.
01:01:27 It's kind of like
01:01:28 the first question in the only philosophy class I ever took: the teacher walked in, in very dramatic fashion.
01:01:34 Like we all sat down and he just said, prove to me that I'm real.
01:01:37 Yeah, absolutely.
01:01:40 You know, and you're like, all right.
01:01:41 Okay, cool.
01:01:41 Dead Poets Society.
01:01:42 But, you know, it's an interesting conversation.
01:01:47 It seems more like those advances will lead us to the point where we might start to not care so much about whether our surroundings are artificial, quote unquote, or synthetic, you know, manufactured by a machine or biologically manufactured in the way that we're accustomed to today.
01:02:04 Absolutely.
01:02:05 And, you know, that's an interesting conversation to be had.
01:02:07 Maybe a more useful one than whether our universe is one of many in the multiverse, manufactured by a computer program hundreds of thousands of years in the future.
01:02:17 But it is.
01:02:18 It's cool.
01:02:18 It's a fun thought experiment.
01:02:19 It's one of those things where I applaud Elon Musk for basically posing a sci-fi philosophy question to the world, knowing that he basically had the world as an audience.
01:02:30 And so for the next month afterwards, nerds like us were like, well, let's debate both sides.
01:02:35 Yeah.
01:02:36 Do you think he just woke up one day and said, you know what?
01:02:39 I'm going to go to this place where I'm giving a speech.
01:02:41 I'm just going to deadpan this sucker.
01:02:43 I'm just going to put it out there and pretend this is totally serious.
01:02:45 And let's just see what happens.
01:02:47 Yeah.
01:02:48 Yeah.
01:02:48 At some point, you know, at some point you have to wonder.
01:02:51 I get that, you know; periodically, things are kind of cruising along, like thing A I'm doing is going pretty well.
01:02:57 Thing B, there's some real fires to put out.
01:02:59 Intellectually, I'm a little bit bored this week.
01:03:01 You know what I mean?
01:03:02 And of course, that goes away because, you know, whatever, we're all busy and we get consumed in our problems.
01:03:06 But when Elon Musk has that kind of boredom, maybe this is what happens.
01:03:09 It could be what happens.
01:03:10 It was interesting.
01:03:11 It's definitely interesting.
01:03:12 And my belief is that we're not actually living in some kind of simulation yet.
01:03:17 But I do think it's fun to think about.
01:03:18 All right, Jonathan.
01:03:19 Agreed.
01:03:20 Yeah, I think we should leave everybody with this philosophical thought for the rest of the holiday till they come back to work and focus on actual things.
01:03:28 Yeah.
01:03:30 In the meantime, ponder your existence when you're thinking about your New Year's resolutions.
01:03:33 Yeah.
01:03:34 You got to come back to work next year or do you?
01:03:36 What is work?
01:03:39 What is meaning?
01:03:40 Exactly.
01:03:41 All right, man.
01:03:42 Well, those were 10 really interesting stories.
01:03:45 I think it's been a great year for data science and AI and things like that.
01:03:48 Yeah, me too.
01:03:49 It's been a fascinating year.
01:03:50 And I look forward to 2017 being just as interesting and exciting.
01:03:55 Thanks.
01:03:55 Thanks so much for having me on and for doing this.
01:03:57 I think this has been a really fun episode.
01:03:58 It's been great fun.
01:03:59 You're welcome.
01:04:00 So looking forward to 2017, everybody should be going to PyCon, right?
01:04:04 Oh, absolutely.
01:04:05 Absolutely.
01:04:07 Because I heard a rumor that there may be some very exciting Python focused podcasts that are all hanging out waiting to talk to you.
01:04:15 Absolutely.
01:04:15 Is that right?
01:04:15 That is absolutely right.
01:04:16 So Partially Derivative, Talk Python, Python Bytes, Podcast.__init__.
01:04:21 We're all getting together and we're doing a big group booth.
01:04:25 You can come talk to all of us, meet all of us.
01:04:27 We're going to be doing maybe some live recordings.
01:04:29 We don't quite know what that looks like yet, but we're definitely putting together a group booth somewhere in the expo hall.
01:04:34 So, I'm not sure, by the time this airs the early bird discounts may be over, but don't wait till the end to buy your ticket.
01:04:41 Buy them right away, because they sold out last year and there were sad people.
01:04:45 They reached out to me and wanted to come and I couldn't help them.
01:04:47 Yeah.
01:04:48 And if the trends are any indication, Python is only going to be more popular, only going to be more widely adopted.
01:04:54 PyCon will only get bigger and more fully attended.
01:04:57 So I agree.
01:04:57 Get your tickets now and come hang out with your favorite podcasters.
01:05:01 Yes.
01:05:02 It'll be the best.
01:05:02 It'll be great.
01:05:03 It's going to be great fun.
01:05:03 I'm looking forward to seeing you there.
01:05:05 Yeah, me too.
01:05:05 All right.
01:05:06 Catch you later.
01:05:06 All right.
01:05:07 Thanks.
01:05:07 Bye.
01:05:07 This has been another episode of Talk Python to Me.
01:05:12 Today's guest has been Jonathan Morgan, and this episode has been sponsored by Rollbar and Continuum Analytics.
01:05:18 Thank you both for supporting the show.
01:05:20 Rollbar takes the pain out of errors.
01:05:24 They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course.
01:05:32 As Talk Python to Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome.
01:05:38 Whether you want to hear the keynote by Rowan Curran from Forrester Research, meet with the guys behind Anaconda, or just mingle with high-end data scientists, you need to find your way to Austin, Texas for AnacondaCon this February.
01:05:50 Start at talkpython.fm/Acon, A-C-O-N.
01:05:55 Are you or a colleague trying to learn Python?
01:05:58 Have you tried books and videos that just left you bored by covering topics point by point?
01:06:02 Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course to experience a more engaging way to learn Python.
01:06:11 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/Pythonic.
01:06:19 You can find the links from this episode at talkpython.fm/91.
01:06:24 That's right.
01:06:25 Anytime you want to find a show's page and show notes, it's just talkpython.fm/ plus the episode number.
01:06:31 Be sure to subscribe to the show.
01:06:33 Open your favorite podcatcher and search for Python.
01:06:36 We should be right at the top.
01:06:37 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.
01:06:46 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.
01:06:51 Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music.
01:06:58 You can browse the tracks he has for sale on iTunes and listen to the full-length version of the theme song.
01:07:03 This is your host, Michael Kennedy.
01:07:05 Thanks so much for listening.
01:07:07 I really appreciate it.
01:07:08 Smix, let's get out of here.
01:07:31 Bye.