Top 10 Data Science Stories from 2015

Episode #40, published Tue, Dec 29, 2015, recorded Sun, Dec 13, 2015

It's the end of the year and many of you are probably kicking back and taking it easy without a TPS report to be seen. So we'll keep this fun and lighthearted this week. We've teamed up with the Partially Derivative podcast and we're running down the top 10 data science stories of 2015 in this joint episode.

Links from the show:

Jonathon Morgan:
goodattheinternet.com
@jonathonmorgan
Partially Derivative Podcast: partiallyderivative.com
Popily Private Beta: popily.com

#1 You’ll Never Keep Your New Year’s Resolutions:
fivethirtyeight.com/datalab/how-fast-youll-abandon-your-new-years-resolutions

#2 Serial: Superfans Solve with Stats:
fivethirtyeight.com/features/the-superfans-using-stats-to-get-to-the-bottom-of-serial

#3 $6M funding for Jupyter / IPython:
blog.jupyter.org/2015/07/07/jupyter-funding-2015

#4 All of a Sudden People Freak Out About AI:
cnet.com/news/artificial-intelligence-experts-sign-open-letter-to-protect-mankind-from-machines
Mario with sentience: mashable.com/2015/01/19/super-mario-artificial-intelligence
Mario video: youtu.be/AplG6KnOr2Q

#5 Our Gates Were Deflated: slate.com

#6 The US gets its first Chief Data Scientist:
gigaom.com/2015/02/05/dj-patil-has-joined-the-white-house-to-wrangle-data-issues
Open-Source Society (PyOhio 2015 keynote):
pyvideo.org/video/3671/keynote-by-catherine-devlin

#7 Winter is Coming. Probably - Bayesian analysis and GoT:
allendowney.blogspot.com/2015/03/bayesian-survival-analysis-for-game-of.html
The War Of The Five Kings, A Dataset:
github.com/chrisalbon/war_of_the_five_kings_dataset

#8 Microsoft Offends Everyone by Guessing How Old We Are: how-old.net

#9 The Biggest Political Science Study of the Year...Was a Fraud:
vox.com/2015/5/20/8630535/same-sex-marriage-study
The Big Sort: Why the Clustering of Like-Minded America Is Tearing Us Apart:
amzn.to/1miT8gA

#10 Python jumps to all time high in popularity:
tiobe.com/index.php/content/paperinfo/tpci/

Episode Deep Dive

Guest Introduction and Background

Jonathan Morgan is a co-host of the Partially Derivative podcast and the CEO of a data science startup called Popily. He has extensive experience in data science and is passionate about making data both approachable and actionable. In this joint episode with Talk Python To Me, Jonathan teams up with Michael Kennedy to look back at the top data science stories of 2015 and to highlight the powerful intersection of Python and data science.

What to Know If You’re New to Python

Below are a few essentials to help you get more out of this episode’s data science and Python discussion:

Understand basic package installation: Whether you use pip or conda, know how to install popular data libraries like NumPy, pandas, and requests.
Familiarize yourself with Jupyter Notebooks: This episode references how notebooks (formerly IPython Notebooks) are pivotal in data science for exploratory coding and visualizations.
Basic syntax and data structures: Lists, dictionaries, loops, and functions are all used extensively in Python’s data workflows.
High-level sense of data analysis: Concepts such as outliers, correlation, or how to interpret data stories (e.g., from 538.com) will help you follow along.

Key Points and Takeaways

Python’s Rapid Rise and Popularity in 2015
Python saw a significant jump in the TIOBE Index, moving from eighth place to fourth in a single year. This reflected a broader trend: Python was not only dominating data science but also proving itself for web development, automation, scientific research, and more. The conversation emphasized how Python’s easy syntax, strong libraries, and broad ecosystem propelled it forward. Many universities also began adopting Python as the primary teaching language in first-year computer science courses.
- Links / Tools:
  - Talk Python Training
  - TIOBE Index
Popily: Bridging Data Science and Accessible Dashboards
Jonathan Morgan introduced Popily, a venture aiming to simplify data analysis through user-friendly dashboards and modern data workflows in Python. Its focus is making “everyday analytics” accessible to developers and non-developers alike by automating the first step from raw data to charts. Built with Django for the web app and the SciPy stack on the back end, Popily illustrates how Python’s ecosystem merges web development with data science.
- Links / Tools:
  - popily.com
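Popily's own code isn't shown in the episode; we only know it pairs a Django web app with the SciPy stack. As a hypothetical illustration of the "first corner" Jonathan describes in the transcript (going from timestamped rows in a database to a view of the data over time), the sketch below buckets records by calendar day using only the standard library. The `rows` data and the `counts_per_day` helper are made up for this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw rows: each record carries an ISO-8601 timestamp string.
rows = [
    {"id": 1, "created_at": "2015-12-01T09:15:00"},
    {"id": 2, "created_at": "2015-12-01T17:40:00"},
    {"id": 3, "created_at": "2015-12-02T08:05:00"},
    {"id": 4, "created_at": "2015-12-04T12:30:00"},
]


def counts_per_day(records, field="created_at"):
    """Bucket timestamped records by calendar day and count them."""
    days = Counter(datetime.fromisoformat(r[field]).date() for r in records)
    # Return (date, count) pairs in chronological order, ready to chart.
    return sorted(days.items())


for day, count in counts_per_day(rows):
    print(day.isoformat(), count)
```

Days with no records simply don't appear in the output; a real charting pipeline would usually fill those gaps with zeros so the time axis stays evenly spaced.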
Major Funding for Jupyter (formerly IPython) Notebooks
The Jupyter project received $6 million in funding, an unusually large sum for an open-source developer tool, reinforcing the importance of interactive notebooks for data science and academic research. Because notebooks run code cell by cell and display results inline, users can see output immediately and share reproducible research. This was seen as a critical development for scientific communities and Python users, solidifying Jupyter’s role as a mainstay in Python-based data workflows.
- Links / Tools:
  - Project Jupyter
AI and Societal Concerns
The episode touched on 2015’s rising debate around artificial intelligence and its potential risks, highlighted by prominent voices such as Elon Musk, Bill Gates, and Stephen Hawking. While AI was advancing rapidly, in everything from self-driving cars to complex recommendation engines, there were concerns that unchecked AI could lead to unintended consequences. On the lighter side, a German research team gave “agency” to Mario in Super Mario Bros., letting him learn and respond to new scenarios with basic AI-driven logic.
- Links / Tools:
  - Mashable article on “Mario Lives!” AI
  - OpenAI (founded in December 2015, shortly before this episode was recorded; not discussed in the episode, but relevant context)
New Year’s Resolution Data from FiveThirtyEight
FiveThirtyEight published data on how quickly most people abandon their New Year’s resolutions, usually within the first two weeks. Popular resolutions like losing weight and exercising more were strikingly short-lived. This segment showed how even fun, casual data can reveal trends in human behavior, bridging the gap between everyday life and more formal data science.
- Links / Tools:
  - FiveThirtyEight.com
Data Analysis of the “Serial” Podcast
The “Serial” podcast broke download records and captured massive public interest. FiveThirtyEight interviewed data scientists who applied Bayesian reasoning to reexamine evidence from the murder trial at the heart of the show. By weighing the small probabilities of several events together and framing them as a multiple testing problem, they suggested that a combination of events that seemed overwhelmingly unlikely was more plausible than intuition suggested, sparking broader appreciation for Bayesian analysis in real-life scenarios.
- Links / Tools:
  - Serial Podcast
  - Bayesian Analysis on 538
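The multiple-testing point from the Serial discussion can be sketched with made-up numbers. This is not the analysis from the FiveThirtyEight piece; it assumes, purely for illustration, that each unlucky coincidence has a 10% chance of happening to an innocent person and that coincidences are independent of each other. The `prob_at_least` helper is hypothetical and uses the binomial distribution.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent events occur), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, not taken from the case: each coincidence has a 10%
# chance of befalling an innocent person, independently of the others.
p = 0.1

# Judged as one fixed checklist: all four named coincidences happen at once.
p_specific = p**4

# Judged as a multiple-testing problem: if 20 different coincidences could
# have been noticed after the fact, how likely is it that at least four occur?
p_any_four = prob_at_least(4, 20, p)

print(f"four specific coincidences: {p_specific:.4f}")
print(f"any four of twenty:         {p_any_four:.3f}")
```

Under these assumptions, four specific coincidences look like a 1-in-10,000 event, yet at least four out of twenty possible coincidences turn up roughly 13% of the time. That gap is the kind of intuition the data scientists were pointing at.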
Deflategate and the New England Patriots
A statistical dive into NFL fumble rates showed that the Patriots were major outliers, fumbling far less often than the rest of the league. Coming amid the infamous “Deflategate” controversy over underinflated footballs, it showed how objective analysis can surface unusual patterns in a sports scandal. This was a classic example of outlier detection in real-world sports analytics.
- Links / Tools:
  - FiveThirtyEight’s Deflategate Analysis
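Outlier detection of the kind described above can be sketched with z-scores: standardize each team's figure against the league mean and flag anything beyond a cutoff. The team names, the plays-per-fumble values, and the 2.0 threshold below are hypothetical; they are not real NFL statistics or the method used in the linked analysis.

```python
from statistics import mean, stdev

# Hypothetical plays-per-fumble figures (higher = fewer fumbles). These are
# illustrative values only, not real NFL statistics.
plays_per_fumble = {
    "Team A": 105, "Team B": 98, "Team C": 112, "Team D": 101,
    "Team E": 95, "Team F": 108, "Team G": 103, "Team H": 187,
}


def z_scores(values):
    """Standardize each value: how many sample standard deviations from the mean."""
    mu, sigma = mean(values.values()), stdev(values.values())
    return {name: (v - mu) / sigma for name, v in values.items()}


def outliers(values, threshold=2.0):
    """Return the names whose absolute z-score exceeds the threshold."""
    return [name for name, z in z_scores(values).items() if abs(z) > threshold]


print(outliers(plays_per_fumble))
```

One design caveat: in a small sample the outlier itself inflates the standard deviation, which caps how large any z-score can get (about 2.47 for eight values). That is why robust alternatives based on the median and the median absolute deviation are often preferred.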
U.S. Government’s Chief Data Scientist
In 2015, the United States appointed DJ Patil as its first-ever Chief Data Scientist, marking a milestone for data-driven policy. Patil’s role included promoting open data initiatives and transparency, fostering public engagement and innovation. This signaled that data science was no longer confined to academia or tech giants; governments worldwide were also recognizing its transformative power.
- Links / Tools:
  - White House Archives on DJ Patil’s role
Game of Thrones Data Analysis
Fans went beyond mere fandom, applying data science to George R.R. Martin’s “A Song of Ice and Fire.” Detailed data sets on battles, character deaths, and allegiances, along with Allen Downey’s Bayesian survival analysis, were used to estimate who might die next. Beyond the fun, it showed that fictional worlds can provide valuable practice for real-world analysis.
- Links / Tools:
  - The War of the Five Kings dataset on GitHub (Chris Albon)
  - Wiki of Ice and Fire
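The linked Game of Thrones post describes a Bayesian survival analysis. As a simpler, non-Bayesian stand-in, the sketch below implements the Kaplan-Meier estimator, which estimates the probability of surviving past each time point while correctly handling "censored" characters who are still alive when the data ends. The chapter counts and death flags are hypothetical, and this is not a reproduction of the post's method.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: time until the event, or until observation stopped.
    observed:  True if the event (a death) happened, False if censored
               (the character was still alive when the data ends).
    Returns a list of (time, survival_probability) steps at each death time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every record that shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored records both leave the at-risk group.
        at_risk -= removed
    return curve


# Hypothetical data: chapters each character survived, and whether they died.
chapters = [2, 3, 3, 5, 8, 8, 10]
died = [True, True, False, True, False, True, False]
for t, s in kaplan_meier(chapters, died):
    print(f"after chapter {t}: {s:.3f}")
```

Censored characters still count as "at risk" up to the point where they drop out, which is what keeps the estimate from being biased toward early deaths.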
Microsoft’s How-Old.net and Age-Guessing AI
Microsoft’s playful web app let users upload photos and have their age estimated. While its guesses were often inaccurate (leaving some users feeling insulted), the site gave the public a hands-on look at machine-learning-based face analysis. It also showcased the broader theme of AI tools making their way into entertaining consumer applications.
- Links / Tools:
  - how-old.net

Fraudulent Same-Sex Marriage Opinion Study
A widely publicized study claimed that a brief personal conversation with a gay canvasser could shift views in favor of same-sex marriage. Later investigations revealed the data was fabricated, with suspiciously consistent “randomness.” Uncovering the fraud underscored how crucial thorough data validation and reproducible research practices are in both academia and the media.
- Links / Tools:
  - Retraction Watch coverage
  - The Big Sort (book mentioned as further reading on polarization)

Interesting Quotes and Stories

"All of the best data science is done in a two-beer buzz. It's the secret nobody tells you." -- Jonathan Morgan

"You only learn it in grad school." -- Michael Kennedy (jokingly, in response to the ‘two-beer buzz’ comment)

"If I jump on Goomba, he will certainly die." -- The 'AI Mario' experiment (referencing the comedic and insightful demonstration of AI)

Key Definitions and Terms

Bayesian Analysis: A statistical method that updates the probability of a hypothesis as more evidence or information becomes available.
Multiple Testing Problem: The problem that arises when many hypotheses or comparisons are tested at once: the more tests you run, the more likely it is that at least one looks significant purely by chance, inflating false positives unless the analysis accounts for it.
Jupyter / IPython Notebooks: A development environment providing live, interactive coding within a web interface. Users can mix code, Markdown, and visualizations for a reproducible research workflow.
Outlier Detection: Identifying data points that deviate significantly from the general trend or distribution, often signaling anomalies or potential errors.
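To make the Bayesian Analysis definition concrete, here is a minimal sketch of a single Bayes' rule update applied repeatedly. The prior of 0.5 and the likelihoods (0.6 if the hypothesis is true, 0.3 if it is false) are arbitrary illustrative values, and the evidence items are assumed to be independent given the hypothesis.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H) and the likelihoods of evidence E."""
    # Total probability of seeing the evidence under either hypothesis.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Hypothetical: start at 50/50, then observe three independent pieces of
# evidence, each twice as likely if the hypothesis is true as if it is false.
belief = 0.5
for _ in range(3):
    belief = bayes_update(belief, likelihood_if_true=0.6, likelihood_if_false=0.3)
    print(round(belief, 3))
```

Because each piece of evidence is twice as likely under the hypothesis, the odds double at every step (1:1, 2:1, 4:1, 8:1), which corresponds to beliefs of 0.5, 0.667, 0.8, and 0.889.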

Learning Resources

Here are a few courses and materials if you want to dive deeper into Python for data science and beyond:

Python for Absolute Beginners: A perfect starting point if you are new to coding and Python.
Data Science Jumpstart with 10 Projects: Get hands-on with data-focused mini-projects to build your analysis skills.
Python Data Visualization: Learn to create effective, interactive plots and dashboards.

Overall Takeaway

Data science had an especially eventful year in 2015, with everything from open-source funding breakthroughs to fun experiments, like Microsoft’s age-guessing app, reaching mainstream audiences. This episode reminds us that Python sits at the center of many of these developments, powering innovative projects and forging connections between academia, industry, and everyday enthusiasts. Whether you’re examining sports scandals, analyzing pop culture phenomena, or digging into social science data, Python’s vibrant ecosystem continues to evolve alongside the ever-expanding world of data.

Episode Transcript

00:00 It's the end of the year, and many of you are probably kicking back and taking it easy without a TPS report to be seen.

00:05 So we'll keep this fun and lighthearted this week.

00:08 We're running down the top 10 data science stories of 2015 on episode 40 of Talk Python To Me with guest Jonathan Morgan, recorded December 13th, 2015.

00:18 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:50 This is your host, Michael Kennedy.

00:52 Follow me on Twitter where I'm @mkennedy.

00:54 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.

01:00 This episode is brought to you by Hired and DigitalOcean.

01:04 Thank them for supporting the show via Twitter, where they're @Hired_HQ and @digitalocean.

01:10 Hey, everyone.

01:12 This episode's a little unique.

01:14 I've partnered with a great data science podcast called Partially Derivative, and we're doing a joint show, multicast on both podcasts.

01:21 If you like this sort of thing, be sure to check out partiallyderivative.com and subscribe to their show.

01:26 Also, I wanted to let you know I'm not releasing a show next week.

01:29 I'm on vacation.

01:31 So I'm going to take next week off, get a little rest and relaxation, hang out with the family, and be ready to get back and do a ton of awesome shows for you in 2016.

01:40 Now, let's get on to this co-hosted episode I did with Jonathan Morgan.

01:44 Hey, Jonathan.

01:46 Welcome to the show.

01:47 Hey, Mike.

01:48 Thanks so much for having me.

01:50 Yeah, I'm really excited to do this joint Talk Python / Partially Derivative podcast about the end of the year and data science and all that awesome stuff.

01:58 Yeah, I'm super excited, too.

02:00 I think when our powers combine, it is the ultimate resource for Python data, Python stuff in the universe.

02:07 Absolutely.

02:08 Yeah, it's going to be great.

02:09 It's going to be great.

02:10 So for those of you who don't know me, my name is Michael Kennedy.

02:13 I'm the host of Talk Python To Me, a sort of developer-focused Python podcast.

02:17 And this week, I'm teaming up with the Partially Derivative guys.

02:21 And for those of you who don't know me, I'm Jonathan Morgan, one of the co-hosts of the Partially Derivative podcast, a podcast about data science, kind of, but also sort of about screwing around and drinking beer.

02:34 Yeah, that's really the most common combination of any two things, I would say.

02:38 It's probably drinking beer and data science, right?

02:41 All of the best data science is done in a two-beer buzz.

02:45 It's the secret nobody tells you.

02:46 That's right.

02:47 You only learn it in grad school.

02:49 Yeah, exactly.

02:51 I'd kind of like to hear about your company.

02:54 The last I heard is you guys were starting a data science company, and that's about all I heard.

02:59 It's called Popily, right?

03:00 Yeah, that's right.

03:02 So it's called Popily, and my two co-hosts of Partially Derivative, Chris Albon and Vidya Spandana, and I started this data science company.

03:10 And so basically, we were realizing that there's a whole bunch of data science that's super hard.

03:17 And everybody, I think, is familiar with this kind of artificial intelligence and complicated machine learning.

03:25 But then there's a lot of data science that's actually pretty straightforward.

03:28 It kind of boils down to inference and making charts out of data that you just have sitting around.

03:35 So you have a better everyday idea about what's happening.

03:38 And it turns out that that's super hard.

03:39 Like, actually, even for pretty technical people, it's super hard.

03:42 You know, like, I meet a lot of developers who are like, they'll ask me a question like, so I've got like, like, thousands of rows of data in my database.

03:51 And they're all like, at a time.

03:53 But then how do I like, look at it, you know, like, over time?

03:59 And it's like, oh, that's right.

04:00 Like, I mean, super technically competent people who are still like, I just don't really understand how the data thing works.

04:06 And that first kind of turning that first corner was really important to us.

04:10 We're like, okay, cool.

04:11 We could actually empower a lot of people to do data stuff if we could make that first step automatic.

04:17 Like, if we could just go from some raw data to charts to give you an idea about what's happening inside your data, we should definitely do that for people.

04:24 And so that seems easy.

04:25 It took us a little bit longer than we thought.

04:29 All those details, they keep sneaking in there.

04:31 Exactly.

04:33 Exactly.

04:33 But yeah, so, but we've released a product.

04:37 It's in private beta.

04:38 So, Talk Python To Me listeners,

04:41 You should definitely be part of the private beta.

04:43 I'm sure there'll be some contact information.

04:46 It's popily.com.

04:47 You can go request an invite or just email me or at me on Twitter or something, and we'll get you in the private beta.

04:52 Yeah, and we're releasing publicly early next year.

04:55 So it's super fun.

04:56 We're having a really good time.

04:57 That's great.

04:58 You guys are actually using Python quite a bit there, right?

05:00 Oh, yeah.

05:02 Up and down the stack.

05:03 It's all Python.

05:04 So some of the data people out there might know that there's another couple languages that do some data science or that people use to do data science.

05:12 One is called R.

05:13 People use a language called Scala.

05:15 None of them live up to the awesome power and flexibility of Python, which is why we use it in almost everything we do.

05:23 From the web app that people interact with.

05:25 The web app that people interact with when they're actually using the system.

05:27 All of that's in Python.

05:28 It's actually a Django app.

05:30 I hadn't coded in Django for a while.

05:31 It was super fun.

05:32 And then the back end uses SciPy and the whole SciPy stack for some of the machine learning and data processing stuff that we're doing on the back end.

05:42 So we are a Python shop all the way.

05:45 That sounds really fun to be putting all that together.

05:47 I'm sure you guys are liking it.

05:48 Yeah.

05:50 It's actually the coolest thing about Python from my perspective is that you can do kind of complicated scientific computing and stats and then plug it right into the web app that you'd already built because the language is so flexible.

06:00 So it's been fun.

06:04 Yeah.

06:04 Very cool.

06:05 So I'll put a link in the show notes, but definitely if you guys, if that sounds interesting, check out popily.com.

06:11 Is that right?

06:11 Yeah.

06:12 Yeah.

06:12 That's it.

06:12 Popily.com.

06:13 We thought about popily.io, but popily.io, it was just weird.

06:17 It was too much.

06:18 Get one of those Libyan domains.

06:20 Those are always good for the startups.

06:22 The .lys.

06:23 Yeah.

06:24 Yeah.

06:25 Popi.ly.

06:26 Popi.ly.

06:27 Would have been the best.

06:27 Yeah.

06:28 Missed opportunity.

06:29 Missed opportunity.

06:29 Well, you can always change the name if you really have to.

06:33 All right.

06:34 So you want to talk about this year?

06:36 I mean, this show's going to come out on Talk Python on the 29th, and I suspect around the same time on Partially Derivative.

06:43 It's perfect, like right at the end of the year, to talk about sort of what has happened in Python world intersected with data science, I guess.

06:52 Yeah, absolutely.

06:54 I mean, it's been a big year.

06:55 A big year for Python and a big year for data science.

06:58 The first pick, maybe not the most important, is probably most relevant to people while they're listening to this show.

07:05 Like, if it comes out on the 29th, you know, you got some vacation.

07:07 Maybe you'll pick it up around the 31st or the 1st.

07:11 That's typically when we make our New Year's resolutions, right?

07:15 It is.

07:16 And the first story is all about how you're pretty much going to fail at this.

07:20 Don't get your hopes up, right?

07:24 I know.

07:25 The numbers are.

07:26 The probability is that you're not going to stick with that New Year's resolution, which is funny because I think it's something that we all know.

07:32 But this is actually from last year.

07:33 Mona Chalabi, who's not at 538 anymore.

07:36 She's doing data journalism for The Guardian.

07:39 But she was at 538 at the time.

07:43 And this was actually part of a larger story of the last year, I think: data journalism really came into the public mindset.

07:53 And she did a really interesting piece where she broke down the stats of how often people fail at their New Year's resolutions and like how long they keep them.

08:01 And I guess it's something like 70 odd percent fail within the first two weeks.

08:07 I'm going to change my life.

08:10 I promise this year will be different.

08:12 Oh, maybe.

08:13 But it's a Tuesday.

08:15 And, you know, my friends are going out or whatever, right?

08:18 Yeah, yeah, totally.

08:20 There's a lot of, I like, I really like the aspiration of New Year's resolutions.

08:23 And I'm just not going to think about the cold, hard reality of their eventual failure until later this month.

08:29 Yeah, exactly.

08:30 You're going to wait at least two weeks.

08:31 Exactly.

08:32 Yeah.

08:32 So some of the most common ones, well, the most common by quite a measure was lose weight.

08:38 And closely related to that was exercise more.

08:40 And then the third one really, in terms of popularity, really puts a sort of a challenge on being able to determine whether or not you've achieved it, which is to be a better person.

08:52 How do you analytically answer that, right?

08:54 That's true.

08:56 It's tough.

08:57 It's tough.

08:57 I mean, I guess you probably could say there's a little bit of wiggle room.

09:01 If 70% of people failed at that resolution, that's actually really worrying.

09:06 I need to be just an incrementally better person.

09:10 They're like, no, I tried, but I'm still a jerk.

09:12 No, I yelled at the neighbor again and kicked over.

09:15 Yeah, exactly.

09:18 Nice.

09:18 Oh, well.

09:22 Oh, well.

09:23 So you and I are both big fans of podcasts.

09:26 We listen to a bunch and obviously we produce some that we're very passionate about.

09:30 And the next one, the next item is actually about the most popular podcast of all time, something called Serial.

09:39 Yeah, in fact, I think it was the only thing in 2015 that was more popular than Talk Python To Me and data science.

09:47 The only thing.

09:48 Everything else, you know, sort of pales in comparison to these two giants of media dominance.

09:53 But of course, there is Serial.

09:55 Yeah, Serial is, if you guys haven't heard of it, it's a podcast that goes back and looks at a person accused, maybe convicted of murder.

10:06 I can't remember.

10:07 No, he was convicted.

10:09 Convicted.

10:09 Yeah, that's what I thought.

10:10 And goes through this high school guy and sort of rehashes, reevaluates the evidence, redoes the interviews.

10:18 And it's like an investigative journalism look, but through the eyes of a podcast rather than maybe through like Time magazine or whatever.

10:26 And it was downloaded something like five million times a week.

10:30 It completely broke all the records of all time.

10:33 Yeah, it was pretty amazing.

10:35 I mean, and by the way, spoiler alert, the dude is totally...

10:39 Totally guilty.

10:39 I'm just...

10:40 I mean, this is not really a spoiler because that's not how the show ends.

10:43 But this is going to be cool.

10:45 This is going to be really divisive for your listeners because half, I think, are going to write angry emails and the other half are going to be like, totally.

10:52 So that's where I stand.

10:53 I feel like I just...

10:54 I feel the responsibility to get that off my chest.

10:57 No, I can't.

10:59 I hash it.

10:59 Well, this second item, also at FiveThirtyEight.com, is sort of a journey of some data science folks who actually go through and try to apply data science to this journalism to answer statistically or, you know, sort of through data science.

11:14 Is he guilty or not?

11:16 Yeah.

11:17 And this was actually interesting because it's not really something that can be quantified.

11:21 In fact, that was really a big theme in the show was that all of the information that we had at the time and that we have now to assess whether or not this man was guilty of the crime that they, you know, they think that he committed this murder is really suspect.

11:37 It's really not – it's hard to say conclusively what did and didn't happen because it relied so much on, you know, personal accounts of the events of the day.

11:45 That said, there was some information that they could point to that definitely happened.

11:50 And the big argument that everybody who believes that he's guilty was making was that none of these – none of the events in particular, like, made it certain that he was guilty.

12:03 But when looked at in the aggregate, then that was really unlikely.

12:07 That was sort of the intuition that everybody had.

12:09 But the interesting part was that the – so, 538 interviewed a couple people who are data scientists and they went through kind of a Bayesian process for, you know, assessing the likelihood that each of the events could happen in concert.

12:23 Like, whether or not he basically just had, like, super bad luck.

12:26 And it actually, you know, when you look at this as something called a multiple testing problem, that's a way that you can test the hypothesis in lots of different ways.

12:34 And looking at it through that lens makes it seem a little bit more probable than you might first assume that all of these different things could happen to him.

12:43 So, basically, like, he asked the victim of the crime for a ride.

12:46 He lent his car and his cell phone to somebody else who was also accused of the murder.

12:52 And then his phone was in a location that was in, like, you know, within, like, a small distance from where the body was found.

12:59 And then his cell phone records seemed to corroborate with a bunch of other, like, of the prosecution's testimony about how he totally – so, basically, like, there was, like, four or five things that made it, like, dude, that's impossible.

13:09 Like, if all those things are true, you were definitely guilty, even though none of them as an individual piece of information is super damning.

13:15 But it turns out, according to these data scientists, that, well, you know, maybe we should give this a second look.

13:20 It's actually not that unlikely that he could have had that much bad luck.

13:23 So, we'll see.

13:24 Yeah, that's – I think it's really interesting to take, you know, a hard science like data science that's working with numbers and apply it to something soft like interviews and likelihood that someone's telling the truth and these kinds of things.

13:38 So, I think even in that regard alone, it's really interesting.

13:42 Yeah, totally.

13:42 I mean, ultimately, it was pretty subjective.

13:44 But it was – I mean, and it's hard because this was – these are, like, real people's lives that we're talking about.

13:49 It was a true story.

13:50 You know, it's not like a murder mystery.

13:52 But it was really hard not to get into it and take sides and think about it like, you know, like a whodunit.

13:56 So, you know, apologies to all involved that we're talking about this in such an insensitive way.

14:03 Yeah, absolutely.

14:03 It's definitely a harsh reality that something bad happened to somebody.

14:08 They're having a second season.

14:09 I haven't listened to it yet.

14:10 What's the story of the second season?

14:11 Do you know?

14:11 Just came out.

14:13 No.

14:13 Yeah.

14:14 No, I don't know.

14:15 I think they're probably, you know, trying to keep up with the times because, you know, if you don't – as we know, you got to keep putting out content or people get bored.

14:25 So, Serial 2 – but I guess all I know is that it's probably – I think it's not about the same guy.

14:31 So, if you're sick of hearing about this guy's story, then you're in luck because it's about – I'm assuming another unsolved murder.

14:37 Yeah, it's got to be.

14:38 It's got to be.

14:39 All right.

14:40 Moving on to the next item is something very near and dear to the Talk Python listeners' hearts, I'm sure.

14:47 And that's Jupyter and IPython and IPython Notebooks.

14:51 Yeah.

14:52 Yeah, absolutely.

14:53 And they've had a huge year.

14:56 Yeah.

14:56 So, I think – yeah.

14:58 And so, I don't – I'm assuming a lot of your audience will be pretty familiar with IPython and Jupyter.

15:02 Although, that said, I guess there's like two camps of Python developers, I feel like.

15:08 Like sort of web developers and software engineers and then kind of data and stats folks.

15:14 Did you come at Python from a computer science perspective or did you have to find some language to do your specialty and kind of grow into programming?

15:21 I suspect that that second category is very familiar with the IPython stuff.

15:27 So, maybe I should just tell everyone.

15:29 IPython Notebooks are these sort of interactive documents.

15:34 You can load them up as web pages and you can write a little bit of Python code and then you can actually execute them real time right there.

15:42 Like so, you could pull some data from a database and then do some sort of science on it and a graph pops up.

15:48 And then you write a little bit more code and another bit of data pops up.

15:52 And these things are sort of live research documents.

15:55 Very powerful.

15:56 And this has been sort of generalized out of the Python world through this Jupyter project to apply to many different programming languages.

16:05 So, this Jupyter project is an open source project run by Fernando Perez and some other guys.

16:10 I'm actually working on having them on the show shortly.

16:13 So, a couple of people have asked if they can be on the show.

16:15 And yes, we'll have somebody from IPython and Jupyter soon.

16:19 But these guys are an open source project that has just received $6 million in funding.

16:25 Yeah, which is kind of amazing.

16:28 I mean, it's a massive amount of funding for a project like this that's effectively a developer tool.

16:33 But it's so useful.

16:35 I think anybody – I was a little bit skeptical of it at first because I came at Python first from more of a computer science and software engineering background.

16:43 And it was later that I got into data stuff.

16:46 But to have something that you can basically record all of your actions – because when you're doing data projects, so often you're like, you need to explain – you get to a number or you get to a chart or you get to a discovery or whatever.

16:57 And then you need to communicate why that matters to somebody else.

17:01 And it only really matters if you can give the context.

17:04 So, well, like first I took this slice of the data and then I manipulated it in this way.

17:09 And then I extracted these features.

17:11 And then once I had those features, I decided to clean some of them up by doing X, Y, and Z.

17:15 And then once I'd done all of those things, obviously, if you look at this chart, it's really meaningful.

17:19 But without those steps up to that, it's like, that's awesome.

17:22 All I can see is like three bars.

17:25 To be able to play that back for somebody and capture it is really awesome.

17:30 And it's nice too when you're not a great coder, which I'm not, and you have to step through and go, wait, why didn't that work?

17:37 Let me run it again.

17:37 Why didn't that work?

17:38 Let me run it again.

17:39 Yeah, yeah.

17:40 They're very cool.

17:42 If I think of writing sort of science-based academic papers, it seems crazy to not do them as something like this, where rather than just saying, oh, I did something on my own.

17:54 Here's a chart.

17:55 Believe it, right?

17:55 Here is the actual code.

17:57 And here is the data.

17:58 And you can just have the whole thing and run it if you like.

18:00 Right?

18:01 That's amazing.

18:02 Yeah, totally.

18:02 To be able to reproduce research by literally running the code that led to the discoveries that informed the paper is huge.

18:11 I mean, Chris, my co-founder and the co-host on Partially Derivative, went and got a PhD.

18:19 And he talked about that all the time.

18:22 Often in postgraduate work, you get assigned a task to reproduce the research in somebody else's paper.

18:30 And he was always stunned at how difficult that was.

18:34 Because even when you had access to the raw data, trying to work with it in such a way that produced the same results was just, you know, it's just really rough.

18:42 And so being able to capture that process where like every little decision that you made to manipulate the data in a particular way.

18:49 And when I say manipulate in this context, I mean like a legitimate transformation of the data, not like a shady manipulation of the data.

18:57 Not like witness tampering type manipulate, but like we're going to make some assumptions about the underlying physics or statistics and then get a better answer.

19:06 Yeah.

19:07 Yeah, totally.

19:09 Well, and so much of this kind of work is a little bit of an art.

19:12 It takes some creativity.

19:14 It takes a little bit of intuition, especially if you're working with natural language and you're trying to make sense of unstructured data.

19:22 You know, you make a lot of little choices on the way and those do impact the results that you see.

19:27 And that's why being able to reproduce it is so important and for everybody to understand the assumptions that you made.
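The idea of capturing every little decision so someone else can replay it can be sketched in plain Python. This is a minimal illustration of the principle, not how Jupyter works internally: each step is a named function, the pipeline logs what each step did, and a fixed random seed makes any sampling repeatable. The step names and toy data are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# A step is a human-readable description plus the function that performs it.
Step = Tuple[str, Callable[[list], list]]


def run_pipeline(data: list, steps: List[Step], seed: int = 0) -> Tuple[list, List[str]]:
    """Apply each step in order and record what it did, so the analysis can be replayed."""
    random.seed(seed)  # fixing the seed makes any random sampling repeatable
    log = []
    for description, func in steps:
        before = len(data)
        data = func(data)
        log.append(f"{description}: {before} -> {len(data)} rows")
    return data, log


# Hypothetical steps mirroring "take a slice, extract features, clean them up"
steps: List[Step] = [
    ("keep non-negative values", lambda rows: [r for r in rows if r >= 0]),
    ("random 50% sample", lambda rows: random.sample(rows, len(rows) // 2)),
    ("extract feature (square)", lambda rows: [r * r for r in rows]),
    ("drop values above 50", lambda rows: [r for r in rows if r <= 50]),
]

raw = [-3, -1, 0, 2, 4, 5, 7, 9]  # made-up toy data
result_a, log_a = run_pipeline(raw, steps, seed=42)
result_b, log_b = run_pipeline(raw, steps, seed=42)
for line in log_a:
    print(line)
```

Running the same steps with the same seed twice gives identical results, and the log doubles as a plain-English record of the assumptions made along the way.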

19:32 And so I think I mean, I can only assume that was part of why they received this like massive bucket of funding.

19:37 It's a super popular product.

19:39 So, yeah, it's going to definitely be a really important foundation of science, period.

19:44 But if you just think of any open source project, what other open source project without a company behind it has somebody given six million dollars like this?

19:53 This is a big deal.

20:04 This episode is brought to you by Hired.

20:07 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

20:14 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.

20:23 Typically, candidates receive five or more offers in just the first week and there are no obligations ever.

20:29 Sounds pretty awesome, doesn't it?

20:31 Well, did I mention there's a signing bonus?

20:33 Everyone who accepts a job from Hired gets a $2,000 signing bonus.

20:37 And for Talk Python listeners, it gets way sweeter.

20:42 Use the link Hired.com slash Talk Python To Me and Hired will double the signing bonus to $4,000.

20:49 Opportunity's knocking.

20:51 Visit Hired.com slash Talk Python To Me and answer the call.

20:55 Okay, so number four on our list is artificial intelligence.

21:08 There have been some cool films about artificial intelligence, like Ex Machina, which actually happened to feature a little Python code as well, which is cool.

21:18 But people are freaking out about it.

21:19 Yeah, this is kind of a new thing this year.

21:23 I don't know.

21:24 Maybe it's because machine learning and AI are becoming more a part of like mainstream conversations.

21:29 Or, like you said, there was an entire movie made about it where they referenced the Turing test specifically, in a major Hollywood movie, which blew my mind.

21:39 Because that's pretty nerdy stuff in normal circumstances, right?

21:43 It's super cool.

21:44 But yeah, and now all of a sudden you have these super prominent leaders of the technology community coming out for and against, like speculatively against, the idea of AI because, you know, The Matrix or whatever.

21:58 So I don't know.

21:59 It was really interesting to see that debate happening in public.

22:02 I don't know if you track that very much.

22:04 I mean, Bill Gates and Stephen Hawking were warning us about artificial intelligence.

22:09 And then, of course, other people on the other side giving really nuanced defenses of the way that artificial intelligence helps humans by making better decisions.

22:19 It's a super fascinating conversation.

22:21 Yeah, it's very fascinating.

22:23 The person who came out who, in my mind, honestly carried the most weight was Elon Musk.

22:30 Yeah.

22:31 Well, I mean, when Elon Musk speaks, we listen.

22:36 That's right.

22:37 Because, I mean, Bill Gates, I'm actually a fan of his.

22:40 I think he did some really cool stuff.

22:41 But I kind of feel like he's, you know, has a certain worldview that's sort of already here.

22:49 And Stephen Hawking has some amazing views of the universe.

22:54 But at the same time, I'm not entirely sure how practical his actual interaction with AI and programming is.

23:00 But Elon Musk says, I'm going to build an electric car that's like amazing.

23:05 He built it.

23:05 I'm going to build things that go to space.

23:07 And somehow he does that.

23:08 I mean, he could actually build AIs if he wanted, if anybody can, I would say.

23:14 I think that's the only way that he's really going to be a Bond villain, because at the moment he's got super cool, you know, international technology companies that are doing amazing things.

23:29 Like you said, going to space.

23:31 But it's not quite supervillain status yet.

23:34 And I think if he builds a robot that seamlessly works its way into society, obviously initially for good.

23:41 But, you know, it gets out of control.

23:43 That's where the plot of the movie really starts to get thick.

23:47 I'm excited about this.

23:48 Yes, absolutely.

23:49 The AIs had to be created to man the supercharging stations up and down the West Coast.

23:53 And then it just went all wrong.

23:55 They got in the cars and spread.

23:56 Exactly.

23:59 They decided that they deserved more than the menial tasks they'd been assigned.

24:02 Yeah.

24:03 So we'll link to a really cool article about this project that got respected leaders in the AI space to sign an open letter pledging to proceed with caution.

24:17 But one of the follow-up articles that we'll also list, which I really liked, was this thing about Mario, as in Super Mario Brothers.

24:25 That was really cool.

24:26 Yeah.

24:27 Yeah, it's pretty funny.

24:28 There were some researchers that like basically made Mario sentient.

24:33 I mean, maybe not quite.

24:35 That's a stretch.

24:36 But they empowered Mario, the character, with his own intelligence and then let him loose in the original game to see how well he could defeat the Goombas and all of the other dangers in the Mario world.

24:49 It was super fun.

24:49 It's like, on the flip side,

24:50 there are these really super accessible, fun artificial intelligence projects that aren't necessarily a threat to, you know, humanity.

24:59 Yeah.

24:59 Maybe it's a threat to our high scores on Super Mario Brothers, but maybe not to humanity at large.

25:05 So, you know, those guys are, they're German.

25:07 They're in Tübingen, at the university there, which is actually like 35 minutes away from me right now.

25:11 That's cool.

25:13 They made a video.

25:14 Yeah.

25:14 They made a video and we'll link to it.

25:16 It's on Mashable.com.

25:17 And they do all sorts of cool stuff.

25:20 There's a bunch of different intersections.

25:22 It's not just like one part of AI, but there's a lot of sort of understanding the world, understanding language, learning.

25:29 And so they would do things like they would ask Mario, like they can speak to him in English and he would answer in English.

25:36 It sounded a whole lot like WarGames.

25:40 It's like, you know, that really sort of choppy text-to-speech of the WarGames computer.

25:44 So it's kind of funny to hear Super Mario Brothers speak that way.

25:49 But they would say things like, if you jump on Goomba, he will die.

25:54 And then they say, now jump on Goomba.

25:56 And of course Goomba dies.

25:58 And they say, what do you know about Goomba?

25:59 He says, if I jump on him, he will certainly die.

26:01 And then later they reset his mind and tell him to go over and jump on Goomba.

26:08 They don't tell him anything else.

26:09 But first they ask him, what do you know about Goomba?

26:12 He says, I don't know anything about him.

26:13 They say, jump on him.

26:14 Goomba dies.

26:15 They ask, now what do you know about him?

26:17 He goes, I know that he may die if I jump on him.

26:20 And it was really interesting.

26:21 He has all these different emotions and knowledge.

26:24 It's cool.

26:24 Check it out.

26:25 Yeah, it is super cool.

26:26 Actually, that's probably the best part about the project: Mario's sort of existential ennui.

26:34 Yeah, absolutely.

26:36 So another thing that happened this year had to do with the New England Patriots.

26:43 And for those of you who maybe don't follow American football super closely, this was sort of a big deal.

26:52 The New England Patriots, I don't really care one way or the other, but they'd kind of been seen as a team that, let's put it nicely, takes as much advantage of the situation as they can by, you know, maybe doing things they shouldn't.

27:06 That was so diplomatic.

27:09 And it came to a head around Super Bowl time, when they were accused of deflating the footballs for their team.

27:18 And for a while, I didn't know what that meant.

27:20 Like, okay, well, maybe it hurts a little less to catch it.

27:23 I don't know.

27:23 But one of the things that deflated footballs let you do is hold on to them much tighter.

27:28 They're not slippery.

27:29 And so you won't fumble the ball and make some of these game-losing mistakes.

27:34 If you can, like, change the physics so it's not a problem.

27:38 Well, that's easier to solve than being better at football.

27:40 So the story is actually some data science folks came and looked at that.

27:46 And I don't know.

27:47 What's your opinion after looking at all the charts and graphs they built?

27:50 Well, I think it boils down to, in a simple way, that the New England Patriots were, like, a massive outlier.

27:58 And so you're right that if you deflate the football a little bit, apparently it makes it easier to hold on to.

28:06 And so for those of you who don't watch American football, it's a lot like rugby, if you're familiar with rugby.

28:13 More or less, you have a ball that you're holding on to and you're running through a large group of very strong men who are all trying to take that ball from you.

28:21 And so you often lose it because it's a large group of really strong big men trying to take it.

28:28 And pretty much anything that you have in that scenario is not going to be yours for very long.

28:33 But the trend overall in the league, the NFL, which is the league the Patriots play in, was that teams were getting more successful at holding on to the football.

28:46 They had more plays that they could run.

28:49 So the ratio between the number of attempts that they made and the number of times they lost the ball was going up.

28:54 So you could run more plays and fumble less in general.

28:57 That said, the Patriots improved their ratio far more than all of the other teams in the league.

29:07 And so when you see one outlier way up at the top right of a graph, you tend to go, well, that's not right.

29:13 Something doesn't make sense.

29:16 Something's different about that one data point.

29:19 And so, of course, that sparked a lot of speculation.

29:22 Like, why is it that the Patriots fumble so infrequently?

29:25 And, you know, where there's smoke, there's fire.

29:28 At least in this case.

29:31 And they found that the Patriots were deflating footballs just a little bit all of the time.

29:37 And by doing that, they were able to maintain better control of it because it was easier to grip the ball because there was less air inside.

29:43 So it was less buoyant.

29:44 Which – buoyant is probably not the right word to use in this context.

29:49 But whatever.

29:49 Whatever.

29:50 They could hold on to the ball.

29:51 And so that was the whole thing.

29:52 But basically, it was like the way that that was determined was through, you know, just doing some relatively simple data science.

30:01 Relatively simple analysis.

30:02 I mean, in the aggregate it's not that simple: gathering the data was complex, and so was connecting the dots and understanding the consequences of what we were learning.

30:09 But by and large, if you actually look at the analysis that detected the outlier I just described, it's pretty straightforward.
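The "one outlier way up at the top right" check can be sketched with a robust outlier test. This is an illustrative stand-in, not the analysis the story relied on: it computes plays per fumble for each team and flags any team whose modified z-score (built from the median and the median absolute deviation, with the commonly used 0.6745 scaling and 3.5 cutoff) is too high. The team names and numbers in the usage example are made up.

```python
from statistics import median
from typing import Dict, List


def plays_per_fumble(plays: int, fumbles: int) -> float:
    """Ratio of plays run to fumbles lost; higher means better ball security."""
    return plays / max(fumbles, 1)  # guard against division by zero


def robust_outliers(values: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return keys whose modified z-score is above the threshold (high side only).

    Using the median and median absolute deviation (MAD) means a single extreme
    team cannot drag the baseline toward itself, as it would with a mean and
    standard deviation.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread at all, so the score is undefined
    return [k for k, v in values.items() if 0.6745 * (v - med) / mad > threshold]


# Hypothetical season totals as (plays, fumbles); these are NOT real NFL numbers
teams = {
    "Team A": (1000, 20), "Team B": (1050, 22), "Team C": (980, 19),
    "Team D": (1020, 21), "Team E": (990, 18), "Team F": (1010, 6),
}
ratios = {name: plays_per_fumble(p, f) for name, (p, f) in teams.items()}
print(robust_outliers(ratios))
```

Flagging an outlier only says that one data point is different; as the hosts note, it takes further investigation to say why.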

30:17 So anyway, it's kind of cool.

30:19 And it was a super fun story to follow, one that ultimately got covered in such depth, because, oh my gosh, sports in America, that it got a little tedious.

30:27 But at the beginning, it was cool.

30:28 It was like a wonderful little scandal.

30:31 Yeah, it was an interesting scandal.

30:33 And I think it really shows the power of data science because these sports guys, they can go back and forth.

30:38 They're on talk radio, da-da-da-da-da.

30:40 No.

30:41 Look at that graph.

30:43 There's something going on here.

30:45 That's it.

30:45 The question is what is going on.

30:47 It's more likely something sort of sneaky is going on rather than they're just dramatically better than even second place, right?

30:56 So very interesting use of data science there.

30:59 I like that one.

31:00 So speaking of – Yeah, it was super fun.

31:02 Yeah.

31:02 So speaking of U.S. things, the United States now has a person with the title of chief data scientist.

31:08 That's pretty awesome.

31:09 Yeah.

31:10 It's kind of amazing, right?

31:12 Like we kind of went from like data science like, whoa, okay, that's something like a bunch of geeks do.

31:19 And now there's like somebody who has a title like over the whole domain of the entire United States doing data science.

31:25 They're chief.

31:26 They're the chief data scientist.

31:29 Yeah.

31:29 I mean, it's almost like politicians have gotten into data science.

31:32 This is crazy.

31:34 No, I think it's a really positive move.

31:36 I mean you think of places that have lots of data.

31:40 Sports.

31:41 We were just talking about that.

31:42 They've got a lot of data.

31:42 CERN.

31:44 At the Large Hadron Collider, those guys generate a lot of data.

31:46 But the United States, we have so many different things that we track about people.

31:51 And the answers to those questions really matter, right?

31:54 We make policy based on those numbers.

31:56 So having somebody in charge of doing that right makes a lot of sense.

31:59 Yeah.

32:00 And actually, you know what's super cool? DJ Patil, he's the guy who is the chief data scientist, the first one.

32:08 The stuff that he's focused on is really awesome.

32:12 Like a big initiative of his is that he wants to open up a lot of the data that the U.S. government collects.

32:18 Partly that's to keep government agencies more transparent than they've been before, partly it's to advocate for the open data movement in general, and partly it's to get the general public interacting with the data that these agencies produce, almost as a means of civic engagement.

32:36 It's really awesome.

32:37 And so there's this whole idea that the general public's relationship with government can be more modern, can be more technology driven, can be more part of the 21st century.

32:49 It's really cool.

32:50 So the open data projects in particular have been really fascinating.

32:53 And then I know healthcare has been a big focus of DJ's office.

32:57 They've been really looking at, again, how to encourage the public to improve the way that the healthcare system works by investigating the data that's being made available.

33:05 And by making the data that we do produce easier to transport, easier to access, easier to sort of combine and investigate.

33:14 I think it's really making data accessible and kind of top of mind for an entire generation of, you know, early-career Americans.

33:25 It's really fascinating.

33:27 Yeah, that's cool.

33:28 And, you know, maybe the United States government itself has access to things that you wouldn't otherwise share.

33:34 Right.

33:34 Like you mentioned health data.

33:36 I think one of the problems with analyzing health data at large is that people don't want to just give away every single thing about themselves, for good reason.

33:44 And so it's hard to talk broadly about that.

33:47 Right.

33:47 But maybe there's extra data in there somewhere that they can, you know, help make people healthy or something.

33:52 That's cool.

33:52 Yeah, totally.

33:54 And in general, I think it's part of this whole idea of quantified social science. For the longest time, social science about, you know, the behavior of populations has been kind of anecdotal.

34:05 Lots of people doing qualitative research where they're using their intuition and their knowledge to connect the dots and tell a good story.

34:19 And I'm sorry, that probably sounds super dismissive of a lot of social science research.

34:24 And I only kind of mean it to be.

34:26 But there's a move towards saying, hey, you need to back up your statements with data, even if you have to, you know, be creative about how that data is collected or how it's interpreted.

34:38 I mean, that's fine.

34:39 But if you're not backing it up with any data, then it doesn't really mean anything.

34:42 And I think that's very encouraging for people of my persuasion.

34:46 Yeah, I totally agree.

34:48 The other thing that's cool, you talked about the open data project.

34:52 I mean, when you live in a democracy, you're supposed to be sort of the boss of the government.

35:00 You should be able to have access to those things.

35:03 And so that's a really positive trend.

35:04 And also on that note, I want to recommend a video by Catherine Devlin.

35:11 She did the keynote at PyOhio this year, and she works for a government agency, sort of a, let's see if I remember it right.

35:20 She works for a group of programmers within the U.S. government that are basically an open source wing of this programming group.

35:30 And so they'll go into other places and say, we will help you with this project, but only if we get to open source the results.

35:35 And they're trying to spread open source within the U.S. government as well.

35:39 So that ties together nicely.

35:41 Yeah, absolutely.

35:43 Everybody there is kind of involved with the CTO's office.

35:47 Megan Smith is the current CTO of the U.S.

35:50 And almost all the initiatives they're doing are, it's like a totally different way of interacting with government than I think ever existed before.

36:00 So if you get a chance, definitely check out what the CTO's office is doing.

36:05 There's a lot of cool ways to get involved.

36:07 There's a lot of cool projects that they do.

36:09 And there's a lot of interesting open source projects like you just mentioned.

36:13 And a lot of cool data sets.

36:15 And by the way, some of those data sets are really great for learning if you're just, you know, cutting your teeth on data science.

36:21 There's a lot of really interesting things about your community and your state and the country in general that might be more fun than working with, I don't know, a data set about advertising clicks.

36:34 Or whatever.

36:36 This episode is brought to you by DigitalOcean.

36:53 DigitalOcean offers simple cloud infrastructure built for developers.

36:57 Over half a million developers deploy to DigitalOcean because it's easy to get started, flexible for scale, and just plain awesome.

37:05 In fact, DigitalOcean provides key infrastructure for delivering Talk Python episodes every day.

37:11 When you, or your podcast client, download an episode, it comes straight out of a custom Flask app built on DigitalOcean, and it's been bulletproof.

37:18 On release days, the measured bandwidth on my single $10 a month server jumps to over 900 megabits per second for sustained periods.

37:26 And there's no trouble.

37:28 That's because they provide great servers on great hardware at a great price.

37:32 Head on over to DigitalOcean.com today and use the promo code TALKPYTHON, all caps, no spaces, to get started with a $10 credit.

37:40 Let's move on to number seven.

37:50 It's going to be late December.

37:52 Winter's coming, probably.

37:54 Yeah, maybe.

37:55 It's hard to know.

37:56 What's the story of this one?

37:58 So there's a handful of things on here that really I was attracted to because I just love this idea that data science and just kind of data analysis in general is becoming so much more mainstream.

38:14 And so this post was about the Game of Thrones.

38:21 And, you know, it's actually a shame that my co-host Chris isn't on the show right now, because Chris produced a data set from A Song of Ice and Fire, the books, not necessarily the HBO TV show, Game of Thrones.

38:34 Obviously, the two are related, although I hear the TV show's kind of going off the rails and moving away from the books.

38:39 Anyway, so there's a lot of dying in this TV show, Game of Thrones, and the books, A Song of Ice and Fire.

38:46 And my co-host made a data set of all of the battles where he tallied the number of deaths.

38:52 And obviously, there's some estimation in here because the books don't go into detail all of the time.

38:57 But so actually, hold on.

38:59 Let me take a step back.

39:00 So what I should probably tell everybody, if you're not familiar with Game of Thrones, is that the series of books it's based on is called A Song of Ice and Fire.

39:06 If you're not familiar with them, it's basically kind of like medieval fantasy type stuff.

39:10 So there's, you know, dragons and there's houses and, you know, the houses fight with each other over land and there's different families that are at war with one another.

39:18 Kind of the whole thing, right?

39:20 And so, of course, because they fight all the time, there's a lot of dying.

39:24 And if you were of a more data statsy persuasion, you might want to know, like, quantitatively, which house is the best.

39:33 Like, who wins the most battles?

39:35 Who kills the most soldiers?

39:36 Who has the biggest army?

39:37 And these are the sorts of questions that Chris's data set answers.

39:40 So you should definitely go find that.

39:43 I think it's just Game of Thrones battles on GitHub.

39:45 If you Google for it, I bet you'll find it.

39:47 But there's other work, other very interesting work that's being done about the likelihood that you'll die if you're a character in the books or the TV shows.

39:56 And so I just loved it, right?

39:59 Because, I mean, it's a fictional world.

40:01 It's made up.

40:02 The likelihood that you'll die is whether or not the author decides that you should die.

40:06 Like, that's really what's going on here.

40:09 But based on the sort of constructs of the world that the author created, this data scientist went through and used, well, there's a whole Wikipedia clone called A Wiki of Ice and Fire.

40:21 Haha, snort.

40:21 It catalogs and documents every major death in the books.

40:28 And so he went through and basically calculated the likelihood that any given character will survive based on their characteristics.

40:36 So are they from a particular family?

40:38 Are they highborn?

40:39 Are they lowborn?

40:40 Are they a man?

40:41 Are they a woman?

40:41 How old are they, right?

40:42 All of the things that you might say are like a feature of a character.

40:45 And then once you understand the features of a character, you can calculate the likelihood that they'll die.
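The linked post uses a Bayesian approach; as a simpler, non-Bayesian stand-in, here is a sketch of the Kaplan-Meier estimator, a standard way to estimate the probability of surviving past a given point when some characters are still alive at the end of observation (so-called censored records). Measuring time in "chapters" and the grouped records in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple


def kaplan_meier(durations: List[int], died: List[bool]) -> List[Tuple[int, float]]:
    """Return (time, survival probability) pairs at each time a death occurs.

    durations[i] is how long character i was observed (e.g. in chapters);
    died[i] is False if the character was still alive when observation ended,
    which marks the record as censored rather than as a death.
    """
    deaths_at = defaultdict(int)
    for t, d in zip(durations, died):
        if d:
            deaths_at[t] += 1
    survival = 1.0
    curve = []
    for t in sorted(deaths_at):
        at_risk = sum(1 for dur in durations if dur >= t)  # still observed at time t
        survival *= 1 - deaths_at[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical records per group: (chapters observed, died?); not real book data
groups = {
    "noble": ([10, 25, 40, 40, 55], [True, False, True, False, False]),
    "lowborn": ([5, 8, 12, 20, 30], [True, True, True, False, True]),
}
for name, (durations, died) in groups.items():
    print(name, kaplan_meier(durations, died))
```

Splitting characters by a feature (house, class, gender) and comparing the resulting curves is the simplest version of "calculate the likelihood that they'll die" from their characteristics.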

40:50 Anyway, it's fascinating.

40:52 I'm sure, you know, we'll link to the blog post so you can go check it out.

40:55 But it pans out kind of like you'd expect.

41:00 You have a much higher chance of dying if you're not of the noble class.

41:04 I think men die more often than women.

41:06 And so on and so on.

41:08 You should go check it out.

41:08 But really, the reason to bring it up is just to say that when we're doing data analysis about characters in a fictional universe, I think we're at some kind of peak.

41:18 We're at like peak data science awesomeness when that's a possibility.

41:23 Yeah, that's really cool.

41:24 And you could also run this algorithm on your favorite character and know whether or not you should get attached to them, right?

41:30 Yeah.

41:31 I mean, I think to be fair with these stories, the answer is probably always no.

41:34 That's right.

41:38 Yeah.

41:39 I mean, readers of the books will know.

41:40 This is not a spoiler or anything.

41:42 I'm not giving anything away.

41:43 If you like a character and you hope for them, if you have any hope that the character will survive, that's it.

41:50 I think that's probably the best indicator that they're going to die.

41:52 It's the kiss of death.

41:53 Yeah.

41:54 So speaking of dying, we're all getting older.

41:58 And Microsoft has decided that machines can know really well how old we are.

42:04 So if you go to how-old.net, Microsoft is using machine learning to guess how old you are.

42:11 I know.

42:12 How old did it guess?

42:15 Do you want to say?

42:15 You don't have to reveal if you don't want to.

42:17 I don't mind.

42:18 So I've always kind of looked a little younger than I actually am.

42:24 People are always like, what?

42:25 You have kids?

42:26 What?

42:26 So my whole life has been kind of like this.

42:29 Like it was a problem in high school because it's not cool to look younger than everybody else when there's only a four-year stretch.

42:35 But as I get older, it works out better.

42:38 So I uploaded a picture of myself, the one I have on my main website.

42:44 It's the one you probably see on Skype right now.

42:46 And it actually said I was 47.

42:47 I'm like, what?

42:48 I'm only 42.

42:49 This is crazy.

42:50 But I have my glasses on, right?

42:52 And so I'm like, I'll try one without my glasses.

42:54 Uploaded it.

42:54 It said I'm 42.

42:55 It hit it exactly, right on.

42:59 Whoa.

43:00 Whoa.

43:00 What does that say about the algorithm, though?

43:03 That you got aged by five years just because of your glasses?

43:05 Yeah.

43:07 I think the glasses, well, they may trend upwards.

43:10 I would recommend: if you don't want to look old, take your glasses off.

43:14 That's all I got to say.

43:17 How accurate was it for you?

43:20 Yeah.

43:20 Was it in the ballpark?

43:22 I was similarly offended, actually.

43:25 I uploaded a photo of myself and it guessed that I was in my early 40s, which I'm not.

43:34 Which is fine.

43:35 Early 40s are great.

43:37 I'm happy for you.

43:38 But you shouldn't be in them when you're not.

43:41 I don't.

43:43 Well, when you're in your, like, I'm in my early 30s and I was like, wait a second.

43:47 Wait a second.

43:48 Okay.

43:48 So I thought the same thing.

43:49 I was like, maybe there's something about my appearance in the photo that is, you know, cause for concern.

43:55 Right?

43:56 I was like, okay.

43:57 Did it like search for ties or like a collar?

43:59 Maybe throw on like a hoodie?

44:01 Exactly.

44:03 Like put on a hoodie.

44:04 I found a photo of myself without a beard.

44:06 I don't, this is like super vain.

44:07 I was like, wait, I can't live with this.

44:09 So I found a photo, an old photo of myself without a beard.

44:11 I put it into the system and it guessed that I was like 37, which is also older than I am.

44:17 It's better though.

44:17 And so I've had to accept.

44:18 It's moving in the right direction, but I've had to accept one of two things.

44:23 Everybody got offended, because if you were Microsoft and you were training this algorithm, wouldn't you want it to skew young?

44:30 Like, let's be honest.

44:31 Yeah.

44:32 Okay.

44:33 But let's say, okay, so it skews old.

44:36 Or I'm just old-looking.

44:38 Like maybe I'm aged beyond my years.

44:40 I have a kid and, you know, kids do this to you.

44:42 I bet, I bet it would have guessed that I was like 25 until the day she was born.

44:46 And then it was like 40.

44:47 One week later, you aged like 10 years like that.

44:51 Yeah, exactly.

44:53 Exactly.

44:53 So number nine on our list here is actually a little bit of a sad story.

44:59 There was this really cool study about how much a reasoned argument, not an argument as in a fight, but a logical argument or a discussion around a political position, may or may not change someone's opinion.

45:15 And so this study tried to change people's opinions, to open them up to being more favorable toward same-sex marriage.

45:25 And they found in this study that if they had people canvass the area, knocking on doors and talking to people, they could actually make them more open to this idea.

45:39 This flew in the face of a lot of political science, which says that if you argue with somebody who holds the opposite perspective on something political, they typically dig in, like, ha, I'm way against you, man.

45:50 Right.

45:50 It hardens them against your argument rather than bringing them over.

45:53 So this was a big deal until it was a fraud.

45:56 Yeah.

45:58 Yeah, exactly.

45:59 And I think everybody was, I mean, it was one of those things where it was like, okay, because one of the big parts of the study, or, you know, purportedly,

46:09 which turned out not to be true,

46:10 was that if the person having a discussion with you about your views on same-sex marriage was themselves gay, that would increase the likelihood that your opinion would change.

46:23 So you would become more favorable toward same-sex marriage if you were talking to a gay person.

46:30 Basically, it was the idea that the reason most of us might hold a prejudice is simply that we're not exposed to people of a certain group.

46:40 And as soon as we are, our prejudices start to melt away, which is kind of a nice idea.

46:46 Right.

46:46 And so everybody was really excited about this survey.

46:48 But, well, like we're talking about, it turns out that it was just made up.

46:53 And what's also interesting is the way in which some additional researchers discovered the flaw.

47:01 As often happens, they were going to do an extension of the survey.

47:05 They were going to build on the research that had already been published, which is pretty common.

47:11 But then when they looked into it, they discovered that the responses in the survey were really consistent. Actually, they were too consistent.

47:20 Right.

47:21 The pattern was too clean.

47:22 What they actually found was irregularities in the data, and those irregularities looked like the sort of thing a human would produce when trying to be random.

47:35 Like, isn't that amazing?

47:36 Actual randomness is hard to produce, and human beings just aren't good at producing things randomly, because that's just not what we do.

47:42 We go, like, oh, I picked a three last time.

47:44 So I have to pick an eight this time.

47:45 And I've already picked an eight.

47:47 So randomness says that I wouldn't pick an eight two times in a row.

47:49 I should pick a 12.

47:50 And that's actually not how randomness works.

47:52 So that's super intentional.

47:55 Like you're layering your own bias on top of it.
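To make the repeat-avoidance point concrete, here is a minimal Python sketch. It is a toy illustration, not the analysis the researchers actually ran on the survey data: `human_like_digits` is a hypothetical stand-in for a person who never picks the same digit twice in a row, and `repeat_rate` simply counts back-to-back repeats. For truly random digits from 0 to 9, about one adjacent pair in ten should be a repeat.

```python
import random


def repeat_rate(digits):
    """Fraction of adjacent pairs in which the same digit appears twice in a row."""
    pairs = list(zip(digits, digits[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def human_like_digits(n, rng):
    """Hypothetical 'human' generator: never picks the same digit as last time."""
    seq = [rng.randrange(10)]
    while len(seq) < n:
        choice = rng.randrange(10)
        if choice != seq[-1]:  # "I already picked an eight, so not eight again"
            seq.append(choice)
    return seq


rng = random.Random(2015)  # fixed seed so the run is repeatable
truly_random = [rng.randrange(10) for _ in range(10_000)]
fake_random = human_like_digits(10_000, rng)

print(f"random:     {repeat_rate(truly_random):.3f}")  # roughly 0.100
print(f"human-like: {repeat_rate(fake_random):.3f}")   # 0.000, repeats never happen
```

A repeat rate far below one in ten is the kind of "too clean" pattern the hosts are describing: the avoidance of repeats is itself a detectable bias.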

47:57 And so what they saw was very uniform noise in the data set.

48:00 And they were like, wait a second.

48:02 This actually looks like you just made it up.

48:04 And it's true.

48:05 I guess he took the results from a previous study and then tried to apply them to this one.

48:08 He basically applied the previous study's results in a new context and then sprinkled some noise on top in a way that he felt would look random.

48:15 And these other researchers basically had to call out this huge, seminal study that had been published in, you know, I forget which journal, Nature maybe [it was actually Science].

48:25 One of the big ones.

48:26 And it was covered in a ton of high-profile news outlets too, right?

48:33 The New York Times and places like that.

48:36 So it got a lot of traction until it crashed.

48:39 Yeah.

48:40 Yeah, absolutely.

48:41 And so, on so many levels, it's really... I mean, on the one hand, you look at this and you go, oh, what a horrible fraud this person perpetrated on the public.

48:50 And on the other, you go, well, I guess, you know, the process kind of works.

48:53 I mean, it shouldn't have been published in the first place.

48:55 Peer review should have caught that.

48:56 But ultimately, the academic community discovered the fraud and called it out, which I think was good.

49:02 So it's interesting in two ways.

49:04 First, it's just interesting how the mechanics of the research industry work.

49:08 And second, it was interesting because it brought up this novel idea about randomness that's not normally part of mainstream discussion, which everybody had to start talking about in order to explain why the story worked out the way it did.

49:22 Yeah.

49:23 How did they catch him faking randomness?

49:24 I don't understand.

49:25 What does that mean?

49:26 Yeah.

49:27 That's really cool.

49:28 Yeah, yeah, exactly.

49:28 What does that mean?

49:29 Like, yeah, it was cool.

49:31 That's great.

49:32 And on this note, you know, I was seeing this story in a really positive light.

49:38 Like, hey, maybe we could sit down and talk to each other and we could, like, help evolve each other's opinions one way or another.

49:43 And we could kind of come to an understanding.

49:45 But it turns out that, like, no, probably we can't.

49:49 There's a really interesting book that I think is related to data science.

49:54 It just has an insane amount of data analysis in it. It's called The Big Sort: Why the Clustering of Like-Minded America Is Tearing Us Apart.

50:02 And it really goes through, like, 20 or 40 years of data on people's opinions and on how people work together.

50:10 And it sort of doesn't support the study's idea that we can just talk things over and agree more.

50:16 But check it out.

50:18 That's interesting.

50:19 I'll have to go check out the book.

50:20 Yeah, it's an amazing book.

50:21 One of my favorites.

50:22 So, number 10, which I'm going to nominate as my favorite of this year, is that Python is at an all-time high in popularity as a programming language.

50:35 So, the TIOBE index is one of the more respected of the "how popular is my technology," technology-fight sort of indexes.

50:43 And Python is now number four of all programming languages.

50:48 And it jumped from eighth to fourth in one year.

50:52 So, it's one of the very few with, like, a double up arrow.

50:55 This thing is growing super fast.

50:57 So, for a language that was created in the late 80s, came out in the early 90s, and has been around for a long time, this massive jump makes it interesting to ask: where is it coming from?

51:09 It's coming from academia somewhat.

51:11 Like, Python is now the most popular language for first-year college students studying computer science.

51:17 But I think it has a lot to do with data science as well.

51:22 Yeah, I mean, it's hard to see how it couldn't be, actually.

51:25 Because I think Python's always been great along the same lines as PHP, which, for all the software engineers who listen to this podcast because they like Python so much, I can only assume was pretty much sacrilegious.

51:37 So, don't get me wrong, you guys.

51:38 I hate PHP.

51:39 Those guys suck.

51:40 But also, Python is a pretty good general-purpose programming language, as is PHP, for building web applications and all of the sorts of products we've seen released on the internet in the past, whatever, 10 years.

51:54 This huge boom in web-based products.

51:57 And Python is great for that.

51:58 It's great for writing software that gets released on the internet.

52:02 But at the same time, it has also overtaken the statistical programming language R and become, I would say, the de facto language for doing data science and analysis.

52:13 Which is really cool, because now it's a powerhouse.

52:16 Now you can do two things in the same language that used to be totally separate from one another.

52:20 So, like we talked about before, there's statistical computing, which is basically, like, I already know how to do the stats, but I need to script it.

52:28 Which language will help me do that? You know, almost like an Excel power user.

52:32 And then that goes all the way down through complicated machine learning and artificial intelligence and neural networks and all that stuff.

52:40 And so, the fact that you can couple that with "how do I respond to an HTTP request, hit a database, and return some content"... well, those two worlds used to be different.

52:49 But now, thanks to Python, we can build smarter and smarter applications in which the two are seamlessly integrated.
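As a rough illustration of those two worlds meeting, here is a minimal sketch, assuming only the Python standard library, of a single WSGI function that answers an HTTP request and computes summary statistics on the data it receives. The `app` function and its `x` query parameter are hypothetical, and the database step is left out for brevity.

```python
import json
import statistics
from urllib.parse import parse_qs


def app(environ, start_response):
    """Hypothetical WSGI app: read numbers from ?x=...&x=... and return summary stats."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        values = [float(v) for v in params.get("x", [])]
    except ValueError:
        values = []  # treat non-numeric input as "no usable data"

    if not values:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"pass one or more numeric x parameters"]

    # The "data science" half: plain statistics on the request data.
    summary = {
        "n": len(values),
        "mean": statistics.mean(values),
        # stdev needs at least two points, so fall back to 0.0 for a single value.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

    # The "web" half: serialize the result and send it back as JSON.
    body = json.dumps(summary).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To serve it locally, you could pass `app` to `wsgiref.simple_server.make_server` and call `serve_forever()`; a request such as `/?x=1&x=2&x=3` should then return a mean of 2.0. A real application would more likely use a web framework and a statistics library, but the point is that both halves live in one language.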

52:55 So, I mean, it's no surprise to me.

52:57 It's awesome.

52:58 Yeah, I think there must be a huge boost coming from that direction.

53:00 And generally, people are really jumping into it.

53:03 But, you know, thanks to data science for making our language look even more popular and awesome, like it should, right?

53:10 I know.

53:12 The only languages above it on the list, I think, are really lame ones that nobody wants to write in.

53:16 They're probably, like, forced to by their boss.

53:18 Like, Java.

53:19 I mean, come on.

53:20 Yeah.

53:20 Let's be real.

53:21 Oddly, yeah.

53:22 A little tear just formed in my eye.

53:26 No, but seriously, like, Java literally is number one.

53:29 And C is up there as well, I believe.

53:30 It's pretty interesting.

53:32 But it's, you know, C and Java are fighting for first place.

53:36 Then it's C++.

53:38 And then it's Python.

53:39 Right?

53:39 So, that's beautiful.

53:41 Yeah.

53:42 I mean, honestly, kind of any dynamic language being as popular as that.

53:46 I think there are a lot of old school software engineers who are crying in their coffee at the release of this report.

53:52 You know, all of that, like,

53:53 "That's a toy language that will never be useful for anything in production."

53:56 And, yeah, now here we are.

54:00 Python's running some pretty massive products.

54:04 So, it's very cool.

54:06 As I'm sure all your listeners know, since they listen to your podcast every week.

54:09 Yeah, I'm sure a lot of them are building some pretty amazing stuff out there.

54:12 But, yeah.

54:14 I think this is great news for everybody who wants to get into Python.

54:18 It seems like the job prospects in regular programming as well as data science are just going up.

54:25 So, if I were betting on my career, I would consider Python one of the top choices.

54:31 Yeah, absolutely.

54:32 And it makes it so much easier to start to explore new concepts.

54:36 So, when you're ready to start bringing machine learning or any kind of statistical data processing into your applications, it's complicated.

54:49 But it's a lot less complicated when you're already familiar with the language that you're writing in.

54:54 So, I think it really gives you a leg up when you're trying to make that transition from one to the other.

54:58 Or vice versa.

54:59 For the Partially Derivative listeners, if you've been scripting a lot in Python to do some research or to work on some analysis, and you want to start building applications to make the products you're building accessible to the world, there's a whole ecosystem waiting for you.

55:18 And the water is warm, my friends.

55:19 The water is warm and shallow.

55:21 Wade right in.

55:22 Yeah, exactly.

55:24 So, yeah, it's fun.

55:26 And it's just a fun language to program in.

55:28 Let's be honest.

55:29 Yeah, absolutely.

55:30 Love it.

55:30 So, Jonathon, that's our top 10 list for the big news in data science this year.

55:35 So, I've got to ask you, what resolution are you going to break this year?

55:40 Which one are you going to give up on in two weeks?

55:42 Have you already decided what you're not going to keep up?

55:44 I already decided what I'm not going to do.

55:47 That's right.

55:48 Let's see.

55:49 The other people on my engineering team would probably appreciate it if my New Year's resolution was to write more unit tests.

55:56 Or to better comment my code.

56:04 And I'm almost certain that neither of those things will happen.

56:06 So I'll make a dual resolution this year.

56:10 Unit tests and documentation.

56:12 That's a pretty safe one to throw out there, I suspect.

56:16 I think mine is going to be to actually put proper comments on my Git check-ins.

56:21 Yeah.

56:22 Those could be some improvements.

56:24 Yeah.

56:24 I kind of get in a hurry, and they're not so good sometimes.

56:28 Yeah.

56:29 Totally.

56:30 Like, "fixing stupid bug."

56:31 Yeah.

56:32 Like, I think that's probably not as helpful as it could be to my fellow programmers.

56:36 It does express your emotional feelings about the code in the check-in, but it doesn't really help them figure out what it meant.

56:42 Yeah.

56:45 That's fair.

56:46 That's fair.

56:46 That's a good resolution.

56:47 But you don't think you'll stick to it?

56:49 I mean, it's a hard habit to break.

56:51 I'm going to stick to it until I get into a super big hurry and there's some production bug to fix.

56:54 And then I'm probably going to skip it.

56:57 Not that I think that's a good idea.

56:58 I'm not recommending it.

57:00 I'm just telling you, these are the resolutions you make and you break.

57:02 Yeah.

57:04 I love it.

57:04 Like this is the recommendation.

57:06 So all of you out there who are learning Python and software engineering for the first time, here's what to do.

57:12 Talk a big game and then ultimately just write a bunch of spaghetti code that nobody can read.

57:18 That's how the pros do it, you guys.

57:20 That's how the pros do it.

57:21 On that lovely note, I think we should probably call it a show.

57:26 Jonathon, this has been really fun.

57:28 Thanks for teaming up to put together an end-of-the-year show for us.

57:31 Absolutely.

57:32 This has been a blast.

57:33 Thank you so much.

57:34 Thank you so much for suggesting it and having me on the show.

57:37 This has been a blast.

57:38 You bet.

57:39 Thanks.

57:39 Bye.

57:40 Bye.

57:40 This has been another episode of Talk Python To Me.

57:44 It's a joint episode with Partially Derivative and Jonathon Morgan.

57:48 And it has been sponsored by Hired and DigitalOcean.

57:51 Thank you guys for supporting the show.

57:53 Hired wants to help you find your next big thing.

57:55 Visit Hired.com slash Talk Python To Me to get five or more offers with salary and equity

58:00 presented right up front and a special listener signing bonus of $4,000.

58:04 DigitalOcean is amazing hosting blended with simplicity and crazy affordability.

58:10 Create an account and within 60 seconds, you can have a Linux server with a 30 gig SSD at

58:16 your command.

58:17 Seriously, I do this all the time.

58:18 Remember the discount code.

58:20 It's Talk Python, all caps, no spaces.

58:22 You can find the links from today's show at talkpython.fm/episodes/show/40.

58:28 And be sure to subscribe to the show.

58:30 Open your favorite podcatcher and search for Python.

58:33 We should be right at the top.

58:34 You can also find the iTunes and direct RSS feeds in the footer of the website.

58:39 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

58:43 You can hear the entire song on talkpython.fm.

58:46 Just look for music in the nav bar.

58:48 This is your host, Michael Kennedy.

58:50 I really appreciate you taking the time to listen and share this with your friends.

58:53 Smix, take us out of here.

58:56 I'm out of here.

59:17 Bye.
