WEBVTT

00:00:00.001 --> 00:00:04.900
If there's ever been a time in history that journalism is needed to shine a light on what's

00:00:04.900 --> 00:00:09.840
happening in the world, it's now. Would it surprise you to hear that Python and machine learning are

00:00:09.840 --> 00:00:14.320
playing an increasingly important role in discovering and bringing us the news? On this

00:00:14.320 --> 00:00:18.660
episode, you'll meet Carolyn Stransky, a journalist and developer who's been researching this

00:00:18.660 --> 00:00:25.280
intersection of tech and journalism. This is Talk Python To Me, episode 280, recorded August 26,

00:00:25.280 --> 00:00:26.180
2020.

00:00:26.180 --> 00:00:44.560
Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:00:44.560 --> 00:00:49.320
ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where

00:00:49.320 --> 00:00:54.680
I'm at mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the

00:00:54.680 --> 00:01:01.100
show on Twitter via at talkpython. This episode is brought to you by brilliant.org and us. Before

00:01:01.100 --> 00:01:05.660
we talk with Carolyn, a quick announcement. Two of the courses that we've released in early access

00:01:05.660 --> 00:01:11.620
mode are now complete, 100% done, and they're ready for you. That's the Python memory management and

00:01:11.620 --> 00:01:18.520
tips course and moving from Excel to Python with pandas and Jupyter. Just visit talkpython.fm and

00:01:18.520 --> 00:01:24.180
click on Python courses to learn more. Carolyn, welcome to Talk Python To Me.

00:01:24.180 --> 00:01:26.300
Thank you. Thank you for having me.

00:01:26.300 --> 00:01:32.420
I'm really interested to hear about this topic. I ran across one of your presentations talking about

00:01:32.420 --> 00:01:40.080
how AI is affecting journalism, and I just was really fascinated by all of these ways that newspapers and

00:01:40.080 --> 00:01:45.460
journalists are doing really cool stuff with like machine learning and AI, and oftentimes that means

00:01:45.460 --> 00:01:52.660
Python as well. Now, before we get into our main topic with AI and journalism, let's just start with

00:01:52.660 --> 00:01:57.440
your story. How did you get interested in programming? You started out on the journalism side, not the AI

00:01:57.440 --> 00:02:05.720
side, right? Yes, exactly. So I studied journalism in university, and that was my focus. My focus was

00:02:05.720 --> 00:02:11.720
actually print journalism. I really thought that was going to be... I mean, okay, I knew it wasn't going to be the

00:02:11.720 --> 00:02:17.820
future, but I really... my skills were in writing, so I thought, okay, newspapers, solid. They're not going

00:02:17.820 --> 00:02:26.120
anywhere. It's fine. So that was kind of my specialty in a way. I started out in sports journalism and then

00:02:26.120 --> 00:02:32.620
kind of moved around after I graduated. Right after I graduated, I moved to Berlin, and I realized that

00:02:32.620 --> 00:02:39.720
being a journalist is very hard, and being a journalist in a country that you don't speak the language is even

00:02:39.720 --> 00:02:47.560
more difficult. Yeah. So yeah, it took me a while, but I figured that out. And so then I started covering

00:02:47.560 --> 00:02:53.380
things like when I first moved to Berlin, it was the refugee crisis or in the midst of the refugee crisis

00:02:53.380 --> 00:03:00.840
in 2015. So I was able to get a few freelance pieces covering that. I also was able to do more

00:03:00.840 --> 00:03:09.360
tech-related pieces. So I started doing a bit of activism fused with tech articles, things like anti-harassment

00:03:09.360 --> 00:03:17.140
tools, like two-way sex toys, things like that. So because in Berlin, there's a big... Yeah, in Berlin,

00:03:17.140 --> 00:03:23.760
there's a big tech and startup scene here. So that was like an English-speaking community that I had access

00:03:23.760 --> 00:03:31.200
to. And eventually, I needed to get a full-time job to keep my visa. So I went into... This is long,

00:03:31.200 --> 00:03:36.920
but I went into tech marketing, went to technical writing, then learned to code. Now I'm a developer.

00:03:36.920 --> 00:03:41.100
What an interesting journey. I think a lot of people find their way into working with software

00:03:41.100 --> 00:03:45.800
in a roundabout way like that. Like for me, it was, oh, I'm going to go study chemistry and math

00:03:45.800 --> 00:03:50.600
at college. And I guess I got to learn a little programming so that I can do the math work and the

00:03:50.600 --> 00:03:54.380
math research. And wait a minute, I actually like this better than the math. What am I doing here?

00:03:54.380 --> 00:03:58.780
That was me as a technical writer because I was writing these tutorials, but I didn't really know

00:03:58.780 --> 00:04:04.980
how to code. So then I would just hand off... It would be like, put code here. I'd hand it to a

00:04:04.980 --> 00:04:08.860
developer. And finally, I was like, I can do that: write code.

00:04:08.860 --> 00:04:12.820
That's awesome. Yeah. And you're having fun doing programming these days?

00:04:12.820 --> 00:04:18.380
I mean, it's a great way to make a living. And I like the problem-solving aspect of it,

00:04:18.380 --> 00:04:22.920
but I do, I really miss journalism. Like, I'd be lying if I said I didn't. That's why I like to research

00:04:22.920 --> 00:04:26.280
topics like this because it helps me feel a bit more connected.

00:04:26.280 --> 00:04:32.800
Stay connected. Well, I do think that there's a lot of ways in which journalists can use tech

00:04:32.800 --> 00:04:38.640
or could be helped by folks with tech skills. So who knows? We may find you back in the journalism

00:04:38.640 --> 00:04:41.180
space on the tech side of the desk.

00:04:41.180 --> 00:04:42.200
That's the dream.

00:04:42.200 --> 00:04:44.020
Yeah. Awesome. Awesome.

00:04:44.240 --> 00:04:47.620
So what are you doing these days, like day to day right now?

00:04:47.620 --> 00:04:53.460
So right now I'm actually dipping back into technical writing a little bit. So I'm doing

00:04:53.460 --> 00:04:58.340
the Google Season of Docs, which is a three-month program from Google where you're partnered

00:04:58.340 --> 00:05:04.600
with an open source organization. So I'm working with the GraphQL Foundation and yeah. So not

00:05:04.600 --> 00:05:11.380
really using, spoiler alert, not really using Python day to day, not really a proficient Python

00:05:11.380 --> 00:05:18.020
developer. But in my previous job, I was using Python because we were an automated testing service.

00:05:18.020 --> 00:05:24.100
And a lot of our, you know, suites were written in Python. We had a data team, a data science team

00:05:24.100 --> 00:05:30.100
that primarily wrote Python. Anything that wasn't written in Python, we usually had some sort of, like,

00:05:30.100 --> 00:05:33.180
porter so that people could write it in Python.

00:05:33.680 --> 00:05:34.400
Right. Exactly.

00:05:34.400 --> 00:05:38.180
You know, compiled into PureScript or something like that. So.

00:05:38.180 --> 00:05:46.120
Yeah. Very, very cool. Let's start pre AI talking about journalism and just talk about data and

00:05:46.120 --> 00:05:50.320
journalism. Now, I feel like these things have always gone together. You know, if you went back to

00:05:50.320 --> 00:05:55.920
like 1920 and you grabbed a newspaper, it would probably have stuff about the stock market and

00:05:55.920 --> 00:06:04.640
the trends and whatnot. But accessing data has become much, much easier in the last 10 years or whatever.

00:06:04.640 --> 00:06:09.760
When we have web scraping, we have APIs, we have all these different ways of accessing data, right?

00:06:09.760 --> 00:06:16.460
The internet was massive in that regard and so on. So when I think of data in journalism, probably the

00:06:16.460 --> 00:06:24.200
first place that I think of is FiveThirtyEight.com. Like that place just has so many GitHub repositories of all

00:06:24.200 --> 00:06:28.040
the data that they use and that you go there and there's all these graphs and stuff. But you know,

00:06:28.040 --> 00:06:33.200
maybe just give us a sense of like where you see data having an impact in journalism these days.

00:06:33.200 --> 00:06:38.060
You know, it's funny that you jump right away to GitHub and these data sets, because for me,

00:06:38.060 --> 00:06:42.500
I think about it and maybe it's because, you know, I studied it. And when you study something in

00:06:42.500 --> 00:06:49.700
university, you get all of the philosophy behind it. Right. Yeah. But, but I think about data and

00:06:49.700 --> 00:06:56.900
journalism as something that's always been a really integral part of journalism. Like, most really good quality

00:06:56.900 --> 00:07:02.540
reporting has an element of data to it. I think about things like I mentioned, I used to do sports

00:07:02.540 --> 00:07:07.940
reporting. And you think about a story like that. And would you rather read something that says,

00:07:08.080 --> 00:07:14.720
the teams played well, this person did pretty okay? Like, seems like last time we think. Yes,

00:07:14.720 --> 00:07:20.320
exactly. Or would you rather like read an article that really breaks down the statistics, you know,

00:07:20.320 --> 00:07:25.680
what the score was, what the batting average was, I don't know why I'm using baseball, but

00:07:25.680 --> 00:07:32.860
so this idea of data and especially like really well researched and well curated data,

00:07:33.080 --> 00:07:39.740
it's great because it can help do things like fight misinformation. There's

00:07:39.740 --> 00:07:46.320
a really great quote from Catherine Gicheru, she's from the International Center for Journalists

00:07:46.320 --> 00:07:52.860
as a Knight Fellow. And yeah, she said that, like, data can help journalists speak truth to power. And I love

00:07:52.860 --> 00:07:59.540
that because I think when you have data in reporting, it's instantly more trustworthy. We can dive into the

00:07:59.540 --> 00:08:04.740
ethics of that and whether or not that data is actually trustworthy later, but it gives people

00:08:04.740 --> 00:08:10.780
that sense of trust. It's probably more trustworthy than just, it's my opinion, or here's an anecdote.

00:08:10.780 --> 00:08:15.600
I heard on Facebook that somebody said that this was true. So here's my, you know, things like that,

00:08:15.600 --> 00:08:23.000
right? Definitely. You know, one of the big ironies, it feels like to me, is with all this availability of

00:08:23.000 --> 00:08:32.300
real data we have, there seems to be a proliferation of just insanity around fake data. And I don't know,

00:08:32.300 --> 00:08:37.860
maybe it's because it's also on the internet and it appears like, well, here's some piece of information

00:08:37.860 --> 00:08:41.860
and we'll call those facts or opinions. And here's another one that's different than that, but they're

00:08:41.860 --> 00:08:46.880
both on the same webpage, right? They're both in the browser. And like that kind of puts on equal footing

00:08:46.880 --> 00:08:51.020
as opposed to, well, that used to be on the front page of the New York Times and the front page of the

00:08:51.200 --> 00:08:57.100
Enquirer, like I didn't consider those to be equally weighted sources, but maybe people are just not

00:08:57.100 --> 00:09:03.460
distinguishing them. But it seems ironic to me that we have more access to real data, and there also

00:09:03.460 --> 00:09:09.320
seems to be a lack of embracing of real data, or whatever. I'm not trying to overstate it, but people

00:09:09.320 --> 00:09:14.340
seem a little wacky. Yeah. I think there is, it's confusing. I mean, I also think it's confusing.

00:09:14.340 --> 00:09:20.080
We're living in this world where someone can be, you know, blogging on their own site and that can be

00:09:20.080 --> 00:09:28.820
almost more credible journalism than, you know, someone in certain news outlets. So we don't need

00:09:28.820 --> 00:09:35.060
to get super political in this, but it is. But yeah, I think it's really confusing and I

00:09:35.060 --> 00:09:41.800
don't blame people, especially those who might not understand how the data is collected or, you know,

00:09:41.800 --> 00:09:46.800
a lot of people aren't very transparent about how it's being presented or collected. And

00:09:46.800 --> 00:09:48.860
yeah, it's confusing.

00:09:48.860 --> 00:09:54.140
It's definitely confusing, but it's also the foundation of the real journalism, the real

00:09:54.140 --> 00:09:58.960
reporting. I think there's a lot of interesting ways in which you use it. And we're going to talk

00:09:58.960 --> 00:10:04.220
about some actually concrete, cool tools that a lot of newspapers are using, but you know, maybe let's

00:10:04.220 --> 00:10:10.800
just talk about why would journalists and newspaper journalists, freelance journalists,

00:10:11.460 --> 00:10:18.080
associated press type things and so on. Like why would those folks use AI and ML? Like why are they

00:10:18.080 --> 00:10:18.800
adopting these tools?

00:10:18.800 --> 00:10:25.200
There are a lot of reasons. And just so you don't have to hear it from me, there was a recent survey

00:10:25.200 --> 00:10:32.200
done by journalism AI, which is from Google and Polis, which is from the London School of Economics and

00:10:32.200 --> 00:10:38.000
Political Science. They have a think tank on this and they surveyed newsrooms across the nation,

00:10:38.000 --> 00:10:43.620
or sorry, across the world. And you will hear me reference this survey a lot because it's

00:10:43.620 --> 00:10:48.540
really thorough and it's really recent. They published it at the end of last year. And they

00:10:48.540 --> 00:10:55.940
mentioned that there were like three key motives for using AI. So the first was to make journalists

00:10:55.940 --> 00:11:04.100
work more efficient, about like 68% of the replies said that. Then to deliver more relevant content to

00:11:04.100 --> 00:11:10.960
users, about half of the respondents said that, and also to improve business efficiency. So what I've

00:11:10.960 --> 00:11:19.280
mostly focused on is that element of making journalists work more efficient, because there is so much that you do

00:11:19.280 --> 00:11:25.860
in journalism that can be automated. And you know, we're developers, we love to automate things.

00:11:25.860 --> 00:11:32.220
Once you realize you're like, I don't have to do this for four hours by hand. Okay, we're not doing this for

00:11:32.220 --> 00:11:36.000
four hours by hand. Let me do it once for two hours, and we'll never do it again, right?

00:11:36.400 --> 00:11:41.940
Exactly. So there's so many opportunities like that, especially because traditional print journalism is

00:11:41.940 --> 00:11:47.560
such a, you know, very thorough, very logical, but, you know, a bit slow-moving field in that sense. And

00:11:47.560 --> 00:11:52.960
especially with this quicker news cycle, you need to keep up. So there are things like, you know, being

00:11:52.960 --> 00:11:58.580
able to retrieve one of those massive data sets and comb through it and see whether or not there's a story

00:11:58.580 --> 00:12:05.380
in there. Fact checking basic articles, maybe organizing story ideas that are facilitated from

00:12:05.380 --> 00:12:12.180
the public, making initial rough cuts of videos or deciding what camera angle is the best. And there's

00:12:12.180 --> 00:12:17.540
just a lot of tedious tasks. Yeah, I used to be an unpaid intern. I know what those tasks are, because I had

00:12:17.540 --> 00:12:25.620
to do them. And I think it's great, because, and I think the reason I focused on that, is that it's really

00:12:25.620 --> 00:12:31.540
supplementary to human journalists. Right? Yeah, none of the stuff that you said, people would want to

00:12:31.540 --> 00:12:38.960
defend as, like, that AI is taking my job. Like, I used to go by hand and rename these columns to those

00:12:38.960 --> 00:12:43.700
columns, and then merge it over here. And I want to keep doing... like, nobody wants to keep doing that,

00:12:43.700 --> 00:12:49.140
right? They want to do it to tell the stories to find the insights to do the research, not juggle and

00:12:49.140 --> 00:12:55.380
wrangle data, or other things along those lines, right? Exactly. But I think when a lot of journalists are

00:12:55.380 --> 00:13:01.140
being presented the idea of AI and machine learning, they're not presented, like, no one says exactly

00:13:01.140 --> 00:13:04.540
what you said, they just say, Oh, we're going to introduce this new bot, we're going to introduce

00:13:04.540 --> 00:13:12.220
this new tool. And I think people immediately are very afraid, you know, very unsure about, you know,

00:13:12.220 --> 00:13:18.480
you hear about robots are going to take all of our jobs. And so I think they get a bit uneasy in that

00:13:18.480 --> 00:13:23.680
sense. And I don't blame them, because if it's not being explained in a way that like, Oh, this is

00:13:23.680 --> 00:13:28.800
supplementary to your work. And that's what most of the respondents in the survey even said, they said,

00:13:28.800 --> 00:13:36.840
like, they see AI as something that is, you know, supplementary and additional, not necessarily like

00:13:36.840 --> 00:13:38.580
transformational yet.

00:13:40.840 --> 00:13:45.680
This portion of Talk Python To Me is brought to you by Brilliant.org. Brilliant has digestible courses

00:13:45.680 --> 00:13:51.040
in topics from the basics of scientific thinking all the way up to high end science like quantum

00:13:51.040 --> 00:13:56.040
computing. And while quantum computing may sound complicated, Brilliant makes complex learning

00:13:56.040 --> 00:14:00.700
uncomplicated and fun. It's super easy to get started. And they've got so many science and math

00:14:00.700 --> 00:14:05.420
courses to choose from. I recently used Brilliant to get into rocket science for an upcoming episode.

00:14:05.420 --> 00:14:10.580
And it was a blast. The interactive courses are presented in a clean and accessible way. And you

00:14:10.580 --> 00:14:16.020
could go from knowing nothing about a topic to having a deep understanding. Put your spare time to good

00:14:16.020 --> 00:14:21.700
use and hugely improve your critical thinking skills. Go to talkpython.fm/brilliant and sign up for

00:14:21.700 --> 00:14:28.580
free. The first 200 people that use that link get 20% off the premium subscription. That's talkpython.fm

00:14:28.580 --> 00:14:31.800
slash brilliant. Or just click the link in the show notes.

00:14:31.800 --> 00:14:40.740
Who knows where the future goes? I've never ceased to be amazed by how crazy some of the things people

00:14:40.740 --> 00:14:45.300
are coming up with. Like the fact that we have self-driving cars, that seemed like pure science

00:14:45.300 --> 00:14:52.100
fiction, you know? So who knows? Maybe we'll get creepy AIs writing stuff. But for now, I don't see it

00:14:52.100 --> 00:14:57.780
that way. And I remember working at a company quite a while ago that was not a very tech-heavy company.

00:14:57.860 --> 00:15:02.840
It was like a research place, but a lot of people were researchers, not developers. So there were not a

00:15:02.840 --> 00:15:07.900
lot of automated systems. And every time we would say, you know, that thing you used to do for like four

00:15:07.900 --> 00:15:13.280
hours a week, we just made that automatic and it just happens now. They're like, ah, this is like,

00:15:13.280 --> 00:15:17.720
that's what I used to do. This is going to be my job. But every single time they just got more

00:15:17.720 --> 00:15:23.100
interesting work that was less tedious, you know? And I don't remember anyone getting laid off or

00:15:23.100 --> 00:15:27.200
anything like that because we automated stuff. It's just, we could do more work and do more

00:15:27.200 --> 00:15:30.920
interesting work. Exactly. I think there'll always need to be that human component, especially

00:15:30.920 --> 00:15:36.700
in journalism because it's so based on things like storytelling and, you know, conveying emotion.

00:15:36.700 --> 00:15:42.920
I mean, not every form of journalism, but a lot of the really human centric things focus on that.

00:15:42.920 --> 00:15:44.680
And I think you'll always need someone.

00:15:44.680 --> 00:15:48.720
Yeah. Well, a really interesting semi-recent, I'm going to call it recent story,

00:15:49.180 --> 00:15:56.780
was people applying like text processing and machine learning to the data that came out of the Panama

00:15:56.780 --> 00:15:59.140
Papers, right? So I don't know much about it.

00:15:59.140 --> 00:16:05.660
Yeah, I didn't either. But I recently interviewed a guy who worked on a project that was a search engine

00:16:05.660 --> 00:16:11.080
type of thing. And as they were working on it, there was a journalist who was super interested

00:16:11.080 --> 00:16:16.060
in the tech. They're like, why is this journalist so interested in this? And it turned out that they were

00:16:16.060 --> 00:16:22.780
using it to analyze and search all the data, like do OCR and types of stuff like that on the data that

00:16:22.780 --> 00:16:27.280
came out of the Panama papers to create relationships and see what, you know, like that kind of stuff

00:16:27.280 --> 00:16:32.520
would have taken a lot more people, maybe would have gotten exposed before the Panama papers could have

00:16:32.520 --> 00:16:37.580
been, you know, analyzed fully, all sorts of stuff. So yeah, a lot of interesting ways to use technology,

00:16:37.580 --> 00:16:37.900
I think.

00:16:37.900 --> 00:16:38.360
Absolutely.
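
A rough sketch of that kind of pipeline in Python: OCR a scanned page with pytesseract, then use spaCy to pull out the people and organizations mentioned, so you could start mapping relationships. The file names here are invented, and it assumes Tesseract is installed along with spaCy's en_core_web_sm model.

    from collections import Counter

    import pytesseract
    import spacy
    from PIL import Image

    nlp = spacy.load("en_core_web_sm")

    def entities_from_scan(image_path):
        """OCR one scanned page, return the people and orgs mentioned on it."""
        text = pytesseract.image_to_string(Image.open(image_path))
        return [ent.text for ent in nlp(text).ents if ent.label_ in ("PERSON", "ORG")]

    # Count mentions across a pile of scans to surface names worth digging into.
    mentions = Counter()
    for page in ["scan_001.png", "scan_002.png"]:  # hypothetical files
        mentions.update(entities_from_scan(page))

    print(mentions.most_common(10))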

00:16:38.820 --> 00:16:43.740
All these are interesting. But honestly, I think maybe the biggest boost actually might be just the

00:16:43.740 --> 00:16:48.360
automate the boring stuff story, which you were touching on is like, there's all these things

00:16:48.360 --> 00:16:53.300
you got to do, a lot of them are tedious. And if you had a little programming skill, you could automate

00:16:53.300 --> 00:16:58.300
them. But because you don't, you just maybe use Excel and find and replace or something painful.

00:16:58.960 --> 00:17:03.640
And if you could just take like 20% of the tedious work away from journalists and let them

00:17:03.640 --> 00:17:07.760
focus on the story, or getting out and getting the story, like that would be great.

00:17:07.760 --> 00:17:14.120
Exactly. I mean, then I think you can focus on all of those aspects that a computer cannot.

00:17:14.120 --> 00:17:16.780
Like what you just said, I literally just repeated what you said.

00:17:16.780 --> 00:17:19.580
Obviously, the computer can't go and interview people

00:17:19.580 --> 00:17:23.440
in a way that's going to ask the right questions and so on. Right?

00:17:23.560 --> 00:17:29.480
Exactly. So in your talk, you mentioned a bunch of newsrooms that are actually using

00:17:29.480 --> 00:17:35.400
different software applications, libraries, and so on. Some of these are open source. Some of these

00:17:35.400 --> 00:17:40.140
are just talked about, but I thought they were fascinating. So you want to go through some of

00:17:40.140 --> 00:17:40.320
those?

00:17:40.320 --> 00:17:41.400
Yeah, definitely.

00:17:41.400 --> 00:17:44.000
All right, let's start with the Washington Post and Heliograf.

00:17:44.000 --> 00:17:51.160
Heliograf is a bot from the Washington Post that produces content for them. I mean, it does a lot of other

00:17:51.160 --> 00:17:55.520
things, you know, like tweets, its own code snippets, identifies trends in the stock market.

00:17:55.520 --> 00:18:01.560
But at least as someone from the outside, it seems like that's the value that it brings is that it

00:18:01.560 --> 00:18:08.920
just rapidly generates articles. Like the first year it came out, it made over 800 articles in a single

00:18:08.920 --> 00:18:15.840
year. Yeah, exactly. And at least from what I've seen, it's really good

00:18:15.840 --> 00:18:21.920
for those kind of short, quick reports. So things like covering the Olympic Games, like in Rio, it was

00:18:21.920 --> 00:18:29.760
able to just pop out like 300 articles. Then for things like politics and the election, that's, I think,

00:18:29.760 --> 00:18:37.920
the main focus of it. They won an award for it in 2016, or for the coverage in 2016. But just that volume,

00:18:38.280 --> 00:18:45.960
that is, I mean, I don't know a human who can write, what is it, like five articles a day? I have no idea.

00:18:45.960 --> 00:18:51.180
Yeah, the turnaround time as well, right? Like, it seems to me like this kind of thing would be good

00:18:51.180 --> 00:18:57.760
for, oh, there's a big crash during rush hour that has shut down I-5 North. Details coming. Or

00:18:57.760 --> 00:19:02.520
something like, you know, just like those really short little, this thing has happened. We're going to

00:19:02.520 --> 00:19:06.720
write a story eventually, but we want to get it out there because we know it's a timely sort of thing,

00:19:06.780 --> 00:19:11.860
right? Definitely. And I think it also ties into the relational issue you're talking about with the

00:19:11.860 --> 00:19:17.060
Panama Papers. I think that's where, based on what I've read, where they're kind of going with

00:19:17.060 --> 00:19:23.360
Heliograf, especially in politics, because, you know, for a human reporter, there are tens of thousands

00:19:23.360 --> 00:19:27.900
of elections going on throughout the year, and that's just in the US I'm talking about.

00:19:27.900 --> 00:19:28.240
Yeah.

00:19:28.380 --> 00:19:35.180
And let alone the entire world. And so being able to have something that can monitor all of those,

00:19:35.180 --> 00:19:41.400
and perhaps maybe even eventually find relationships between those, I think is really exciting.

00:19:41.920 --> 00:19:50.200
Yeah, absolutely. Also just alerts on like emerging trends, like this thing seems to be getting talked

00:19:50.200 --> 00:19:54.760
about on Twitter and we've pulled it out and maybe this hashtag is now all of a sudden trending.

00:19:54.760 --> 00:19:59.760
Hey, reporters, look at this and see if this is interesting. And, you know, maybe it would let

00:19:59.760 --> 00:20:03.660
them know they should pay attention sooner rather than later about this thing that's coming up.

00:20:03.740 --> 00:20:08.680
Yeah. And that's actually a really good segue to like the next example that I had that was from

00:20:08.680 --> 00:20:16.320
Forbes. So they have a CMS called Bertie. And the reason I said it's a good segue is because

00:20:16.320 --> 00:20:21.680
part of that CMS has exactly that. It has the hashtag recommendations. Maybe you write

00:20:21.680 --> 00:20:26.280
articles, or when you put an article together, it can read through it and say, like, ah, this is trending,

00:20:26.280 --> 00:20:32.200
like put this hashtag on it when you post it or tweet about it. And it looks, at least from what I've

00:20:32.200 --> 00:20:38.060
seen in the videos, it looks pretty much the same as any sort of, you know, WordPress, Contentful,

00:20:38.060 --> 00:20:44.720
like whatever you have there, but it just has these extra features. So when you go to add an image,

00:20:44.720 --> 00:20:51.600
it'll have suggested images that are related to the article that you have, or, you know, you'll write a

00:20:51.600 --> 00:20:58.760
headline and it'll say, ah, okay, if you switch these around, it'll be, you know, this percentage more

00:20:58.760 --> 00:21:02.460
click worthy. I don't know if that's actually the term they use, but.

00:21:02.460 --> 00:21:07.180
But that's the idea, right? Is that it's, it attracts, it's more shared, like more likely to

00:21:07.180 --> 00:21:11.860
be shared. People are more likely to click on it when they see it in some kind of feed scrolling by

00:21:11.860 --> 00:21:12.400
and so on.

00:21:12.400 --> 00:21:17.500
Exactly. And I mean, it even covers topics like SEO, like accounting for what is click worthy,

00:21:17.620 --> 00:21:23.560
but also what will get you up there, at least for certain topics.

00:21:23.560 --> 00:21:28.700
Yeah. And so all that sounds like, oh, we're trying to just take the content and make it more,

00:21:28.700 --> 00:21:35.880
um, viral, have a higher viral potential, but it also has more concrete things like reading complexity.

00:21:36.360 --> 00:21:41.320
Yes, exactly. Because I think one of the difficult parts of journalism that I think

00:21:41.320 --> 00:21:46.360
not a lot of, I don't know, maybe I'm wrong, but I feel like a lot of people misunderstand about

00:21:46.360 --> 00:21:51.660
journalism is when you're writing, especially for, you know, a newspaper that's supposed to be

00:21:51.660 --> 00:21:57.200
accessible to everyone. You need to be writing at like sixth grade level. Yeah.

00:21:57.200 --> 00:22:05.020
So sixth to eighth grade is what I was told in university. And that's difficult if, you know,

00:22:05.020 --> 00:22:10.300
writing is what you do for a living and you're used to trying to string together this beautiful

00:22:10.300 --> 00:22:16.620
prose. And so getting things really down to that plain language, short sentences, and having something

00:22:16.620 --> 00:22:21.920
to assist with that, I think, yeah, game changer, in my opinion.
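
As a back-of-the-envelope version of that kind of check, here's the standard Flesch-Kincaid grade-level formula with a crude syllable counter. Real tools, like the textstat package, do this more carefully; this is just a sketch.

    import re

    def count_syllables(word):
        """Very rough: count vowel groups, discounting a trailing silent 'e'."""
        groups = re.findall(r"[aeiouy]+", word.lower())
        count = len(groups)
        if word.lower().endswith("e") and count > 1:
            count -= 1
        return max(count, 1)

    def fk_grade(text):
        """Flesch-Kincaid grade: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

    print(round(fk_grade("The team won the game. The crowd cheered loudly."), 1))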

00:22:21.920 --> 00:22:25.940
Right. Well, absolutely. And I'm sure people could look down on it, like, well, you're trying to

00:22:25.940 --> 00:22:30.720
dumb down your article, but at the same time, if you reach more people, if the message gets across

00:22:30.720 --> 00:22:36.000
to more people, like that's the ultimate goal is to convey the information to people reading it and

00:22:36.000 --> 00:22:39.640
get more people to read it. So it seems like a noble thing to do.

00:22:39.640 --> 00:22:44.660
Absolutely. It's making information more accessible. I mean, we even have this problem in the software

00:22:44.660 --> 00:22:50.440
community with technical writing, you know, people are, it's like, oh, if I make all of my,

00:22:50.440 --> 00:22:55.860
if I make all my documentation, like really short sentences, it sounds so whatever. And you're like,

00:22:55.880 --> 00:23:00.740
but there are so many people who maybe English isn't their first language, or, you know, maybe

00:23:00.740 --> 00:23:05.840
they have a cognitive disability, or maybe they just, you know, it's a lot to read and take on.

00:23:05.840 --> 00:23:09.020
So making your language more simple helps everyone.

00:23:09.020 --> 00:23:13.000
Even if you're really good at reading, you're trying to juggle two things in your mind. You're

00:23:13.000 --> 00:23:17.760
trying to juggle the programming ideas and what the lesson is teaching you. So you're already kind of

00:23:17.760 --> 00:23:23.520
splitting your, your mental capacity. So it, it feels to me like it should go down. And I'm always

00:23:23.520 --> 00:23:29.700
amazed and sympathetic to folks who English is their second language, because all the programming

00:23:29.700 --> 00:23:34.560
keywords are in English, which it just seems like really a little unfair, but you know,

00:23:34.560 --> 00:23:39.700
such as life, I guess, but the documentation, obviously the easier, the better in that space.

00:23:39.700 --> 00:23:43.880
I also think it's a different, you know, you have to think about it in different contexts. I mean,

00:23:43.880 --> 00:23:50.480
I think journalism can be art. No one's arguing that, but I do think that the primary purpose of

00:23:50.480 --> 00:23:57.140
most journalism is to convey information. So being able to get that as succinct and clear as possible

00:23:57.140 --> 00:23:58.900
is, I would say the goal.

00:23:58.900 --> 00:24:04.740
Yeah. Well, and there's always the Mark Twain quote of, sorry, I wrote you a long letter. I didn't have

00:24:04.740 --> 00:24:05.740
time to write a short one.

00:24:05.740 --> 00:24:07.180
Exactly.

00:24:07.180 --> 00:24:10.860
Right. All right. Let's talk about earthquakes, West Coast.

00:24:10.860 --> 00:24:15.000
Earthquakes. Yes. So it's another bot.

00:24:15.180 --> 00:24:18.160
Actually, I'm hoping there are no earthquakes, but let's talk about reporting on earthquakes.

00:24:18.160 --> 00:24:24.980
Reporting. Apparently there are a lot. That's what I learned from this bot. But the LA Times

00:24:24.980 --> 00:24:32.600
also has a bot. I swear, not every AI implementation in journalism is a bot, but they are, in my opinion,

00:24:32.600 --> 00:24:40.620
some of the most interesting. And it's similar to Heliograf, but it specifically focuses on earthquakes.

00:24:41.580 --> 00:24:48.760
And it is exactly what we're talking about as far as the turnaround time being so quick. So for example,

00:24:48.760 --> 00:24:55.920
like back in 2014, there was this earthquake that hit the LA area and the LA Times was the first to

00:24:55.920 --> 00:25:03.020
report on it because basically the earthquake happened. The reporter woke up, just went to his computer

00:25:03.020 --> 00:25:09.740
and reviewed the article and published it within three minutes, like three minutes after it

00:25:09.740 --> 00:25:10.120
happened.

00:25:10.120 --> 00:25:14.660
That is incredible. A lot of people are still trying to figure out, was that an earthquake? What just

00:25:14.660 --> 00:25:16.760
happened? They're like, I published this. This is good.

00:25:16.760 --> 00:25:23.420
Exactly. And it's because it was sitting there waiting for him, because, like, Quakebot is

00:25:23.420 --> 00:25:29.240
connected to the US Geological Survey. And so when an earthquake comes in above a certain level,

00:25:29.240 --> 00:25:36.200
the programmer who did this set different parameters. So it's within this area of LA,

00:25:36.200 --> 00:25:41.900
it has this size threshold. I think there's a few other ones, but those are the main ones I remember.

00:25:41.900 --> 00:25:50.020
And then it was able to, like, extract the data that was sent and throw it into a pre-written template

00:25:50.020 --> 00:25:50.780
in their CMS.

00:25:51.480 --> 00:25:57.560
Anyway, last night there was an earthquake at this time, this magnitude, you know, it was for this long

00:25:57.560 --> 00:26:03.180
and yeah, just go up there and review it and hit go. That's pretty cool. There's also a gist where you

00:26:03.180 --> 00:26:07.920
can go and the person who was working on it or wrote it talked a little bit about how it works and so on,

00:26:07.920 --> 00:26:08.100
right?

00:26:08.100 --> 00:26:13.480
Yes, exactly. It doesn't have, like, the full working code, but there are a few

00:26:13.480 --> 00:26:19.000
code snippets where you're able to see what parameters were put on it. So yeah, it's very cool.
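
The gist has the real details, but just to make the idea concrete, here's a minimal sketch of a Quakebot-style check against the public USGS GeoJSON feed. The bounding box, magnitude threshold, and article template below are all made up, not the Times' actual parameters.

    import requests

    FEED = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.geojson"
    MIN_MAGNITUDE = 3.0  # invented threshold
    LA_BOX = {"west": -119.0, "east": -117.0, "south": 33.3, "north": 34.8}  # rough LA area

    TEMPLATE = ("A magnitude {mag} earthquake was reported {place}, "
                "according to the US Geological Survey.")

    def draft_articles():
        for quake in requests.get(FEED).json()["features"]:
            lon, lat, _depth = quake["geometry"]["coordinates"]
            mag = quake["properties"]["mag"]
            if mag is None or mag < MIN_MAGNITUDE:
                continue
            if not (LA_BOX["west"] <= lon <= LA_BOX["east"]
                    and LA_BOX["south"] <= lat <= LA_BOX["north"]):
                continue
            # In the real workflow, a draft like this lands in the CMS
            # for a human reporter to review before publishing.
            yield TEMPLATE.format(mag=mag, place=quake["properties"]["place"])

    for article in draft_articles():
        print(article)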

00:26:19.160 --> 00:26:23.240
Yeah. So I'll link to that in the show notes. Let's stick with LA and, you know, I was thinking

00:26:23.240 --> 00:26:27.540
earthquakes were kind of, it made people nervous. This next one definitely will make people nervous,

00:26:27.540 --> 00:26:28.720
right?

00:26:28.720 --> 00:26:37.880
Yes. I mean, it makes me nervous, but it's the reality of it. So the LA Times also has a homicide report

00:26:37.880 --> 00:26:45.400
and it takes all of the information that is given to them. I don't know exactly what the original

00:26:45.400 --> 00:26:54.520
data source is. I would assume the LAPD or something similar. And it plots all of the homicides onto a

00:26:54.520 --> 00:27:02.360
like interactive map on their website. So you can sort it and filter it by, you know, the year, it has the

00:27:02.360 --> 00:27:09.880
name, the gender, and a few other fields that I'm not exactly remembering. But any information you would want

00:27:09.880 --> 00:27:12.120
on this homicide, you can find on this map.
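
For a sense of how little code an interactive map like that takes, here's a hedged sketch using the folium package. The data file and its columns are invented for illustration.

    import folium
    import pandas as pd

    # Hypothetical export with columns: name, gender, year, lat, lon.
    homicides = pd.read_csv("homicides.csv")

    la_map = folium.Map(location=[34.05, -118.24], zoom_start=10)
    for _, row in homicides.iterrows():
        folium.CircleMarker(
            location=[row["lat"], row["lon"]],
            radius=4,
            popup=f"{row['name']} ({row['gender']}, {row['year']})",
        ).add_to(la_map)

    la_map.save("homicide_map.html")  # open in a browser to explore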

00:27:12.120 --> 00:27:16.360
Yeah. And much like Quakebot, it just automatically receives that information

00:27:16.360 --> 00:27:22.680
and puts it up. Okay. Yeah, cool. A good service for a sad thing, I suppose. This next one is

00:27:22.680 --> 00:27:24.740
interesting, comes out of the Guardian Australia.

00:27:25.220 --> 00:27:33.140
Yes. And this is maybe my favorite of the bots, for only the reason that it is entirely open source,

00:27:33.140 --> 00:27:34.540
which I think is.

00:27:34.540 --> 00:27:35.380
Yeah, that's pretty cool.

00:27:35.380 --> 00:27:36.500
Yeah, that's very cool.

00:27:36.620 --> 00:27:44.300
So it's called ReporterMate. And it pretty much does the exact same things we were just talking

00:27:44.300 --> 00:27:52.300
about. But instead of, no, actually, it pretty much does the same things. It reports on Australian

00:27:52.300 --> 00:28:00.700
election coverage, I think a few like weather related things, stock market. So it does like very similar

00:28:00.700 --> 00:28:07.820
work, especially to Heliograph, but it is open source. And the open source tool, in and of itself

00:28:07.820 --> 00:28:14.700
is pretty cool. It uses, and this is a Python library or package that I do know, pandas.

00:28:14.700 --> 00:28:16.220
Uh huh. Nice.

00:28:16.220 --> 00:28:23.100
Yes. So it uses that, it uses Handlebars, and a bunch of helper functions.
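
Not ReporterMate's actual code, but the shape of the idea is roughly this: pandas for the numbers, a template for the prose. Python's built-in string.Template stands in here for the Handlebars templates the real project uses, and the election data is invented.

    from string import Template

    import pandas as pd

    results = pd.DataFrame({
        "electorate": ["Seat A", "Seat B"],  # hypothetical election results
        "winner": ["Candidate X", "Candidate Y"],
        "swing": [2.4, -1.1],
    })

    story = Template("$winner has won $electorate with a swing of $swing points.")

    for _, row in results.iterrows():
        print(story.substitute(winner=row["winner"],
                               electorate=row["electorate"],
                               swing=row["swing"]))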

00:28:23.100 --> 00:28:28.200
Yeah. And it looks really cool. Like, you just pip install it and off you go. And yeah, it gives you all

00:28:28.200 --> 00:28:35.060
that automation, which is kind of an interesting contrast back to, like, the Washington Post doing

00:28:35.060 --> 00:28:39.940
its own sort of private thing versus here's the open source equivalent that a bunch of people can jump

00:28:39.940 --> 00:28:44.820
on. And it seems to me like this would be really helpful for smaller newsrooms, you know,

00:28:45.100 --> 00:28:50.620
a small town or city that doesn't really have, like, lots of money and maybe doesn't have much of a,

00:28:50.620 --> 00:28:54.020
they might have an IT person, but not a, like a software development team.

00:28:54.020 --> 00:28:58.780
Oh yeah. That's a big, like when we talk about this field, that's something I definitely,

00:28:58.780 --> 00:29:01.760
I don't know if you want to talk about now or we can talk about it later, but that's.

00:29:01.760 --> 00:29:06.660
Let's talk about it a little bit. So there's, I suppose, this really big challenge between

00:29:06.660 --> 00:29:11.720
like take the top 10 newspaper organizations and they have paywalls, they have tech teams,

00:29:11.760 --> 00:29:18.700
they have mobile apps. They're like, they are a tech company in some aspect versus a news organization

00:29:18.700 --> 00:29:22.860
for a town that's got a hundred thousand people like Lawrence, Kansas, where I went to college,

00:29:22.860 --> 00:29:26.320
right? Like they, maybe that's a bad example. That's actually where Django came from, but,

00:29:26.320 --> 00:29:31.740
but in general, like these smaller newspapers don't necessarily have big tech teams.

00:29:31.740 --> 00:29:38.220
Yeah, no, for sure. So what a lot of people talk about is that AI in journalism is already a

00:29:38.220 --> 00:29:45.020
significant part, but it's really unevenly distributed. So it's focused on these big news

00:29:45.020 --> 00:29:52.980
organizations that either have their own development team or have a relationship to these big tech

00:29:52.980 --> 00:30:00.520
companies. And I find the topic really interesting because it dives, it gets a little bit into the

00:30:00.520 --> 00:30:05.840
ethics field because what a lot of people, at least according to the survey that I mentioned earlier,

00:30:05.840 --> 00:30:11.160
what a lot of people are afraid of. And I didn't even think about until reading this survey, which is

00:30:11.160 --> 00:30:17.740
unfortunate is that people are afraid of like these big tech companies having more power,

00:30:17.980 --> 00:30:22.720
like not only fueling the power of these big tech companies like Google, for example,

00:30:22.720 --> 00:30:29.920
but also how that would potentially impact the reporting of those big tech companies. You know,

00:30:29.920 --> 00:30:31.560
it's kind of this cycle.

00:30:31.560 --> 00:30:35.780
Journalism is a check on the power of those tech companies, right? And if-

00:30:35.780 --> 00:30:36.080
Exactly.

00:30:36.080 --> 00:30:41.780
Yeah. Do you really want to write a negative article about them? Like what's going to happen to your

00:30:41.780 --> 00:30:45.940
other articles if you become too much of a negative force on them, right?

00:30:46.200 --> 00:30:52.300
Exactly. And I mean, they hold a lot of power in this situation because I mean, a lot of these

00:30:52.300 --> 00:30:57.760
news organizations, specifically the ones that don't have their own development team or have a very small

00:30:57.760 --> 00:31:04.600
development team, really rely on that technology to keep up and kind of keep that pace in this

00:31:04.600 --> 00:31:12.200
online, like very much online, very much 24-hour news cycle. So it's scary when you think about it like

00:31:12.200 --> 00:31:18.600
that, it really is, a little bit. Talk Python To Me is partially supported by our training courses.

00:31:18.600 --> 00:31:24.640
How does your team keep their Python skills sharp? How do you make sure new hires get started fast and

00:31:24.640 --> 00:31:31.060
learn the Pythonic way? If the answer is a series of boring videos that don't inspire or a subscription

00:31:31.060 --> 00:31:36.740
service you pay way too much for and use way too little, listen up. At Talk Python Training, we have

00:31:36.740 --> 00:31:42.120
enterprise tiers for all of our courses. Get just the one course you need for your team with full reporting,

00:31:42.120 --> 00:31:47.800
and monitoring or ditch that unused subscription for our course bundles, which include all the courses

00:31:47.800 --> 00:31:54.460
and you pay about the same price as a subscription once. For details, visit training.talkpython.fm

00:31:54.460 --> 00:31:58.200
slash business or just email sales at talkpython.fm.

00:32:00.440 --> 00:32:04.500
The first place that comes to mind when I think about like those challenges, it's got to be Facebook.

00:32:04.500 --> 00:32:07.500
But I want to ask you a question, not about Facebook, about something else.

00:32:07.500 --> 00:32:08.500
Google News.

00:32:08.500 --> 00:32:15.720
So you're in Europe right now and Europe has had a mixed, interesting relationship with Google News.

00:32:15.720 --> 00:32:25.460
I feel like, you know, I think Spain had tried to like prohibit or charge Google for like putting the headlines from Spanish newspapers.

00:32:25.460 --> 00:32:26.620
I think it was Spain.

00:32:26.620 --> 00:32:28.120
It was somewhere in Europe.

00:32:28.400 --> 00:32:33.040
And they tried to limit how Google News could sort of use their headlines for free.

00:32:33.040 --> 00:32:34.680
And so they just stopped it.

00:32:34.680 --> 00:32:36.200
And then they're like, wait, wait, wait.

00:32:36.200 --> 00:32:37.080
Where'd all our traffic go?

00:32:37.080 --> 00:32:37.900
Bring back Google News.

00:32:37.900 --> 00:32:38.820
We need Google News again.

00:32:38.820 --> 00:32:39.600
What is going on here?

00:32:39.600 --> 00:32:44.300
You know, I feel like there's something like that happening right now in Australia as well.

00:32:44.300 --> 00:32:47.800
So there always seems to be like this tension of like, oh, we hate them.

00:32:47.800 --> 00:32:48.840
They're like robbing from us.

00:32:48.840 --> 00:32:49.980
Wait, we need them.

00:32:49.980 --> 00:32:50.760
They're our savior.

00:32:50.760 --> 00:32:51.520
Bring them back.

00:32:51.520 --> 00:32:54.700
You know, what's your thought from being more on the inside of that world?

00:32:54.700 --> 00:32:55.460
I'll be honest.

00:32:55.460 --> 00:33:00.560
I haven't been in as many discussions that discuss like Google News specifically.

00:33:00.560 --> 00:33:05.880
But something I do, I have talked about with my friends and I actually saw a Twitter thread about it today.

00:33:05.880 --> 00:33:06.960
I'll send you the link.

00:33:06.960 --> 00:33:07.540
Awesome.

00:33:07.660 --> 00:33:25.060
And it's about how, I mean, this thread didn't mention Google specifically, but come on, it has to do with it, like how much American media seeps into European news coverage and what people are aware of.

00:33:25.060 --> 00:33:39.500
And I could imagine also if you're seeking your news online or through something like Google, naturally then a lot of American politics comes up, a lot of American systems come up where you're kind of fed this idea.

00:33:39.500 --> 00:33:42.700
And in a way that is so unique to the U.S.

00:33:42.700 --> 00:33:46.100
Because, at least, like, growing up, I grew up in the U.S.

00:33:46.100 --> 00:33:52.860
So I don't remember ever being like, wow, why do we have so much news on any other country?

00:33:52.860 --> 00:33:55.160
I remember if you wanted news on other...

00:33:55.160 --> 00:33:56.740
Why does Brazil always give us their news?

00:33:56.740 --> 00:33:58.120
Like, I just don't need Brazil's news.

00:33:58.120 --> 00:33:58.860
Like, no offense to Brazil.

00:33:58.860 --> 00:34:01.860
I'm just grabbing like a random country that doesn't do that generally.

00:34:01.860 --> 00:34:02.640
Exactly.

00:34:02.640 --> 00:34:03.120
Yeah.

00:34:03.240 --> 00:34:11.780
I remember like if you wanted news, like, for example, my sister studied in Japan and before she left, she wanted, you know, to try to keep up with Japanese news.

00:34:11.780 --> 00:34:17.780
She had to literally go buy the Japanese paper from the actual Japanese store that we had.

00:34:17.780 --> 00:34:29.020
So you really have to seek out that news versus like a lot of American news is just kind of, what's the word, like filtered into the everyday experience of people in Europe.

00:34:29.020 --> 00:34:31.200
And it's kind of like, why?

00:34:31.200 --> 00:34:43.520
And I mean, even, I don't know this personally, but I know there's even some tension between, like, Western Europe and Eastern Europe as far as how their news is represented, how much do you see of each country?

00:34:43.520 --> 00:34:44.680
How much do you really know?

00:34:44.680 --> 00:34:45.060
Right.

00:34:45.060 --> 00:34:50.200
And you're in Berlin, which is like right on that, that line there, historically speaking.

00:34:50.200 --> 00:34:53.120
But it's definitely still in like the more Western category.

00:34:53.120 --> 00:34:59.600
I would say we hear a lot more about, you know, France or the UK than we do about Bulgaria.

00:34:59.600 --> 00:35:00.480
Yeah.

00:35:00.480 --> 00:35:01.620
Very interesting.

00:35:01.620 --> 00:35:02.580
Just an example.

00:35:02.580 --> 00:35:07.720
One of my favorite songs is Californication from the Red Hot Chili Peppers.

00:35:07.720 --> 00:35:12.240
And that song's all about like how America is exporting their culture and stuff.

00:35:12.240 --> 00:35:16.760
I think generally through music and Hollywood, it's only more so now, right?

00:35:16.760 --> 00:35:19.360
With like the tech companies and online and so on.

00:35:19.760 --> 00:35:20.240
Interesting.

00:35:20.240 --> 00:35:20.820
I know.

00:35:20.820 --> 00:35:25.320
I think there's this tension between both the big tech companies just controlling.

00:35:25.320 --> 00:35:28.060
They're the aggregators of the attention.

00:35:28.060 --> 00:35:31.540
So they're controlling access to what gets attention.

00:35:31.540 --> 00:35:36.040
And then there's this tension between the big newspapers and the small newspapers, right?

00:35:36.040 --> 00:35:39.580
Because the big newspapers have software teams that can just go, yeah, yeah, PyTorch.

00:35:39.580 --> 00:35:40.180
Let's use that.

00:35:40.180 --> 00:35:42.400
And other people are like, what?

00:35:42.400 --> 00:35:42.960
Is there a fire?

00:35:42.960 --> 00:35:43.860
What is this torch about?

00:35:43.860 --> 00:35:44.180
Yeah.

00:35:44.180 --> 00:35:44.620
Yeah.

00:35:44.620 --> 00:35:46.340
So, yeah.

00:35:46.340 --> 00:35:47.020
Very interesting.

00:35:47.020 --> 00:35:48.420
Let's keep going on some of these things.

00:35:48.420 --> 00:36:02.960
The next one that I thought was interesting that you brought up was what ProPublica was doing around analyzing not what people in the U.S. Congress say they're interested in, but what their actions and behaviors and words say they're interested in.

00:36:02.960 --> 00:36:03.460
Yeah.

00:36:03.680 --> 00:36:10.820
So ProPublica did an analysis of thousands of press releases over the course of two years.

00:36:10.820 --> 00:36:17.280
And they trained a computer model to extract which phrases each Congress member uses most frequently.

00:36:17.280 --> 00:36:26.820
And then under the assumption that if these are the phrases they're using most frequently, these are likely the topics that they are pushing for and care about the most.

00:36:27.240 --> 00:36:32.980
Because, or else, why would you be releasing all these press releases about it if it's not a topic you care about?
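
One common way to approximate that kind of analysis: pool each member's press releases into one document, then score two- and three-word phrases with TF-IDF to see what each member says unusually often. ProPublica's actual model may well differ, and the little corpus below is invented.

    from sklearn.feature_extraction.text import TfidfVectorizer

    releases_by_member = {  # hypothetical press releases, pooled per member
        "Rep. A": "protect family farms. family farms need relief now.",
        "Rep. B": "secure the border. border security funding is vital.",
    }

    members = list(releases_by_member)
    vectorizer = TfidfVectorizer(ngram_range=(2, 3), stop_words="english")
    scores = vectorizer.fit_transform(releases_by_member[m] for m in members)
    phrases = vectorizer.get_feature_names_out()

    for i, member in enumerate(members):
        row = scores[i].toarray().ravel()
        top = [phrases[j] for j in row.argsort()[::-1][:3] if row[j] > 0]
        print(member, top)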

00:36:32.980 --> 00:36:36.420
And some of them, a lot, like some of them were in line with them.

00:36:36.420 --> 00:36:40.060
I don't have the article up right now, but I would suggest checking it out.

00:36:40.060 --> 00:36:45.120
But some of it, the Congress people were really in line with what their beliefs were.

00:36:45.120 --> 00:36:47.660
And a lot of them, it was like, we say this.

00:36:47.660 --> 00:36:47.980
Yes.

00:36:47.980 --> 00:36:49.100
But we don't really agree.

00:36:49.100 --> 00:36:50.280
Yes, exactly.

00:36:50.280 --> 00:36:50.500
Yeah.

00:36:51.020 --> 00:36:53.320
This is what we say we're for and this is what we're actually for.

00:36:53.320 --> 00:36:54.260
How interesting.

00:36:54.260 --> 00:37:02.640
One of the things I was thinking of that's like a cool automate-the-boring-stuff use case when you're

00:37:02.640 --> 00:37:11.800
A lot of times there'll be some kind of presentation, like there's a video of the person, but it would be much better if you could just index the keywords of what they said.

00:37:11.800 --> 00:37:18.440
And, you know, the ability to just like take spoken word and video and turn it into written stuff that you can analyze.

00:37:18.440 --> 00:37:20.560
It seems like that'd be pretty interesting in journalism.
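
As a small sketch of that idea, assuming the SpeechRecognition package and a hypothetical audio file: transcribe the recording, then build a simple keyword index over the transcript.

    from collections import defaultdict

    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("press_conference.wav") as source:  # hypothetical recording
        audio = recognizer.record(source)

    text = recognizer.recognize_google(audio)  # free web API; fine for a demo

    # Map each word to the positions where it appears in the transcript.
    index = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        index[word].append(position)

    print(index.get("budget", []))  # where was "budget" mentioned?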

00:37:20.920 --> 00:37:21.220
Yes.

00:37:21.220 --> 00:37:25.460
I can't believe I forgot to mention that because I think that's the most tedious part of reporting.

00:37:25.460 --> 00:37:35.980
I remember having to either, you know, have my little hand recorder and record things and then type it out myself later or just take handwritten notes and be like, I hope I got this quote right.

00:37:35.980 --> 00:37:41.600
So I think the idea that you can, I mean, there are apps like right now I use Otter.

00:37:41.600 --> 00:37:42.100
Yeah.

00:37:42.100 --> 00:37:43.020
I like an instant.

00:37:43.020 --> 00:37:43.600
Yeah.

00:37:43.600 --> 00:37:44.020
Yeah.

00:37:44.020 --> 00:37:49.220
I have not had any practical use for Otter, but I've tried to use it for the podcast and live.

00:37:49.340 --> 00:37:56.000
It's like, it'll transcribe multi-person conversations and attribute the spoken word to the different people and so on.

00:37:56.000 --> 00:37:56.180
Right?

00:37:56.180 --> 00:37:57.020
Yes, exactly.

00:37:57.020 --> 00:38:05.380
You still need to definitely read through it because especially if you're, I can imagine, especially in software, we're using a lot of lingo that isn't common speak.

00:38:05.820 --> 00:38:35.800
You have to go through it.

00:38:35.800 --> 00:38:51.200
version of Word will now let you, like, I heard this just yesterday or the day before, you can take an MP3 and upload it to your document and then just grab paragraphs of transcribed text and just drop them into your document, like right out of the MP3 file, which is, that's pretty awesome.

00:38:51.200 --> 00:38:52.100
That'll help a lot of people.

00:38:52.100 --> 00:38:52.980
Yeah, that's great.

00:38:53.020 --> 00:39:04.920
And it also goes along with what we were saying about readability, where it's, again, an accessibility issue. You know, before, it was always like, ah, it's so hard to have a video have captions, or like, oh, it's really hard.

00:39:04.920 --> 00:39:11.240
If we recorded this interview to have, you know, a written version because it takes so much power or work time.

00:39:11.360 --> 00:39:18.900
And now it's like, there's almost no excuse to make things inaccessible because the resources are there and a lot of them are free and available.

00:39:18.900 --> 00:39:19.620
Yeah.

00:39:19.620 --> 00:39:20.180
Yeah.

00:39:20.180 --> 00:39:21.360
It's super cool.

00:39:21.360 --> 00:39:24.820
So the next one that you talked about was BuzzFeed.

00:39:24.880 --> 00:39:28.780
And to me, BuzzFeed is like listicle type stuff.

00:39:28.780 --> 00:39:34.120
And when you think of viral headlines, like, they probably have some things that really recommend headlines.

00:39:34.120 --> 00:39:39.380
That said, this next thing, they do do real news reporting as well in some interesting ways.

00:39:39.380 --> 00:39:42.700
And this next one actually is pretty interesting there, right?

00:39:42.700 --> 00:39:43.060
Yeah.

00:39:43.060 --> 00:39:43.760
BuzzFeed.

00:39:43.760 --> 00:39:50.680
I love a good BuzzFeed quiz, but they also just, they have real BuzzFeed News, really good journalism.

00:39:51.840 --> 00:40:04.480
And this one is, I remember reading about it and I couldn't believe it because it just sounds like sci-fi movie, but they trained a computer model to find and track like secret airplanes.

00:40:04.480 --> 00:40:15.640
So what I mean by that is the computer used a machine learning algorithm to sift for planes with flight patterns that resembled those of the FBI or the Department of Homeland Security.

00:40:15.640 --> 00:40:21.400
Like the plane that goes up and just flies in circles around a city rather than from a city to a city, right?

00:40:21.480 --> 00:40:22.080
Something like that.

00:40:22.080 --> 00:40:22.600
Exactly.

00:40:22.600 --> 00:40:23.200
Yeah.

00:40:23.200 --> 00:40:24.020
I don't know.

00:40:24.020 --> 00:40:24.240
Yeah.

00:40:24.240 --> 00:40:26.980
I could totally imagine it's exactly like that.

00:40:26.980 --> 00:40:30.080
Something a little bit strange in that way.
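
BuzzFeed's published analysis used a random forest over many flight features, but a toy version of the core intuition is simple enough to sketch: a surveillance plane flies a long path yet ends up near where it started. The tracks and threshold below are invented.

    import math

    def haversine_km(p, q):
        """Great-circle distance between two (lat, lon) points, in km."""
        lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 6371 * 2 * math.asin(math.sqrt(a))

    def circuitousness(track):
        """Path length over net displacement; circling flights score high."""
        path = sum(haversine_km(a, b) for a, b in zip(track, track[1:]))
        return path / max(haversine_km(track[0], track[-1]), 0.1)

    circler = [(34.05, -118.24), (34.15, -118.14), (34.05, -118.04),
               (33.95, -118.14), (34.05, -118.24)]  # loops over one city
    direct = [(34.05, -118.24), (35.00, -117.00), (36.00, -115.17)]  # city to city

    for name, track in [("circler", circler), ("direct", direct)]:
        score = circuitousness(track)
        print(name, round(score, 1), "suspicious" if score > 5 else "ok")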

00:40:30.080 --> 00:40:36.400
But it allowed them to report on a ton of different topics that, again, I just think are wild.

00:40:36.400 --> 00:40:44.220
So like, for example, like how U.S. marshals hunted down drug cartel kingpins in Mexico.

00:40:44.220 --> 00:40:45.520
Like how?

00:40:45.820 --> 00:40:46.640
I don't know.

00:40:46.660 --> 00:40:53.880
How there was like a military contractor that tracks terrorists in Africa, but I guess they were flying over U.S. cities.

00:40:53.880 --> 00:40:54.480
Right.

00:40:54.480 --> 00:41:01.540
Wait, if your job is to track, you know, military stuff in some foreign country, what are you doing in Dallas?

00:41:01.540 --> 00:41:02.420
Flying around.

00:41:02.420 --> 00:41:03.000
Yeah, exactly.

00:41:03.000 --> 00:41:03.920
That's suspicious.

00:41:03.920 --> 00:41:15.300
And just other topics around, like, aerial surveillance, which is, again, when I read it, I was like, you know, when I think of aerial surveillance, I think of, like, deep conspiracy theories.

00:41:15.300 --> 00:41:16.600
Like what are they called?

00:41:16.600 --> 00:41:17.580
Like jet trails?

00:41:17.580 --> 00:41:18.040
Yeah.

00:41:18.040 --> 00:41:18.880
Contrails.

00:41:18.880 --> 00:41:19.440
Yeah.

00:41:19.440 --> 00:41:20.400
Contrails.

00:41:20.400 --> 00:41:23.820
So that's always what I thought of.

00:41:23.820 --> 00:41:28.980
So then when I saw this report, I was like, oh my God, like they did it.

00:41:28.980 --> 00:41:29.480
Yeah.

00:41:29.480 --> 00:41:30.080
Super cool.

00:41:30.080 --> 00:41:32.320
This is really interesting and well done there.

00:41:32.320 --> 00:41:50.480
So I guess the last one I want to talk about is probably the most far-out one, which is: what if we could have a drone fly a robot in to walk around dangerous places like war zones and investigate? Like a humanoid robot, but not a person.

00:41:50.480 --> 00:41:52.120
Yeah, that's from Al Jazeera.

00:41:52.120 --> 00:41:57.320
And they mentioned it at their Future Media Leaders Summit in 2018.

00:41:57.960 --> 00:42:03.100
From what I can tell, it's still in development and a bit far out.

00:42:03.100 --> 00:42:04.700
It is far out.

00:42:04.700 --> 00:42:06.000
But I think it's really interesting.

00:42:06.000 --> 00:42:16.220
Like, you know, we're so used to it, or at least if you look up drones and war zones, the common practice is to have some sort of drone that goes in.

00:42:16.220 --> 00:42:19.480
What it does, you know, depends on who's using it.

00:42:19.480 --> 00:42:23.560
But the idea with this is that it would deploy a robot.

00:42:23.560 --> 00:42:23.880
Yeah.

00:42:23.880 --> 00:42:25.580
That can take video.

00:42:25.580 --> 00:42:26.960
It can surveil what's going on.

00:42:26.960 --> 00:42:29.940
It can, you know, record the sounds that are happening.

00:42:29.940 --> 00:42:36.220
They want it to be able to dodge sniper attacks and assess the situation.

00:42:36.220 --> 00:42:46.400
And what I think is really interesting about that is, I mean, a lot of human journalists, for one, aren't trained for that type of environment.

00:42:46.400 --> 00:42:49.520
And it would be very difficult to train someone in that.

00:42:49.520 --> 00:42:52.940
There are journalists embedded within military.

00:42:52.940 --> 00:42:55.500
So like the U.S. military has journalists.

00:42:55.500 --> 00:43:04.600
But again, it goes into the issue of how much of it is true or how much of it is influenced by their employer.

00:43:05.200 --> 00:43:09.860
You're certainly getting one perspective if you're like with those folks.

00:43:09.860 --> 00:43:18.460
I know folks try to be objective, but they're there for your safety and you can only go and be with them where they are.

00:43:18.460 --> 00:43:19.100
Right.

00:43:19.220 --> 00:43:25.600
So no matter whether it's right or wrong, you're getting a somewhat influenced perspective from that.

00:43:25.600 --> 00:43:25.840
Right.

00:43:25.840 --> 00:43:26.300
Exactly.

00:43:26.300 --> 00:43:30.340
You can't just walk around like, well, let me go talk to those guys over there and see what they think.

00:43:30.340 --> 00:43:31.140
No, they're shooting me.

00:43:31.140 --> 00:43:31.980
I'm going to not do that.

00:43:33.160 --> 00:43:33.560
Exactly.

00:43:33.640 --> 00:43:41.740
And I mean, there's also the ethics of sending, you know, civilian human journalists into these kinds of hostile spaces and being like, here, report on it.

00:43:41.740 --> 00:43:46.920
I mean, there are crisis reporters and you hear about journalists being captured in places like Yemen.

00:43:46.920 --> 00:43:54.300
But again, it's like an ethics debate of when should that be required and when should it not.

00:43:54.860 --> 00:43:55.820
Yeah, for sure.

00:43:55.820 --> 00:44:05.280
So, yeah, this is a really interesting idea of legitimately taking a humanoid drone or robot and having it walk around in war zones.

00:44:05.280 --> 00:44:10.220
They actually have a video on YouTube where you can see a little animation of how it works.

00:44:10.220 --> 00:44:12.240
I guess a couple other things we could talk about.

00:44:12.240 --> 00:44:16.160
So a few of the tools that we talked about are open source, and people can use them.

00:44:16.160 --> 00:44:20.940
But a lot of this is kind of like how news organizations are using this technology on their platforms.

00:44:20.940 --> 00:44:28.180
But there's also some tools that people can use, like Quartz AI Studio and Google News Initiative and stuff like that.

00:44:28.180 --> 00:44:30.840
You want to give us a quick rundown of some things that people can use?

00:44:30.840 --> 00:44:31.520
Yeah, definitely.

00:44:31.520 --> 00:44:33.960
So as you mentioned, there's Quartz AI Studio.

00:44:33.960 --> 00:44:39.640
It's from the Knight Foundation, which is an esteemed organization in journalism.

00:44:39.640 --> 00:44:48.240
And they help train journalists to use machine learning in their reporting, and they can also provide support and tools.

00:44:48.620 --> 00:44:56.240
And I think that's great because it makes these practices more accessible to these smaller news organizations or even freelance journalists.

00:44:56.240 --> 00:45:00.460
I don't know the exact requirements for what it takes to get their support.

00:45:00.460 --> 00:45:07.280
But again, the fact that they're even offering this access to these smaller organizations, I think, is great.

00:45:08.140 --> 00:45:15.560
And Google, which we've mentioned a few times, does a lot of research in this area. I mean, they're the ones behind the survey I keep quoting.

00:45:15.560 --> 00:45:18.100
They do a lot of interesting research on this topic.

00:45:18.100 --> 00:45:23.020
So, for example, they have Facets, which is a machine learning data visualization tool.

00:45:23.020 --> 00:45:24.760
And it's open source.

00:45:24.760 --> 00:45:29.780
So you can play with data within it and create visualizations of the information.
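
NOTE
A quick pointer for transcript readers: Facets Overview ships as the facets-overview package on PyPI. Here is a minimal sketch of generating the summary statistics its web component renders, assuming a hypothetical story_data.csv of reporting data.
import base64
import pandas as pd
from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
df = pd.read_csv("story_data.csv")  # hypothetical reporting dataset
# Summarize every column: counts, missing values, value distributions.
proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(
    [{"name": "story_data", "table": df}])
# The base64-encoded proto is what the facets-overview HTML element loads.
print(base64.b64encode(proto.SerializeToString()).decode("utf-8")[:60], "...")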

00:45:30.320 --> 00:45:38.820
And then finally, there's the Google News Initiative, and more specifically Journalism AI within the Google News Initiative.

00:45:38.820 --> 00:45:40.100
It goes back.

00:45:40.100 --> 00:45:42.640
Everything is basically a circle.

00:45:42.920 --> 00:45:55.580
And it goes back to what we were talking about earlier in regards to these big tech companies and then who provides access to these tools, but also who teaches journalists how to use these tools.

00:45:56.200 --> 00:46:06.160
So the biggest resource that I know of for training journalists on what machine learning is, how to use it in reporting, and what different tools are available,

00:46:06.160 --> 00:46:07.120
here's an introduction,

00:46:07.120 --> 00:46:09.100
is the Google News Initiative.

00:46:09.100 --> 00:46:12.720
They have about like 40 courses that are available to journalists.

00:46:12.720 --> 00:46:18.760
But again, of course, it leans towards, hey, use our Google products to do these things.

00:46:18.760 --> 00:46:34.840
And it also leads to, I'm trying to figure out a way to word this, not a one-way information direction, but kind of a funnel of, okay, this is your only information about how to, you know, think about data, how to analyze algorithms.

00:46:34.840 --> 00:46:39.560
But I'm a big believer that tech is pretty much always biased.

00:46:39.560 --> 00:46:40.480
It's always political.

00:46:40.480 --> 00:46:45.180
So that influence from Google has to be apparent in there somewhere.

00:46:45.180 --> 00:46:45.620
Right.

00:46:45.620 --> 00:46:49.340
And it might not necessarily be overtly intentional, right?

00:46:49.340 --> 00:46:53.820
It could just be that the people who built it all generally share one way of viewing the world.

00:46:53.820 --> 00:46:56.240
And so that's probably going to show up in there somehow.

00:46:56.240 --> 00:46:57.160
A hundred percent.

00:46:57.160 --> 00:47:08.140
Like algorithmic bias, especially, is something that also came up in this survey as something people are afraid of and nervous about implementing.

00:47:08.140 --> 00:47:22.960
People, meaning reporters, are nervous about implementing it in their work because, you know, if you don't know how to analyze an algorithm, where it's getting its data, and how that data is being prioritized, then it's difficult to know.

00:47:22.960 --> 00:47:29.180
Like, am I presenting data that is really reputable and as unbiased as it can be?

00:47:29.180 --> 00:47:31.340
Or if it is biased, what are the biases?

00:47:31.340 --> 00:47:32.000
Right.

00:47:32.000 --> 00:47:37.300
And so they don't want to be publishing blatantly biased reporting, unless they're doing it on purpose.

00:47:37.300 --> 00:47:40.300
But I think a lot of people don't ever intend to have that happen.

00:47:40.300 --> 00:47:40.580
Right.

00:47:40.580 --> 00:47:42.040
There's different levels, right?

00:47:42.040 --> 00:47:54.040
Like it could be that you're using an algorithm and it's giving you information and then you're writing something based on that influenced or directed or biased information.

00:47:54.040 --> 00:47:57.200
Or it could be something as simple as like the bot that tells me what's trending.

00:47:57.200 --> 00:48:04.780
It's always more interested in one part of society than in what actually matters to most of society.

00:48:04.780 --> 00:48:09.940
Like it could be it really cares about people in New York and their financial behaviors.

00:48:09.940 --> 00:48:14.320
Or it could be that it cares about the challenges of middle America.

00:48:14.320 --> 00:48:16.460
Or it could be racially biased.

00:48:16.460 --> 00:48:18.160
There's all sorts of things that it could be.

00:48:18.160 --> 00:48:20.980
And it's not like the algorithm is so incredibly biased.

00:48:20.980 --> 00:48:26.000
It just says, hey, you should pay attention to this aspect of life rather than that one.

00:48:26.000 --> 00:48:26.220
Right.

00:48:26.220 --> 00:48:28.980
Like that could be really subtle, I think, and challenging.

00:48:28.980 --> 00:48:29.660
Definitely.

00:48:29.660 --> 00:48:36.680
I mean, like a lot of the information I was talking about, where people get these big data sets and sift through them.

00:48:36.680 --> 00:48:41.320
I mean, that data set that you get could be biased if you don't know how it was collected.

00:48:41.320 --> 00:48:46.320
And for a lot of people, getting a big data set like that is really exciting.

00:48:46.320 --> 00:48:50.080
It's like, oh, OK, like I don't have to go through the work of surveying thousands of people.

00:48:50.080 --> 00:48:54.900
And so it's really appealing, I think, as a reporter to want to act on that.

00:48:54.900 --> 00:48:55.220
Yeah.
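
NOTE
For transcript readers, one lightweight way to interrogate a found dataset before reporting on it, sketched below. The file and column names are hypothetical; the point is to check how the data was collected before trusting it.
import pandas as pd
df = pd.read_csv("big_found_dataset.csv")  # hypothetical data dump
# Heavy missingness in a column can hint at skewed collection.
print(df.isna().mean().sort_values(ascending=False))
# Who is actually represented? Compare shares against known population shares.
print(df["region"].value_counts(normalize=True))
# Did collection intensity change over time, e.g. only recent years covered?
print(pd.to_datetime(df["collected_at"]).dt.year.value_counts().sort_index())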

00:48:55.220 --> 00:49:00.240
And yeah, just so we don't go down a total rabbit hole.

00:49:00.240 --> 00:49:09.060
There's also Mozilla, although with the recent news about Mozilla, I'm not sure how much of this is still in effect.

00:49:09.060 --> 00:49:13.540
But they have a history of partnering with journalists and news organizations.

00:49:13.540 --> 00:49:21.680
So there's an organization called OpenNews, and it's a network of developers, journalists, designers, and editors.

00:49:21.680 --> 00:49:25.820
And they collaborate on open technologies and processes within journalism.

00:49:25.820 --> 00:49:29.380
And that is its own organization now.

00:49:29.380 --> 00:49:31.600
But it was originally incubated within Mozilla.

00:49:31.600 --> 00:49:37.280
And they also have other ones, like the Mozilla Information Trust Initiative,

00:49:37.500 --> 00:49:44.720
which is a collection of comprehensive efforts to keep the Internet credible and healthy and fight misinformation.

00:49:45.320 --> 00:49:51.860
And also, they just announced like a few days ago.

00:49:51.860 --> 00:49:55.180
I'm fact checking myself as we speak.

00:49:55.640 --> 00:50:05.580
The Mozilla Foundation announced that there was going to be a new fund for black artists that examines the relationship between AI and racial justice.

00:50:05.580 --> 00:50:06.060
Okay.

00:50:06.060 --> 00:50:06.860
Yeah, very cool.

00:50:06.860 --> 00:50:15.020
So maybe that isn't directly related to reporting, but I think a lot of those outcomes can directly influence reporting,

00:50:15.160 --> 00:50:24.200
especially around news coverage, around topics like Black Lives Matter and racial injustice and the justice system in general in the U.S.

00:50:24.200 --> 00:50:25.340
Yeah, for sure.

00:50:25.340 --> 00:50:30.840
Yeah, I think Mozilla is definitely a pretty positive force for these things compared to a lot of the tech companies.

00:50:30.840 --> 00:50:31.240
That's great.

00:50:31.240 --> 00:50:32.000
All right.

00:50:32.000 --> 00:50:35.000
Well, there are so many more things I would like to ask you.

00:50:35.000 --> 00:50:38.540
Things that I have on my list to talk to you about.

00:50:38.540 --> 00:50:41.360
But at the same time, we're running short on time.

00:50:41.860 --> 00:50:47.900
So let me ask you just one question about what you're up to these days, bringing it full circle.

00:50:47.900 --> 00:50:53.600
I know that you're looking for maybe your next project, your next thing to be working on and doing.

00:50:53.600 --> 00:50:57.440
Do you want to tell people like what you're interested in, if they've got an opportunity for you out there?

00:50:57.440 --> 00:51:00.660
Yeah, I'm looking for a new role.

00:51:00.660 --> 00:51:04.560
So this Google Season of Docs program runs until December.

00:51:04.560 --> 00:51:08.340
And then after that, I'm hoping to start something new.

00:51:08.600 --> 00:51:12.900
I'm a front-end developer by trade, been a front-end developer for about two years.

00:51:12.900 --> 00:51:16.380
So I've been mostly looking at roles in that area.

00:51:16.380 --> 00:51:24.780
But of course, I would love to get back into journalism and tech reporting or even go in from the engineering side.

00:51:24.780 --> 00:51:30.700
So if anyone knows anything, my inbox is fully open for you.

00:51:30.700 --> 00:51:31.400
Awesome.

00:51:31.400 --> 00:51:35.940
And I'll be sure to put your contact information in the show notes so people can get in touch with you.

00:51:35.940 --> 00:51:36.460
Great.

00:51:36.780 --> 00:51:37.520
Yeah, very cool.

00:51:37.520 --> 00:51:37.760
All right.

00:51:37.760 --> 00:51:40.780
Now, before we get out of here, I'm going to ask you the final two questions.

00:51:40.780 --> 00:51:44.340
If you're going to write some code, what code editor do you use?

00:51:44.340 --> 00:51:46.840
I use VS Code because I do a lot of TypeScript.

00:51:46.840 --> 00:51:50.620
So it has the best TypeScript support in my opinion.

00:51:50.620 --> 00:51:51.580
Yeah, cool, cool.

00:51:51.580 --> 00:51:55.380
And, you know, it's written in TypeScript, so it better have good TypeScript support.

00:51:55.380 --> 00:51:56.380
Awesome.

00:51:56.520 --> 00:51:59.660
You basically just follow the squiggly line until you find the error.

00:51:59.660 --> 00:52:00.600
Yeah, perfect.

00:52:00.600 --> 00:52:06.440
And then I always like to bring up some interesting Python library or package for folks out there.

00:52:06.440 --> 00:52:07.900
So what do you got for us this week?

00:52:07.900 --> 00:52:09.040
Got one that's interesting to you?

00:52:09.400 --> 00:52:09.880
Yes.

00:52:09.880 --> 00:52:14.360
There's a package that I was very, very recently introduced to.

00:52:14.360 --> 00:52:16.100
Newspaper3k.

00:52:16.100 --> 00:52:19.620
And it's like an article scraping and curation package.

00:52:20.560 --> 00:52:29.240
So you can take a URL from an article that is somewhere on the interweb and it will scrape

00:52:29.240 --> 00:52:31.260
it and try to find information from it.

00:52:31.260 --> 00:52:37.080
Like, for example, like the author, the publishing date, some of the text, top images, et cetera.

00:52:37.080 --> 00:52:37.800
Oh, my gosh.

00:52:37.800 --> 00:52:39.420
This is, yeah, this thing is super cool.

00:52:39.420 --> 00:52:43.900
I've heard of this before, but I think it's the perfect fit for what we're talking about

00:52:43.900 --> 00:52:44.240
today.

00:52:44.240 --> 00:52:49.860
Its features include a multi-threaded article download framework and news URL identification.

00:52:50.500 --> 00:52:55.220
And I think it'll even do things like you point it at like a landing page, like the homepage

00:52:55.220 --> 00:52:57.940
of a newspaper, and it'll find all the sub articles and stuff.

00:52:57.940 --> 00:52:58.600
Yeah, super cool.

00:52:58.600 --> 00:53:02.660
So if you're into researching news and you want to do web scraping, you might not have to start

00:53:02.660 --> 00:53:04.920
from low-level programming with Beautiful Soup.

00:53:04.920 --> 00:53:07.400
You could just get more of the direct data here.

00:53:07.400 --> 00:53:08.960
Yeah, great one.
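
NOTE
For transcript readers, the basic newspaper3k workflow (pip install newspaper3k), using its documented Article and build APIs; the example.com URLs are placeholders.
import newspaper
from newspaper import Article
article = Article("https://example.com/some-news-story")  # placeholder URL
article.download()  # fetch the HTML
article.parse()     # extract structured fields from it
print(article.authors, article.publish_date)
print(article.top_image)
print(article.text[:300])
# Point it at a homepage and it discovers the site's article URLs.
paper = newspaper.build("https://example.com", memoize_articles=False)
print(len(paper.articles))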

00:53:08.960 --> 00:53:09.720
All right.

00:53:09.720 --> 00:53:10.580
Final call to action.

00:53:10.580 --> 00:53:15.760
You know, speaking to the folks who work somehow with the journalism industry and want to get

00:53:15.760 --> 00:53:18.060
code and technology more into what they're doing.

00:53:18.060 --> 00:53:18.840
What would you tell them?

00:53:18.840 --> 00:53:22.760
I would tell them that it's really a great opportunity to look into.

00:53:22.760 --> 00:53:29.000
It's something that I really believe is the future of the industry, the future of information

00:53:29.000 --> 00:53:29.840
and reporting.

00:53:30.240 --> 00:53:34.460
But I think I would definitely approach it with caution.

00:53:34.660 --> 00:53:47.660
So if you are someone who is either going to be building these algorithms or going to be using them, make sure that you're asking the right questions about where the data comes from and how it's being prioritized.

00:53:47.660 --> 00:54:11.940
And in general, beyond the people who are going to be using them: ask questions, be skeptical, and just be aware that the story you're reading might be generated by a bot or an algorithm.

00:54:12.300 --> 00:54:13.700
Yeah, good advice.

00:54:13.700 --> 00:54:15.880
Well, Carolyn, thank you so much for being on the show.

00:54:15.880 --> 00:54:20.960
It's a fascinating look into the intersection of the journalism industry and tech.

00:54:20.960 --> 00:54:21.400
Yeah.

00:54:21.400 --> 00:54:22.720
Thank you so much for having me.

00:54:22.720 --> 00:54:24.040
I love this topic.

00:54:24.040 --> 00:54:26.100
So yeah, it's very interesting.

00:54:26.100 --> 00:54:26.540
You bet.

00:54:26.540 --> 00:54:26.840
Bye bye.

00:54:27.680 --> 00:54:30.460
This has been another episode of Talk Python To Me.

00:54:30.460 --> 00:54:36.200
Our guest in this episode was Carolyn Stransky, and it's been brought to you by Brilliant.org and Talk Python Training.

00:54:36.200 --> 00:54:40.720
Brilliant.org encourages you to level up your analytical skills and knowledge.

00:54:40.720 --> 00:54:46.760
Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every day.

00:54:46.760 --> 00:54:49.100
Want to level up your Python?

00:54:49.100 --> 00:54:53.960
If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

00:54:53.960 --> 00:55:02.120
Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python.

00:55:02.120 --> 00:55:06.780
And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.

00:55:06.780 --> 00:55:08.660
It's like a subscription that never expires.

00:55:08.660 --> 00:55:10.820
Be sure to subscribe to the show.

00:55:10.820 --> 00:55:13.320
Open your favorite podcatcher and search for Python.

00:55:13.320 --> 00:55:14.460
We should be right at the top.

00:55:14.460 --> 00:55:25.520
This is your host, Michael Kennedy.

00:55:25.520 --> 00:55:27.020
Thanks so much for listening.

00:55:27.020 --> 00:55:28.080
I really appreciate it.

00:55:28.080 --> 00:55:29.840
Now get out there and write some Python code.

