#280: Python and AI in Journalism Transcript

Recorded on Wednesday, Aug 26, 2020.

00:00 If there's ever been a time in history that journalism is needed to shine a light on what's

00:04 happening in the world, it's now. Would it surprise you to hear that Python and machine learning are

00:09 playing an increasingly important role in discovering and bringing us the news? On this

00:14 episode, you'll meet Carolyn Stransky, a journalist and developer who's been researching this

00:18 intersection of tech and journalism. This is Talk Python to Me, episode 280, recorded August 26,

00:25 2020.

00:26 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the

00:44 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where

00:49 I'm at mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the

00:54 show on Twitter via at talkpython. This episode is brought to you by brilliant.org and us. Before

01:01 we talk with Carolyn, a quick announcement. Two of the courses that we've released in early access

01:05 mode are now complete, 100% done, and they're ready for you. That's the Python memory management and

01:11 tips course and moving from Excel to Python with pandas and Jupyter. Just visit talkpython.fm and

01:18 click on Python courses to learn more. Carolyn, welcome to Talk Python to Me.

01:24 Thank you. Thank you for having me.

01:26 I'm really interested to hear about this topic. I ran across one of your presentations talking about

01:32 how AI is affecting journalism, and I just was really fascinated by all of these ways that newspapers and

01:40 journalists are doing really cool stuff with like machine learning and AI, and oftentimes that means

01:45 Python as well. Now, before we get into our main topic with AI and journalism, let's just start with

01:52 your story. How did you get interested in programming? You started out on the journalism side, not the AI

01:57 side, right? Yes, exactly. So I studied journalism in university, and that was my focus. My focus was

02:05 actually print journalism. I really thought that was going to be... I mean, okay, I knew it wasn't going to be the

02:11 future, but I really... my skills were in writing, so I thought, okay, newspapers, solid. They're not going

02:17 anywhere. It's fine. So that was kind of my specialty in a way. I started out in sports journalism and then

02:26 kind of moved around after I graduated. Right after I graduated, I moved to Berlin, and I realized that

02:32 being a journalist is very hard, and being a journalist in a country that you don't speak the language is even

02:39 more difficult. Yeah. So yeah, it took me a while, but I figured that out. And so then I started covering

02:47 things like when I first moved to Berlin, it was the refugee crisis or in the midst of the refugee crisis

02:53 in 2015. So I was able to get a few freelance pieces covering that. I also was able to do more

03:00 tech-related. So I started doing a bit of activism fused with tech articles, things like anti-harassment

03:09 tools, like two-way sex toys, things like that. So because in Berlin, there's a big... Yeah, in Berlin,

03:17 there's a big tech and startup scene here. So that was like an English-speaking community that I had access

03:23 to. And eventually, I needed to get a full-time job to keep my visa. So I went into... This is long,

03:31 but I went into tech marketing, went to technical writing, then learned to code. Now I'm a developer.

03:36 What an interesting journey. I think a lot of people find their way into working with software

03:41 in a roundabout way like that. Like for me, it was, oh, I'm going to go study chemistry and math

03:45 at college. And I guess I got to learn a little programming so that I can do the math work and the

03:50 math research. And wait a minute, I actually like this better than the math. What am I doing here?

03:54 That was me as a technical writer because I was writing these tutorials, but I didn't really know

03:58 how to code. So then I would just hand off... It would be like, put code here. I'd hand it to a

04:04 developer. And finally, I was like, I can do that. Write code.

04:08 That's awesome. Yeah. And you're having fun doing programming these days?

04:12 I mean, it's a great way to make a living. And I think the problem-solving aspect of it,

04:18 but I do, I really miss journalism. Like I would be lying. That's why I like to research

04:22 topics like this because it helps me feel a bit more connected.

04:26 Stay connected. Well, I do think that there's a lot of ways in which journalists can use tech

04:32 or could be helped by folks with tech skills. So who knows? We may find you back in the journalism

04:38 space on the tech side of the desk.

04:41 That's the dream.

04:42 Yeah. Awesome. Awesome.

04:44 So what are you doing these days, like day to day right now?

04:47 So right now I'm actually dipping back into technical writing a little bit. So I'm doing

04:53 the Google Season of Docs, which is a three-month program from Google where you're partnered

04:58 with an open source organization. So I'm working with the GraphQL Foundation and yeah. So,

05:04 spoiler alert, not really using Python day to day, not really a proficient Python

05:11 developer. But in my previous job, I was using Python because we were an automated testing service.

05:18 And a lot of our, you know, suites were written in Python. We had a data team, a data science team

05:24 that primarily wrote Python. Anything that wasn't written in Python, we usually had some sort of like

05:30 porter so that people could write it in Python.

05:33 Right. Exactly.

05:34 You know, compiled into PureScript or something like that. So.

05:38 Yeah. Very, very cool. Let's start pre AI talking about journalism and just talk about data and

05:46 journalism. Now, I feel like these things have always gone together. You know, if you went back to

05:50 like 1920 and you grabbed a newspaper, it would probably have stuff about the stock market and here

05:55 the trends and whatnot. But accessing data has become much, much easier in the last 10 years or whatever.

06:04 When we have web scraping, we have APIs, we have all these different ways of accessing data, right?

06:09 The internet was massive in that regard and so on. So when I think of data journalism, probably the

06:16 first place that I think of is 538.com. Like that place just has so many GitHub repositories of all

06:24 the data that they use and that you go there and there's all these graphs and stuff. But you know,

06:28 maybe just give us a sense of like where you see data having an impact in journalism these days.

06:33 You know, it's funny that you jump right away to GitHub and these data sets, because for me,

06:38 I think about it and maybe it's because, you know, I studied it. And when you study something in

06:42 university, you get all of the philosophy behind it. Right. Yeah. But, but I think about data and

06:49 journalism, as it's always been a really integral part of journalism, like most really good quality

06:56 reporting has an element of data to it. I think about things like I mentioned, I used to do sports

07:02 reporting. And you think about a story like that. And would you rather read something that says,

07:08 the teams played well, this person did pretty okay? Like, seems like last time we think. Yes,

07:14 exactly. Or would you rather like read an article that really breaks down the statistics, you know,

07:20 what the score was, what the batting average was, I don't know why I'm using baseball, but

07:25 so this idea of data and especially like really well researched and well curated data,

07:33 it's great because it can help like, do things like fight misinformation and it can help. There's

07:39 a really great quote from Catherine Gicheru, she's an International Center for Journalists

07:46 Knight Fellow. And yeah, she said that like data can help journalists speak truth to power. And I love

07:52 that because I think when you have data in reporting, it's instantly more trustworthy. We can dive into the

07:59 ethics of that and whether or not that data is actually trustworthy later, but it gives people

08:04 that sense of practice. It's probably more trustworthy than just, it's my opinion, or here's an anecdote.

08:10 I heard on Facebook that somebody said that this was true. So here's my, you know, things like that,

08:15 right? Definitely. You know, one of the big ironies it feels like to me is with the, all this availability of

08:23 real data we have, it seems like a proliferation of just insanity around fake data that I don't know,

08:32 maybe it's because it's also on the internet and it appears like, well, here's some piece of information

08:37 and we'll call those facts or opinions. And here's another one that's different than that, but they're

08:41 both on the same webpage, right? They're both in the browser. And like that kind of puts on equal footing

08:46 as opposed to, well, that used to be on the front page of the New York Times and the front page of the

08:51 Enquirer, like I didn't consider those to be equally weighted sources, but maybe people are just not

08:57 distinguishing them. But it seems ironic to me that we have more access to real data and it, there's also

09:03 seems to be a lack of embracing of real data or whatever. I'm not trying to put that, but people

09:09 seem a little wacky. Yeah. I think there is, it's confusing. I mean, I also think it's confusing.

09:14 We're living in this world where someone can be, you know, blogging on their own site and that can be

09:20 almost more credible journalism than, you know, someone in certain news outlets. So we don't need

09:28 to get super political in this, but it is. And, but yeah, it's, I think it's really confusing and I

09:35 don't blame people who especially don't, might not understand how the data is collected or, you know,

09:41 this, a lot of people aren't very transparent about how it's being presented or collected. And

09:46 yeah, it's confusing.

09:48 It's definitely confusing, but it's also the foundation of the real journalism, the real

09:54 reporting. I think there's a lot of interesting ways in which you use it. And we're going to talk

09:58 about some actually concrete, cool tools that a lot of newspapers are using, but you know, maybe let's

10:04 just talk about why would journalists and newspaper journalists, freelance journalists,

10:11 associated press type things and so on. Like why would those folks use AI and ML? Like why are they

10:18 adopting these tools?

10:18 There are a lot of reasons. And just so you don't have to hear it from me, there was a recent survey

10:25 done by JournalismAI, which is from Google and Polis, which is from the London School of Economics and

10:32 Political Science. They have a think tank on this and they surveyed newsrooms across the nation. You'll

10:38 probably, or sorry, across the world. And you will hear me reference this survey a lot because it's

10:43 really thorough and it's really recent. They published it at the end of last year. And they

10:48 mentioned that there were like three key motives for using AI. So the first was to make journalists'

10:55 work more efficient, about like 68% of the replies said that. Then to deliver more relevant content to

11:04 users, about half of the respondents said that, and also to improve business efficiency. So what I've

11:10 mostly focused on is that element of making journalists' work more efficient, because there is so much that you do

11:19 in journalism that can be automated. And you know, we're developers, we love to automate things.

11:25 Once you realize you're like, I don't have to do this for four hours by hand. Okay, we're not doing this for

11:32 four hours by hand. Let me do it once for two hours, and we'll never do it again, right?

11:36 Exactly. So there's so many opportunities like that, especially because traditional print journalism is

11:41 such, you know, a very thorough, very logical, but, you know, a bit slow moving in that sense. And

11:47 especially with this quicker news cycle, you need to keep up. So there are things like, you know, being

11:52 able to retrieve one of those massive data sets and comb through it and see whether or not there's a story

11:58 in there. Fact checking basic articles, maybe organizing story ideas that are facilitated from

12:05 the public, making initial rough cuts of videos or deciding what camera angle is the best. And there's

12:12 just a lot of tedious tasks. Yeah, I used to be an unpaid intern. I know what those tasks are, because I had

12:17 to do them. And I think it's great, because, and I think the reason I focused on that, is that it's really

12:25 supplementary to human journalists. Right? Yeah, none of the stuff that you said, people would want to

12:31 defend as like, that AI is taking my job. Like, I used to go by hand, and rename these columns to that

12:38 columns, and then merge it over here. And I want to keep doing like nobody wants to keep doing that,

12:43 right? They want to do it to tell the stories to find the insights to do the research, not juggle and

12:49 wrangle data, or other things along those lines, right? Exactly. But I think when a lot of journalists are

12:55 being presented the idea of AI and machine learning, they're not presented, like, no one says exactly

13:01 what you said, they just say, Oh, we're going to introduce this new bot, we're going to introduce

13:04 this new tool. And I think people immediately are very afraid, you know, very unsure about, you know,

13:12 you hear about robots are going to take all of our jobs. And so I think they get a bit uneasy in that

13:18 sense. And I don't blame them, because if it's not being explained in a way that like, Oh, this is

13:23 supplementary to your work. And that's what most of the respondents in the survey even said, they said,

13:28 like, they see AI as something that is, you know, supplementary and additional, not necessarily like

13:36 transformational yet.

13:40 This portion of Talk Python to Me is brought to you by Brilliant.org. Brilliant has digestible courses

13:45 in topics from the basics of scientific thinking all the way up to high end science like quantum

13:51 computing. And while quantum computing may sound complicated, Brilliant makes complex learning

13:56 uncomplicated and fun. It's super easy to get started. And they've got so many science and math

14:00 courses to choose from. I recently used Brilliant to get into rocket science for an upcoming episode.

14:05 And it was a blast. The interactive courses are presented in a clean and accessible way. And you

14:10 could go from knowing nothing about a topic to having a deep understanding. Put your spare time to good

14:16 use and hugely improve your critical thinking skills. Go to talkpython.fm/brilliant and sign up for

14:21 free. The first 200 people that use that link get 20% off the premium subscription. That's talkpython.fm

14:28 slash brilliant. Or just click the link in the show notes.

14:31 Who knows where the future goes? I've never ceased to be amazed by how crazy some of the things people

14:40 are coming up with. Like the fact that we have self-driving cars, that seemed like pure science

14:45 fiction, you know? So who knows? Maybe we'll get creepy AIs writing stuff. But for now, I don't see it

14:52 that way. And I remember working at a company quite a while ago where it was not a very tech heavy company.

14:57 It was like a research place, but a lot of people were researchers, not developers. So there were not a

15:02 lot of automated systems. And every time we would say, you know, that thing used to do for like four

15:07 hours a week, we just made that automatic and it just happens now. They're like, ah, this is like,

15:13 that's what I used to do. This is going to be my job. But every single time they just got more

15:17 interesting work that was less tedious, you know? And I don't remember anyone getting laid off or

15:23 anything like that because we automated stuff. It's just, we could do more work and do more

15:27 interesting work. Exactly. I think there'll always need to be that human component, especially

15:30 in journalism because it's so based on things like storytelling and, you know, conveying emotion.

15:36 I mean, not every form of journalism, but a lot of the really human centric things focus on that.

15:42 And I think you'll always need someone.

15:44 Yeah. Well, a really interesting semi-recent, I'm going to call it recent story,

15:49 was people applying like text processing and machine learning to the data that came out of the Panama

15:56 papers, right? So I don't know much about it.

15:59 Yeah, I didn't either. But I recently interviewed a guy who worked on a project that was a search engine

16:05 type of thing. And as they were working on it, there was a journalist who was super interested

16:11 in the tech. They're like, why is this journalist so interested in this? And it turned out that they were

16:16 using it to analyze and search all the data, like do OCR and types of stuff like that on the data that

16:22 came out of the Panama papers to create relationships and see what, you know, like that kind of stuff

16:27 would have taken a lot more people, maybe would have gotten exposed before the Panama papers could have

16:32 been, you know, analyzed fully, all sorts of stuff. So yeah, a lot of interesting ways to use technology,

16:37 I think.

16:37 Absolutely.

16:38 All these are interesting. But honestly, I think maybe the biggest boost actually might be just the

16:43 automate the boring stuff story, which you were touching on is like, there's all these things

16:48 you got to do, a lot of them are tedious. And if you had a little programming skill, you could automate

16:53 them. But because you don't, you just maybe use Excel and find and replace or something painful.

16:58 And if you could just take like 20% of the tedious work away from journalists and let them

17:03 focus on the story, or getting out and getting the story, like that would be great.

17:07 Exactly. I mean, then I think you can focus on all of those aspects that a computer cannot.

17:14 Like what you just said, I literally just repeated what you said.

17:16 Obviously, the computer can't go and interview people in a way that's going to ask the right questions and so on. Right?

17:23 Exactly. So in your talk, you mentioned a bunch of newsrooms that are actually using

17:29 different software applications, libraries, and so on. Some of these are open source. Some of these

17:35 are just talked about, but I thought they were fascinating. So you want to go through some of

17:40 those?

17:40 Yeah, definitely.

17:41 All right, let's start with the Washington Post and Heliograf.

17:44 Heliograf is a bot from the Washington Post that produces content for them. I mean, does a lot of other

17:51 things, you know, like tweets, its own code snippets, identifies trends in the stock market.

17:55 But at least as someone from the outside, it seems like that's the value that it brings is that it

18:01 just rapidly generates articles. Like the first year it came out, it made over 800 articles in a single

18:08 year. Yeah, exactly. And it's really good for at least from what I've seen is that it's really good

18:15 for those kind of short, quick reports. So things like covering the Olympic Games, like in Rio, it was

18:21 able to just pop out like 300 articles. Then for things like politics in the election, that's I think

18:29 the main focus of it. They won an award for it in 2016, or for the coverage in 2016. But just that volume,

18:38 that is, I mean, I don't know a human who can write, what is it, like five articles a day? I have no idea.

18:45 Yeah, the turnaround time as well, right? Like, it seems to me like this kind of thing would be good

18:51 for, oh, there's a big crash during rush hour that has shut down I-5 North. Details coming. Or

18:57 something like, you know, just like those really short little, this thing has happened. We're going to

19:02 write a story eventually, but we want to get it out there because we know it's a timely sort of thing,

19:06 right? Definitely. And I think it also ties into the relational issue you're talking about with the

19:11 Panama Papers. I think that's where, based on what I've read, where they're kind of going with

19:17 Heliograf, especially in politics, because, you know, a human reporter, there are tens of thousands

19:23 of elections going on throughout the year, especially just in the US I'm talking about.

19:27 Yeah.

19:28 And let alone the entire world. And so being able to have something that can monitor all of those,

19:35 and perhaps maybe even eventually find relationships between those, I think is really exciting.

19:41 Yeah, absolutely. Also just alerts on like emerging trends, like this thing seems to be getting talked

19:50 about on Twitter and we've pulled it out and maybe this hashtag is now all of a sudden trending.

19:54 Hey, reporters, look at this and see if this is interesting. And, you know, maybe it would let

19:59 them know they should pay attention sooner rather than later about this thing that's coming up.
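
The trend alert described here could be sketched roughly as follows. This is a minimal illustration, not how any newsroom tool actually works: the function name `trending_hashtags`, the `min_count` and `growth` parameters, and the idea of comparing two time windows are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List


def trending_hashtags(
    previous: Iterable[str],
    current: Iterable[str],
    min_count: int = 5,
    growth: float = 2.0,
) -> List[str]:
    """Return hashtags that spiked between two time windows.

    A tag is flagged when it appears at least `min_count` times in the
    current window and at least `growth` times as often as in the previous
    window. Both thresholds are hypothetical defaults.
    """
    before = Counter(tag.lower() for tag in previous)
    now = Counter(tag.lower() for tag in current)
    return sorted(
        tag
        for tag, n in now.items()
        # Treat unseen tags as having appeared once to avoid a zero baseline.
        if n >= min_count and n >= growth * max(before[tag], 1)
    )


if __name__ == "__main__":
    # Made-up hashtag streams for illustration only.
    last_hour = ["#python"] * 5 + ["#election"]
    this_hour = ["#python"] * 6 + ["#election"] * 8
    print(trending_hashtags(last_hour, this_hour))
```

A reporter-facing alert would then only need to surface the returned tags; deciding whether there is a story behind them stays with the journalist.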

20:03 Yeah. And that's actually a really good segue to like the next example that I had that was from

20:08 Forbes. So they have a CMS system called Bertie. And the reason I said it's a good segue is because

20:16 part of that CMS has exactly that. It has the hashtags, you know, they recommend maybe you write

20:21 articles or when you put an article together, it can read through it and say like, ah, this is trending,

20:26 like put this hashtag on it when you post it or tweet about it. And it looks, at least from what I've

20:32 seen in the videos, it looks pretty much the same to any sort of, you know, WordPress, Contentful,

20:38 like whatever you have there, but it just has these extra features. So when you go to add an image,

20:44 it'll have suggested images that are related to the article that you have, or, you know, you'll write a

20:51 headline and it'll say, ah, okay, if you switch these around, it'll be, you know, this percentage more

20:58 click, click worthy. I don't know if that's actually the term I use, but.

21:02 But that's the idea, right? Is that it's, it attracts, it's more shared, like more likely to

21:07 be shared. People are more likely to click on it when they see it in some kind of feed scrolling by

21:11 and so on.

21:12 Exactly. And I mean, even topics like SEO, like accounting for like, what is click worthy,

21:17 also what will get you up there. And so, or at least for certain topics.

21:23 Yeah. And so all that sounds like, oh, we're trying to just take the content and make it more,

21:28 viral, have a higher viral potential, but it also has more concrete things like reading complexity.

21:36 Yes, exactly. Because I think one of the difficult parts of journalism that I think

21:41 not a lot of, I don't know, maybe I'm wrong, but I feel like a lot of people misunderstand about

21:46 journalism is when you're writing, especially for, you know, a newspaper that's supposed to be

21:51 accessible to everyone. You need to be writing at like sixth grade level. Yeah.

21:57 So sixth to eighth grade is what I was told in university. And that's difficult if, you know,

22:05 writing is what you do for a living and you're used to trying to string together these beautiful

22:10 prose. And so getting things really down to that plain language, short sentences, and having something

22:16 to assist with that, I think, yeah, game changer, my opinion.
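
One common way to estimate the "grade level" mentioned here is the Flesch-Kincaid grade formula. The sketch below is an assumption about how such a check could work, not a description of the Forbes CMS: the syllable counter is a crude vowel-group heuristic, and the function names are made up for the example.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels, with a minimum of 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the U.S. school grade level needed to read `text`.

    Uses the Flesch-Kincaid grade formula:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    plain = "The cat sat on the mat. The dog ran to the park."
    dense = (
        "Notwithstanding considerable institutional opposition, the municipality "
        "unanimously implemented comprehensive infrastructural modernization initiatives."
    )
    print(round(flesch_kincaid_grade(plain), 1))
    print(round(flesch_kincaid_grade(dense), 1))
```

Short sentences and short words pull the score down, which matches the advice about plain language and short sentences.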

22:21 Right. Well, absolutely. And I'm sure people could look down on it and like, well, you're trying to

22:25 dumb down your article, but at the same time, if you reach more people, if the message gets across

22:30 to more people, like that's the ultimate goal is to convey the information to people reading it and

22:36 get more people to read it. So it seems like a noble thing to do.

22:39 Absolutely. It's making information more accessible. I mean, we even have this problem in the software

22:44 community with technical writing, you know, people are, it's like, oh, if I make all of my,

22:50 if I make all my documentation, like really short sentences, it sounds so whatever. And you're like,

22:55 but there are so many people who maybe English isn't their first language, or, you know, maybe

23:00 they have a cognitive disability, or maybe they just, you know, it's a lot to read and take on.

23:05 So making your language more simple helps everyone.

23:09 Even if you're really good at reading, you're trying to juggle two things in your mind. You're

23:13 trying to juggle the programming ideas and what the lesson is teaching you. So you're already kind of

23:17 splitting your, your mental capacity. So it, it feels to me like it should go down. And I'm always

23:23 amazed and sympathetic to folks who English is their second language, because all the programming

23:29 keywords are in English, which it just seems like really a little unfair, but you know,

23:34 such as life, I guess, but the documentation, obviously the easier, the better in that space.

23:39 I also think it's a different, you know, you have to think about it in different contexts. I mean,

23:43 I think journalism can be art. No one's arguing that, but I do think that the primary purpose of

23:50 most journalism is to convey information. So being able to get that as succinct and clear as possible

23:57 is, I would say the goal.

23:58 Yeah. Well, and there's always the Mark Twain quote of, sorry, I wrote you a long letter. I didn't have

24:04 time to write a short one.

24:05 Exactly.

24:07 Right. All right. Let's talk about earthquakes, West coast.

24:10 Earthquakes. Yes. So it's another bot.

24:15 Actually, I'm hoping there are no earthquakes, but let's talk about reporting on earthquakes.

24:18 Reporting. Apparently there are a lot. That's what I learned from this bot. But the LA Times

24:24 has also a bot. I swear, not every AI implementation in journalism is a bot, but they are, in my opinion,

24:32 some of the most interesting. And it's similar to Heliograf, but it specifically focuses on earthquakes.

24:41 And it is exactly what we're talking about as far as the turnaround time being so quick. So for example,

24:48 like back in 2014, there was this earthquake that hit the LA area and the LA Times was the first to

24:55 report on it because basically the earthquake happened. The reporter woke up, just went to his computer

25:03 and reviewed the article and published it within three minutes, like three minutes after it

25:09 happened.

25:10 That is incredible. A lot of people are still trying to figure out, was that an earthquake? What just

25:14 happened? They're like, I published this. This is good.

25:16 Exactly. And it's because it was sitting there waiting for him because he has a, like, QuakeBot is

25:23 connected to the US Geological Survey. And so when an earthquake comes in above a certain,

25:29 he has, he said, the programmer who did this set different parameters. So it's within this area of LA,

25:36 it has this size threshold. I think there's a few other ones, but those are the main ones I remember.

25:41 And then he was able to like extract the data that was sent and throw it into a pre-written template

25:50 in their CMS.

25:51 Anyway, last night there was an earthquake at this time, this magnitude, you know, it was for this long

25:57 and yeah, just go up there and review it and hit go. That's pretty cool. There's also a gist where you

26:03 can go and the person who was working on it or wrote it talked a little bit about how it works and so on,

26:07 right?

26:08 Yes, exactly. There's a bit, it doesn't have like the full working code, but there are a little bit of

26:13 like code snippets to be able to see like what parameters were put on it. So yeah, it's very cool.
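
The workflow described for the earthquake bot (filter incoming events by area and size, then drop the values into a pre-written template for a human to review) can be sketched like this. Everything specific here is an assumption for illustration: the threshold, the bounding box, the template wording, and the `Quake` class are hypothetical and are not taken from the LA Times code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical parameters; the real bot's values are not given in the episode.
MIN_MAGNITUDE = 3.0
LAT_RANGE = (33.0, 35.0)       # rough box around the Los Angeles area (assumed)
LON_RANGE = (-119.5, -117.0)

TEMPLATE = (
    "A magnitude {mag:.1f} earthquake was reported at {time} "
    "near {place}, according to the U.S. Geological Survey."
)


@dataclass
class Quake:
    magnitude: float
    latitude: float
    longitude: float
    place: str
    time: datetime


def is_newsworthy(quake: Quake) -> bool:
    """Apply the area and size filters."""
    in_area = (
        LAT_RANGE[0] <= quake.latitude <= LAT_RANGE[1]
        and LON_RANGE[0] <= quake.longitude <= LON_RANGE[1]
    )
    return in_area and quake.magnitude >= MIN_MAGNITUDE


def draft_article(quake: Quake) -> Optional[str]:
    """Return a draft for human review, or None if the event is filtered out."""
    if not is_newsworthy(quake):
        return None
    return TEMPLATE.format(
        mag=quake.magnitude,
        time=quake.time.strftime("%H:%M UTC on %Y-%m-%d"),
        place=quake.place,
    )


if __name__ == "__main__":
    # Made-up event for illustration only.
    event = Quake(4.4, 34.1, -118.5, "Los Angeles",
                  datetime(2020, 1, 1, 13, 25, tzinfo=timezone.utc))
    print(draft_article(event))
```

Note that the function only produces a draft. As in the story told here, a reporter still reviews the text and decides whether to publish it.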

26:19 Yeah. So I'll link to that in the show notes. Let's stick with LA and, you know, I was thinking

26:23 earthquakes were kind of, it made people nervous. This next one definitely will make people nervous,

26:27 right?

26:28 Yes. I mean, it makes me nervous, but it's the reality of it. So the LA Times also has a homicide report

26:37 and it takes all of the information that is given to them. I don't know exactly what the original

26:45 data source is. I would assume the LAPD or something similar. And it plots all of the homicides onto a

26:54 like interactive map on their website. So you can sort it and filter it by, you know, the year, it has the

27:02 name, the gender, and a few other areas that I'm not exactly remembering. But any information you would want

27:09 on this homicide, you can find on this map.

27:12 Yeah. And it's much like the QuakeBot, it just automatically receives that information

27:16 and pulls it up. Okay. Yeah, cool. A good service for a sad thing, I suppose. This next one is

27:22 interesting, comes out of the Guardian Australia.

27:25 Yes. And this is maybe my favorite of the bots, for only the reason that it is entirely open source,

27:33 which I think is.

27:34 Yeah, that's pretty cool.

27:35 Yeah, that's very cool.

27:36 So it's called ReporterMate. And it pretty much does the exact same things we were just talking

27:44 about. But instead of, no, actually, it pretty much does the same things. It reports on Australian

27:52 election coverage, I think a few like weather related things, stock market. So it does like very similar

28:00 work, especially to Heliograf, but it is open source. And the open source tool, in and of itself

28:07 is pretty cool. It uses, and this is a Python library or package that I do know, pandas.

28:14 Uh huh. Nice.

28:16 Yes. So it uses that it uses handlebars, and a bunch of helper functions.
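
The general pattern behind this kind of tool (load tabular data, compute a few facts, fill a story template) can be shown in a few lines. This sketch deliberately swaps in the standard library (`csv` and `string.Template`) for pandas and Handlebars so it runs anywhere; it does not use or reproduce ReporterMate's actual API, and the data and column names are made up.

```python
import csv
import io
from string import Template

# Made-up sample data for illustration only.
SAMPLE_CSV = """month,rainfall_mm
2020-06,61.5
2020-07,82.0
"""

# A pre-written sentence with placeholders, similar in spirit to a story template.
STORY = Template(
    "Rainfall $verb $latest mm in $month, compared with $previous mm the month before."
)


def write_story(csv_text: str) -> str:
    """Compare the last two rows of the data and fill in the template."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) < 2:
        raise ValueError("need at least two rows to compare")
    previous, latest = rows[-2], rows[-1]
    before, after = float(previous["rainfall_mm"]), float(latest["rainfall_mm"])
    if after > before:
        verb = "rose to"
    elif after < before:
        verb = "fell to"
    else:
        verb = "held steady at"
    return STORY.substitute(
        verb=verb,
        latest=latest["rainfall_mm"],
        month=latest["month"],
        previous=previous["rainfall_mm"],
    )


if __name__ == "__main__":
    print(write_story(SAMPLE_CSV))
```

The helper logic that picks "rose to" or "fell to" is the kind of job the helper functions mentioned here would do in a real template system.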

28:23 Yeah. And it looks really cool. Like you just pip install it off you go. And yeah, it gives you all

28:28 that automation, which that's kind of an interesting contrast back to like the Washington Post, doing

28:35 its own sort of private thing versus here's the open source equivalent that a bunch of people can jump

28:39 on. And it seems to me like this would be really helpful for smaller newsrooms, you know,

28:45 small town city that doesn't really have like lots of money and maybe doesn't have much of a,

28:50 they might have an IT person, but not a, like a software development team.

28:54 Oh yeah. That's a big, like when we talk about this field, that's something I definitely,

28:58 I don't know if you want to talk about now or we can talk about it later, but that's.

29:01 Let's talk about a little bit. So there's this, I suppose there's this really big challenge between

29:06 like take the top 10 newspaper organizations and they have paywalls, they have tech teams,

29:11 they have mobile apps. They're like, they are a tech company in some aspect versus a news organization

29:18 for a town that's got a hundred thousand people like Lawrence, Kansas, where I went to college,

29:22 right? Like they, maybe that's a bad example. That's actually where Django came from, but,

29:26 but in general, like these smaller newspapers don't necessarily have big tech teams.

29:31 Yeah, no, for sure. They, so what a lot of people talk about is that AI and journalism is already a

29:38 significant part, but it's really unevenly distributed. So it's focused on these big news

29:45 organizations that either have their own development team or have a relationship to these big tech

29:52 companies. And I find the topic really interesting because it dives, it gets a little bit into the

30:00 ethics field because what a lot of people, at least according to the survey that I mentioned earlier,

30:05 what a lot of people are afraid of. And I didn't even think about until reading this survey, which is

30:11 unfortunate is that people are afraid of like these big tech companies having more power,

30:17 like not only fueling the power of these big tech companies like Google, for example,

30:22 but also how that would potentially impact the reporting of those big tech companies. You know,

30:29 it's kind of this cycle.

30:31 Journalism is a check on the power of those tech companies, right? And if-

30:35 Exactly.

30:36 Yeah. Do you really want to write a negative article about them? Like what's going to happen to your

30:41 other articles if you become too much of a negative force on them, right?

30:46 Exactly. And I mean, they hold a lot of power in this situation because I mean, a lot of these

30:52 news organizations, specifically the ones that don't have their own development team or have a very small

30:57 development team, really rely on that technology to keep up and kind of keep that pace in this

31:04 online, like very much online, very much 24-hour news cycle. So it's scary when you think about it like

31:12 that. It really is, a little bit. Talk Python to Me is partially supported by our training courses.

31:18 How does your team keep their Python skills sharp? How do you make sure new hires get started fast and

31:24 learn the Pythonic way? If the answer is a series of boring videos that don't inspire or a subscription

31:31 service you pay way too much for and use way too little, listen up. At Talk Python Training, we have

31:36 enterprise tiers for all of our courses. Get just the one course you need for your team with full reporting,

31:42 and monitoring or ditch that unused subscription for our course bundles, which include all the courses

31:47 and you pay about the same price as a subscription once. For details, visit training.talkpython.fm

31:54 slash business or just email sales at talkpython.fm.

32:00 The first place that comes to mind when I think about like those challenges, it's got to be Facebook.

32:04 But I want to ask you a question, not about Facebook, about something else.

32:07 Google News.

32:08 So you're in Europe right now and Europe has had a mixed, interesting relationship with Google News.

32:15 I feel like, you know, I think Spain had tried to like prohibit or charge Google for like putting the headlines from Spanish newspapers.

32:25 I think it was Spain.

32:26 It was somewhere in Europe.

32:28 And they tried to limit how Google News could sort of use their free like headlines.

32:33 And so they just stopped it.

32:34 And then they're like, wait, wait, wait.

32:36 Where'd all our traffic go?

32:37 Bring back Google News.

32:37 We need Google News again.

32:38 What is going on here?

32:39 You know, I feel like there's something like that happening right now in Australia as well.

32:44 So there always seems to be like this tension of like, oh, we hate them.

32:47 They're like robbing from us.

32:48 Wait, we need them.

32:49 They're our savior.

32:50 Bring them back.

32:51 You know, what's your thought from being more on the inside of that world?

32:54 I'll be honest.

32:55 I haven't been in as many discussions that discuss like Google News specifically.

33:00 But something I do, I have talked about with my friends and I actually saw a Twitter thread about it today.

33:05 I'll send you the link.

33:06 Awesome.

33:07 And it's about how, I mean, this thread didn't mention Google specifically, but come on, it has to do with it, how much American media seeps into European news coverage and what people are aware of.

33:25 And I could imagine also if you're seeking your news online or through something like Google, naturally then a lot of American politics comes up, a lot of American systems come up where you're kind of fed this idea.

33:39 And in a way that is so unique to the U.S.

33:42 Because at least like growing, I grew up in the U.S.

33:46 So I don't remember ever being like, wow, why do we have so much news on any other country?

33:52 I remember if you wanted news on other.

33:55 Why does Brazil always give us their news?

33:56 Like, I just don't need Brazil's news.

33:58 Like, no offense to Brazil.

33:58 I'm just grabbing like a random country that doesn't do that generally.

34:01 Exactly.

34:02 Yeah.

34:03 I remember like if you wanted news, like, for example, my sister studied in Japan and before she left, she wanted, you know, to try to keep up with Japanese news.

34:11 She had to literally go buy the Japanese paper from the actual Japanese store that we had.

34:17 So you really have to seek out that news versus like a lot of American news is just kind of, what's the word, like filtered into the everyday experience of people in Europe.

34:29 And it's kind of like, why?

34:31 And I mean, even, I don't know this personally, but I know there's some even tension between like Western Europe and Eastern Europe as far as like how their news is represented, who, how much do you see of each country?

34:43 How much do you really know?

34:44 Right.

34:45 And you're in Berlin, which is right on that line there, historically speaking.

34:50 But it's definitely still in like the more Western category.

34:53 I would say we hear a lot more about, you know, France or the UK than we do about Bulgaria.

34:59 Yeah.

35:00 Very interesting.

35:01 Just an example.

35:02 One of my favorite songs is Californication from the Red Hot Chili Peppers.

35:07 And that song's all about like how America is exporting their culture and stuff.

35:12 I think generally through music and Hollywood, it's only more so now, right?

35:16 With like the tech companies and online and so on.

35:19 Interesting.

35:20 I know.

35:20 I think there's this tension between both the big tech companies just controlling.

35:25 They're the aggregators of the attention.

35:28 So they're controlling access to what gets attention.

35:31 And then there's this tension between the big newspapers and the small newspapers, right?

35:36 Because the big newspapers have software teams that can just go, yeah, yeah, PyTorch.

35:39 Let's use that.

35:40 And other people are like, what?

35:42 Is there a fire?

35:42 What is this torch about?

35:43 Yeah.

35:44 Yeah.

35:44 So, yeah.

35:46 Very interesting.

35:47 Let's keep going on some of these things.

35:48 The next one that I thought was interesting that you brought up was what ProPublica was doing around analyzing not what people in the U.S. Congress say they're interested in, but what their actions and behaviors and words say they're interested in.

36:02 Yeah.

36:03 So ProPublica took an analysis of thousands of press releases over the course of two years.

36:10 And they trained a computer model to extract which phrases each Congress member uses most frequently.

36:17 And then under the assumption that if these are the phrases they're using most frequently, these are likely the topics that they are pushing for and care about the most.

36:27 Because, or else, why would you be releasing all these press releases about it if it's not a topic you care about?
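To make the "which phrases does each member use most" idea concrete, here is a minimal sketch. It is not ProPublica's model, which the conversation does not describe in detail. It assumes a simple TF-IDF score over two-word phrases, so that wording every member shares ("introduced a bill") scores zero and wording particular to one member rises to the top. The member names and press-release text are invented.

```python
import math
import re
from collections import Counter

# Hypothetical press-release snippets keyed by made-up member names.
RELEASES = {
    "Member A": [
        "Today I introduced a bill to expand rural broadband access.",
        "Rural broadband is essential for small business growth.",
    ],
    "Member B": [
        "I introduced a bill to lower prescription drug prices.",
        "Prescription drug prices are hurting seniors.",
    ],
}


def bigrams(text: str) -> list[str]:
    """Lowercase the text and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def distinctive_phrases(releases: dict[str, list[str]], top_n: int = 2) -> dict[str, list[str]]:
    """Rank each member's bigrams by TF-IDF computed across members."""
    counts = {
        member: Counter(p for doc in docs for p in bigrams(doc))
        for member, docs in releases.items()
    }
    # Document frequency: how many members use each phrase at all.
    df = Counter(p for c in counts.values() for p in c)
    n = len(counts)
    result = {}
    for member, c in counts.items():
        scores = {p: tf * math.log(n / df[p]) for p, tf in c.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        result[member] = ranked[:top_n]
    return result


for member, phrases in distinctive_phrases(RELEASES).items():
    print(member, phrases)
```

The assumption that frequent phrases reflect real priorities is the same one discussed above, and it is an editorial assumption rather than something the counting can prove.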

36:32 And some of them, a lot, like some of them were in line with them.

36:36 I don't have the article up right now, but I would suggest checking it out.

36:40 But some of it, the Congress people were really in line with what their beliefs were.

36:45 And a lot of them, it was like, we say this.

36:47 Yes.

36:47 But we don't really agree.

36:49 Yes, exactly.

36:50 Yeah.

36:51 This is what we say we're for and this is what we're actually for.

36:53 How interesting.

36:54 One of the things I was thinking of that's like a cool automate the boring stuff when you're interested in those kinds of things is like speech to text.

37:02 A lot of times there'll be some kind of presentation, like there's a video of the person, but it would be much better if you could just index the keywords of what they said.

37:11 And, you know, the ability to just like take spoken word and video and turn it into written stuff that you can analyze.

37:18 It seems like that'd be pretty interesting in journalism.
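Once speech has been turned into timestamped text, indexing the keywords is the easy part. The sketch below assumes the transcription already exists in an "MM:SS text" layout like this transcript's and builds a keyword-to-timestamps lookup. The sample lines and the tiny stopword list are illustrative only, and the speech-to-text step itself is out of scope here.

```python
import re
from collections import defaultdict

# A few timestamped lines in the same "MM:SS text" shape as this transcript.
TRANSCRIPT = """\
36:03 So ProPublica took an analysis of thousands of press releases.
36:10 And they trained a computer model to extract phrases.
39:51 They trained a computer model to find and track airplanes.
"""

# Deliberately tiny stopword list, just enough for the sample above.
STOPWORDS = {"a", "an", "and", "of", "so", "they", "to", "took"}


def build_index(transcript: str) -> dict[str, list[str]]:
    """Map each keyword to the timestamps of the lines that mention it."""
    index = defaultdict(list)
    for line in transcript.splitlines():
        match = re.match(r"(\d{1,2}:\d{2})\s+(.*)", line)
        if not match:
            continue  # skip blank or untimestamped lines
        stamp, text = match.groups()
        for word in sorted(set(re.findall(r"[a-z]+", text.lower()))):
            if word not in STOPWORDS:
                index[word].append(stamp)
    return dict(index)


index = build_index(TRANSCRIPT)
print(index["trained"])  # timestamps of the lines containing "trained"
```

With an index like this, a reporter could jump straight to the moments in a long recording where a given term came up instead of replaying the whole thing.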

37:20 Yes.

37:21 I can't believe I forgot to mention that because I think that's the most tedious part of reporting.

37:25 I remember having to either, you know, have my little hand recorder and record things and then type it out myself later or just take handwritten notes and be like, I hope I got this quote right.

37:35 So I think the idea that you can, I mean, there are apps like right now I use Otter.

37:41 Yeah.

37:42 I like an instant.

37:43 Yeah.

37:43 Yeah.

37:44 I have not had any practical use for Otter, but I've tried to use it for the podcast and live.

37:49 It's like, it'll transcribe multi-person conversations and attribute the spoken word to the different people and so on.

37:56 Right?

37:56 Yes, exactly.

37:57 You still need to definitely read through it because especially if you're, I can imagine, especially in software, we're using a lot of lingo that isn't common speak.

38:05 You have to go through it.

38:35 version of Word will now let you, like, I heard this just yesterday or the day before, you can take an MP3 and upload it to your document and then just grab paragraphs of transcribed text and just drop them into your document, like right out of the MP3 file, which is pretty awesome.

38:51 That'll help a lot of people.

38:52 Yeah, that's great.

38:53 And it also goes along with what we were saying about readability where it's again, an accessibility issue where, you know, a lot of before it was always like, ah, it's so hard to have a video have captions or like, oh, it's really hard.

39:04 If we recorded this interview to have, you know, a written version because it takes so much power or work time.

39:11 And now it's like, there's almost no excuse to make things inaccessible because the resources are there and a lot of them are free and available.

39:18 Yeah.

39:19 Yeah.

39:20 It's super cool.

39:21 So the next one that you talked about was BuzzFeed.

39:24 And to me, BuzzFeed is like listicle type stuff.

39:28 And it's when you think of viral headlines, like they probably got some things that like really recommend headlines.

39:34 That said, this next thing, they do do real news reporting as well in some interesting ways.

39:39 And this next one actually is pretty interesting there, right?

39:42 Yeah.

39:43 BuzzFeed.

39:43 I love a good BuzzFeed quiz, but they also have the real BuzzFeed News, really good journalism.

39:51 And this one is, I remember reading about it and I couldn't believe it because it just sounds like sci-fi movie, but they trained a computer model to find and track like secret airplanes.

40:04 So what I mean by that is the computer used a machine learning algorithm to sift for planes with flight patterns that resembled those of the FBI or the Department of Homeland Security.

40:15 Like the plane that goes up and just flies in circles around a city rather than from a city to a city, right?

40:21 Something like that.
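The "flies in circles around a city" intuition can be turned into a number. This sketch is not the model BuzzFeed trained. It is a hand-written heuristic, assuming each flight is available as a list of compass headings, that adds up how far the plane has turned and flags tracks that complete several full loops. The two sample tracks and the two-loop threshold are invented.

```python
def heading_change(a: float, b: float) -> float:
    """Signed smallest turn from heading a to heading b, in degrees (-180, 180]."""
    diff = (b - a) % 360.0
    return diff - 360.0 if diff > 180.0 else diff


def total_turning(headings: list[float]) -> float:
    """Net degrees turned over a track; one full circle adds up to about 360."""
    return sum(heading_change(a, b) for a, b in zip(headings, headings[1:]))


def looks_like_circling(headings: list[float], min_loops: float = 2.0) -> bool:
    """Flag tracks whose net turning is at least min_loops full circles."""
    return abs(total_turning(headings)) >= 360.0 * min_loops


# Hypothetical tracks: one keeps turning right, one flies roughly straight.
circling = [(i * 30) % 360 for i in range(31)]  # 30 turns of 30 degrees each
commuter = [90, 92, 91, 89, 90, 88, 90]

print(looks_like_circling(circling), looks_like_circling(commuter))
```

A feature like net turning is the sort of thing a trained classifier could take as one input among many; on its own it would also flag holding patterns and sightseeing flights, which is why the reporting step still matters.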

40:22 Exactly.

40:22 Yeah.

40:23 I don't know.

40:24 Yeah.

40:24 I could totally imagine it's exactly like that.

40:26 Something a little bit strange in that way.

40:30 But it allowed them to report on a ton of different topics that, again, I just think are wild.

40:36 So like, for example, like how U.S. marshals hunted down drug cartel kingpins in Mexico.

40:44 Like how?

40:45 I don't know.

40:46 I don't know.

40:46 How there was like a military contractor that tracks terrorists in Africa, but I guess they were flying over U.S. cities.

40:53 Right.

40:54 Wait, if your job is to track, you know, military stuff in some foreign country, what are you doing in Dallas?

41:01 Flying around.

41:02 Yeah, exactly.

41:03 That's suspicious.

41:03 And just other topics around like aerial surveillance, which is, again, when I read it, I was like, you know, I think of when I think of aerial surveillance, I think of like deep conspiracy theories.

41:15 Like what are they called?

41:16 Like jet trails?

41:17 Yeah.

41:18 Con trails.

41:18 Yeah.

41:19 Con trails.

41:20 So that's always what I thought of.

41:23 So then when I saw this report, I was like, oh my God, like they did it.

41:28 Yeah.

41:29 Super cool.

41:30 This is really interesting and well done there.

41:32 So I guess the last one I want to talk about is probably the most far out one, which is what if we could have a drone fly a robot in to walk around dangerous places like war zones and investigate, like a humanoid robot, but not a person.

41:50 Yeah, that's from Al Jazeera.

41:52 And they mentioned it at one of their Future Media Leaders' Summits in 2018.

41:57 It's still from what I can tell, it's still in development and a bit far out.

42:03 It is far out.

42:04 But I think it's really interesting.

42:06 Like the idea that, you know, we're so used to or at least if you look up like drones and war zones, that's a common practice is to have some sort of drone that goes in.

42:16 What it does, you know, depends on who's using it.

42:19 But the idea with this is that it would deploy a robot.

42:23 Yeah.

42:23 That can take video.

42:25 It can surveil what's going on.

42:26 It can, you know, record the sounds that are happening.

42:29 They want it to be able to dodge sniper attacks and assess the situation.

42:36 And what I think is really interesting about that is that, I mean, there is like a lot of human journalists, one, aren't trained for that type of environment.

42:46 And it would be very difficult to train someone in that.

42:49 There are journalists embedded within military.

42:52 So like the U.S. military has journalists.

42:55 But again, it goes into the issue of how much of it is true or how much of it is influenced by their employer.

43:05 You're certainly getting one perspective if you're like with those folks.

43:09 I know folks try to be objective, but they're there for your safety and you can only go and be with them where they are.

43:18 Right.

43:19 So even like no matter whether right or wrong, like you're getting a somewhat influenced perspective from that.

43:25 Right.

43:25 Exactly.

43:26 You can't just walk around like, well, let me go talk to those guys over there and see what they think.

43:30 No, they're shooting me.

43:31 I'm going to not do that.

43:33 Exactly.

43:33 Exactly.

43:33 And I mean, there's also the ethics of sending, you know, civilian human journalists into these kind of hostile spaces and being like, here, report on it.

43:41 I mean, there are crisis reporters and you hear about journalists being captured in places like Yemen.

43:46 But again, it's like an ethics debate of when should that be required and when should it not.

43:54 Yeah, for sure.

43:55 So, yeah, this is a really interesting idea of taking legitimately taking a humanoid drone or robot and having it walk around in war zones.

44:05 So they actually have a YouTube video or a video on YouTube that you can see a little animation of when it works.

44:10 I guess a couple other things we could talk about.

44:12 So a few of the tools that we talked about are open source and people can use them.

44:16 But a lot of this is kind of like how news organizations are using this technology on their platforms.

44:20 But there's also some tools that people can use, like Quartz AI Studio and Google News Initiative and stuff like that.

44:28 You want to give us a quick rundown of some things that people can use?

44:30 Yeah, definitely.

44:31 So as you mentioned, there's Quartz AI Studio.

44:33 It's from the Knight Foundation, which is an esteemed organization in journalism.

44:39 And they help journalists, like train journalists to use machine learning in their reporting, and also can provide support and tools.

44:48 And I think that's great because it makes these practices more accessible to these smaller news organizations or even freelance journalists.

44:56 I don't know the exact requirements for what it takes to get their support.

45:00 But again, the fact that they're even offering this access to these smaller organizations, I think, is great.

45:08 And Google, we've mentioned a few times, they do a lot of research in this area. I mean, they're the ones from the survey I keep quoting.

45:15 They do a lot of interesting research on this topic.

45:18 So, for example, they have Facets, which is a machine learning data visualization tool.

45:23 And it's open source.

45:24 So you can play with data within it and create visualizations of the information.

45:30 And then finally, there's like Google News Initiative and more specifically JournalismAI and Google News Initiative.

45:38 It goes back.

45:40 Everything is basically a circle.

45:42 And it goes back to what we were talking about earlier in regards to these big tech companies and then who provides access to these tools, but also who teaches journalists how to use these tools.

45:56 So the biggest resource that I know of for training journalists on what machine learning is, how to use it in reporting, what the different tools available are,

46:06 here's an introduction,

46:07 is Google News Initiative.

46:09 They have about like 40 courses that are available to journalists.

46:12 But again, it's kind of, of course, it leans towards, hey, use our Google products to do these things.

46:18 And it also leads to a, I'm trying to figure out a way to word this, not like a one-way information direction, but like kind of a funnel of like, okay, this is your only information about how to, you know, think about data, how to analyze algorithms.

46:34 But I'm a big fan of like tech is pretty much always biased.

46:39 It's always political.

46:40 So that influence from Google has to be apparent in there somewhere.

46:45 Right.

46:45 And it might not necessarily be overtly intentional, right?

46:49 It could just be the people who built it all generally share one way of viewing the world.

46:53 And so that's probably going to show up in there somehow.

46:56 A hundred percent.

46:57 Like algorithm bias, especially, is something that also came up in this survey about something that people are afraid of and something that people are nervous about implementing.

47:08 People being like reporters are nervous about implementing in their work because, you know, if you don't know how to analyze an algorithm and know where it's getting its data and knowing where, how that data is being prioritized, then it's difficult to know.

47:22 Like, am I presenting data that is really reputable and as unbiased as it can be?

47:29 Or if it is biased, what are the biases?

47:31 Right.

47:32 And so they don't want to be publishing like blatantly biased reporting unless they do.

47:37 But I think a lot of people don't ever intend to have that happen.

47:40 Right.

47:40 There's different levels, right?

47:42 Like it could be that you're using an algorithm and it's giving you information and then you're writing something based on that influenced or directed or biased information.

47:54 Or it could be something as simple as like the bot that tells me what's trending.

47:57 It's always more interested in this one part of society than in what actually matters more to most of society.

48:04 Like it could be it really cares about people in New York and their financial behaviors.

48:09 Or it could be it cares about the plight, the challenges, of middle America.

48:14 Or it could be racially biased.

48:16 There's all sorts of things that it could be.

48:18 And it's not like the algorithm is so incredibly biased.

48:20 It's just like says, hey, you should pay attention to this aspect of life rather than that.

48:26 Right.

48:26 Like that could be really subtle, I think, and challenging.

48:28 Definitely.

48:29 I mean, like for a lot of the information where I was talking about where people get these big data sets and sift through it.

48:36 I mean, that data set that you get could be biased if you don't know how it was collected.

48:41 And for a lot of people, getting a big data set like that is really exciting.

48:46 It's like, oh, OK, like I don't have to go through the work of surveying thousands of people.

48:50 And so it's really appealing, I think, as a reporter to want to act on that.
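One basic question to ask of a handed-over data set is whether its makeup matches the population it claims to describe. The sketch below is only a first sanity check under stated assumptions: it compares each group's share of the sample with a reference share and reports the gap in percentage points. The `region` field, the sample rows, and the population shares are all hypothetical.

```python
from collections import Counter


def share_gaps(records: list[dict], field: str, reference: dict[str, float]) -> dict[str, float]:
    """Return sample share minus reference share per group, in percentage points."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {
        group: round(100.0 * counts.get(group, 0) / total - 100.0 * ref_share, 1)
        for group, ref_share in reference.items()
    }


# Hypothetical survey rows and hypothetical population shares.
SAMPLE = [{"region": "urban"}] * 80 + [{"region": "rural"}] * 20
POPULATION = {"urban": 0.55, "rural": 0.45}

gaps = share_gaps(SAMPLE, "region", POPULATION)
print(gaps)
```

A large gap does not prove the data is unusable, but it is a prompt to ask how the data was collected before building a story on it.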

48:54 Yeah.

48:55 And yeah, there's also just so we don't go down a total rabbit hole.

49:00 There's also Mozilla, although in recent news of Mozilla, I'm not sure how much of this is still in effect.

49:09 But they have a history of partnering with journalists and news organizations.

49:13 So there's an organization called OpenNews, and it's a network of developers, journalists, designers, editors.

49:21 And they collaborate on open technologies and processes within journalism.

49:25 And that is its own organization now.

49:29 But it was originally incubated within Mozilla.

49:31 And they also have other ones like the Mozilla Information Trust Initiative,

49:37 which is a collection of comprehensive efforts to keep the Internet credible and healthy and fight misinformation.

49:45 And also, they just announced like a few days ago.

49:51 I'm fact checking myself as we speak.

49:55 The Mozilla Foundation announced that there was going to be a new fund for black artists that examines the relationship between AI and racial justice.

50:05 Okay.

50:06 Yeah, very cool.

50:06 So not maybe that isn't directly related to reporting, but I think a lot of those outcomes can directly influence reporting,

50:15 especially around news coverage, around topics like Black Lives Matter and racial injustice and the justice system in general in the U.S.

50:24 Yeah, for sure.

50:25 Yeah, I think Mozilla is definitely a pretty positive force for these things compared to a lot of the tech companies.

50:30 That's great.

50:31 All right.

50:32 Well, there are so many more things I would like to ask you

50:35 that I have on my list to talk to you about.

50:38 But at the same time, we're running short on time.

50:41 So let me ask you just one question about kind of what you're up to these days, bringing it full circle.

50:47 I know that you're looking for maybe your next project, your next thing to be working on and doing.

50:53 Do you want to tell people like what you're interested in, if they've got an opportunity for you out there?

50:57 Yeah, I'm looking for a new role.

51:00 So this Google Season of Docs program runs until December.

51:04 And then after that, I'm hoping to start something new.

51:08 I'm a front-end developer by trade, been a front-end developer for about two years.

51:12 So I've been mostly looking at roles in that.

51:16 But of course, I would love to get back into journalism and tech reporting or even go in from the engineering side.

51:24 So if anyone knows anything, my inbox is fully open for you.

51:30 Awesome.

51:31 And I'll be sure to put your contact information in the show notes so people can get in touch with you.

51:35 Great.

51:36 Yeah, very cool.

51:37 All right.

51:37 Now, before we get out of here, I'm going to ask you the final two questions.

51:40 If you're going to write some code, what code editor do you use?

51:44 I use VS Code because I do a lot of TypeScript.

51:46 So it has the best TypeScript support in my opinion.

51:50 Yeah, cool, cool.

51:51 And, you know, it's written in TypeScript, so it better have good TypeScript support.

51:55 Awesome.

51:56 You basically just follow the squiggly line until you find the error.

51:59 Yeah, perfect.

52:00 And then I always like to bring up like some interesting Python library package for folks out there.

52:06 So what do you got for us this week?

52:07 Got one that's interesting to you?

52:09 Yes.

52:09 There's a package that I was very, very recently introduced to.

52:14 Newspaper3k.

52:16 And it's like an article scraping and curation package.

52:20 So you can take a URL from an article that is somewhere on the interweb and it will scrape

52:29 it and try to find information from it.

52:31 Like, for example, like the author, the publishing date, some of the text, top images, et cetera.
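A minimal sketch of what that looks like with Newspaper3k, assuming the package is installed (`pip install newspaper3k`) and the machine has network access. `Article`, `download()`, `parse()`, and the `authors`, `publish_date`, `text`, and `top_image` attributes are part of the library. The `fetch_article` and `article_fields` wrappers are names made up here for illustration.

```python
def fetch_article(url: str):
    """Download and parse one article with Newspaper3k (needs network access)."""
    # Imported here so the helper below can be used without the package installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article


def article_fields(article) -> dict:
    """Collect the fields mentioned above from a parsed article object."""
    return {
        "authors": list(article.authors),
        "publish_date": article.publish_date,
        "top_image": article.top_image,
        "text_preview": article.text[:200],
    }
```

Calling `article_fields(fetch_article(url))` with a real article URL would return the extracted metadata as a plain dictionary. Extraction is best-effort, so fields such as the author list or publishing date can come back empty on pages the library cannot interpret.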

52:37 Oh, my gosh.

52:37 This is, yeah, this thing is super cool.

52:39 I've heard of this before, but I think it's the perfect fit for what we're talking about

52:43 today.

52:44 Its features include multi-threaded article download framework, news URL identification.

52:50 And I think it'll even do things like you point it at like a landing page, like the homepage

52:55 of a newspaper, and it'll find all the sub articles and stuff.

52:57 Yeah, super cool.

52:58 So if you're into researching news, you want to do web scraping, you might not have to start

53:02 from like low-level programming with Beautiful Soup.

53:04 You could just get more of the direct data here.

53:07 Yeah, great one.

53:08 All right.

53:09 Final call to action.

53:10 You know, speaking to the folks who work somehow with the journalism industry, they want to get

53:15 code and technology more into what they're doing.

53:18 What would you tell them?

53:18 I would tell them that it's really a great opportunity to look into.

53:22 It's something that I really believe is the future of the industry, the future of information

53:29 and reporting.

53:30 But I think I would definitely approach it with caution.

53:34 So if you are someone who is either going to be building these algorithms or using them, make sure that you're asking the right questions about where the data comes from and how it is being prioritized.

53:47 And in general, beyond the people who are going to be using them: ask questions, be skeptical, and just be aware that the story that you're reading might be generated by a bot or an algorithm.

54:12 Yeah, good advice.

54:13 Well, Carolyn, thank you so much for being on the show.

54:15 It's a fascinating look inside into the journalism industry and tech intersection.

54:20 Yeah.

54:21 Thank you so much for having me.

54:22 I love this topic.

54:24 So yeah, it's very interesting.

54:26 You bet.

54:26 Bye bye.

54:27 This has been another episode of Talk Python to Me.

54:30 Our guest in this episode was Carolyn Stransky, and it's been brought to you by Brilliant.org and Talk Python Training.

54:36 Brilliant.org encourages you to level up your analytical skills and knowledge.

54:40 Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every day.

54:46 Want to level up your Python?

54:49 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

54:53 Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python.

55:02 And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.

55:06 It's like a subscription that never expires.

55:08 Be sure to subscribe to the show.

55:10 Open your favorite podcatcher and search for Python.

55:13 We should be right at the top.

55:14 This is your host, Michael Kennedy.

55:25 Thanks so much for listening.

55:27 I really appreciate it.

55:28 Now get out there and write some Python code.

