#280: Python and AI in Journalism Transcript
00:00 If there's ever been a time in history that journalism is needed to shine a light on what's happening in the world, it's now. Would it surprise you to hear that Python and machine learning are playing an increasingly important role in discovering and bringing us the news? On this episode, you'll meet Carolyn Stransky, a journalist and developer who's been researching this intersection of tech and journalism. This is Talk Python To Me, Episode 280, recorded August 26, 2020.
00:39 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by brilliant.org and us. Before we talk with Carolyn, a quick announcement: two of the courses that we've released in early access mode are now complete. They're 100% done and ready for you. That's the Python memory management and tips course and moving from Excel to Python with pandas and Jupyter. Just visit talkpython.fm and click on Python courses to learn more.
01:22 Carolyn, welcome to talk Python to me. Thank you. Thank you for having me. I'm really interested to hear about this topic, I ran across one of your presentations, talking about how AI is affecting journalism. And I just was really fascinated by all these ways that newspapers and journalists are doing really cool stuff with like machine learning and AI. And oftentimes that means Python as well. Now, before we get into our main topic with AI in journalism, let's just start with your story. How did you get interested in programming? You started out on the journalism side, not the AI side, right? Yes, exactly. So I studied journalism and university. And that was my focus. My focus was actually print journalism. I really thought that was going to be I mean, okay, I knew it wasn't going to be the future. But I really my skills were in writing. So I thought, okay, newspapers, solid, they're not going anywhere. It's fine. So that's was kind of my specialty in a way. I started out in sports journalism, and then kind of moved around after I graduated. Right after I graduated, I moved to Berlin, and I realized that being a journalist is very hard. And being a journalist in a country that you don't speak, the language is even more difficult. Yeah. So yeah, it took me a while, but I figured that out. And so then I started covering things like when I first moved to Berlin, it was the refugee crisis, or in the midst of the refugee crisis in 2015. So I was able to get a few freelance pieces covering that I also was able to do more tech related. So I started doing a bit of like activism fused with tech articles, things like anti harassment tools, like to weigh sex, toys, things like that. So because there's a big Yeah, in Berlin, there's a big tech and startup scene here. So that was like an English speaking community that I had access to. And eventually, I needed to get a full time job to keep my visa. 
So I went into this is long, but I went into tech marketing, went to technical writing, then learn to code. Now I'm a developer, what an interesting journey. I think a lot of people find their way into working with software, in a roundabout way like that, like for me, it was all I'm gonna go study chemistry and math at college. And I guess I got to learn a little programming so that I can do the math work and the math research. And wait a minute, I actually like this better than the math. What am I doing here? That was me as a technical writer, because I was writing these like tutorials, but I didn't really know how to code. So then I would just hand off like, it would be like put code here. I hand it to a developer. And finally, I was like, I could do that. write code. That's awesome. Yeah. And you're having fun doing programming these days? I mean, it's a great way to make a living. And I think, the problem solving aspect of it, but I do I really miss journalism, like I would be mine. That's why I like to research topics like this, because it helps me feel a bit more connected. Stay connected. Well, yeah, I do think that there's a lot of ways in which journalists can use tech, or could be helped by folks with tech skills. So who knows? We may find you back in the journalism space on the on the tech side of the desk. That's the dream. Yeah. Awesome. So what are you doing these days, like day to day right now. So right now, I'm actually dipping back into technical writing a little bit. So I'm doing the Google season of docs, which is a three month program from Google where you're partnered with an open source organization.
05:00 So I'm working with the GraphQL Foundation. And yeah, spoiler alert: I'm not really using Python day to day, and I'm not really a proficient Python developer. But in my previous job, I was using Python because we were an automated testing service, and a lot of our, you know, suites were written in Python. We had a data science team that primarily wrote Python, and for anything that wasn't written in Python, we usually had some sort of, like, porter so that people could write it in Python. Right, exactly, you know, compiled into pure script or something like that. So yeah. Very, very cool. Let's start pre-AI, talking about journalism, and just talk about data and journalism. Now, I feel like these things have always gone together. You know, if you went back to, like, 1920 and you grabbed a newspaper, it would probably have stuff about the stock market, and here are the trends, and whatnot. But accessing data has become much, much easier in the last 10 years or whatever. We have web scraping, we have APIs, we have all these different ways of accessing data, right? The internet was massive in that regard, and so on. So when I think of data in journalism, probably the first place that I think of is FiveThirtyEight.com. That place just has so many GitHub repositories of all the data that they use, and you go there, and there are all these graphs and stuff. But, you know, maybe just give us a sense of where you see data having an impact in journalism these days. You know, it's funny that you jump right away to GitHub and these data sets, because for me, and maybe it's because, you know, I studied it, and when you study something in university, you get all of the philosophy behind it. Right? Yeah. But I think about data and journalism as something that's always been a really integral part of journalism. Like, most really good quality reporting has an element of data to it. 
I think about things like, as I mentioned, I used to do sports reporting. And you think about a story like that: would you rather read something that says, "The teams played well. This person did pretty okay"? Like, seems like last time, we think. Yes, exactly. Or would you rather read an article that really breaks down the statistics, you know, what the score was, what the batting average was? I don't know why I'm using baseball. But this idea of data, and especially really well-researched and well-curated data, is great, because it can help do things like fight misinformation. There's a really great quote from Catherine Gicheru, an International Center for Journalists Knight Fellow. She said that data can help journalists speak truth to power. And I love that, because I think when you have data, the reporting is instantly more trustworthy. We can dive into the ethics of that, and whether or not that data is actually trustworthy, later. But it gives people that sense that it's probably more trustworthy than just "it's my opinion" or "here's an anecdote I heard on Facebook, somebody said that this was true." You know, things like that. Right.
08:16 Definitely. You know, one of the big ironies it feels like to me is, with the, all this availability of real data, we have, it seems like a proliferation of just insanity around fake data that, I don't know, maybe it's because it's also on the internet into Peters like, Well, here's some piece of information. And we'll call those facts or opinions. And here's another one that's different than that. But they're both on the same webpage. Right? They're both in the browser. And like, that kind of puts on equal footing, as opposed to well, that used to be on the front page of the New York Times and the front page of the Enquirer. Like I didn't consider those to be equally weighted sources. But maybe people are just not distinguishing them. But it seems ironic to me that we have more access to real data. And it there's also seems to be a lack of embrace in a real data or whatever. I'm not trying to put that but people seem a little wacky. Yeah, I think there is. It's confusing. I mean, I also think it's confusing. We're living in this world where someone can be, you know, blogging on their own site, and that can be almost more credible journalism than, you know, someone in certain news outlets. So we don't need to get super political in this, but it is. And but yeah, it's I think it's really confusing. And I don't blame people who especially don't might not understand how the data is collected, or, you know, this. A lot of people are very transparent about how it's being presented or collected. And, yeah,
09:48 It's definitely confusing, but it's also the foundation of real journalism, real reporting. I think there are a lot of interesting ways in which to use it. We're going to talk about some actual,
10:00 concrete, cool tools that a lot of newspapers are using. But maybe let's just talk about why journalists, newspaper journalists, freelance journalists, Associated Press type organizations and so on, would use AI and ML. Like, why are they adopting the tools? There are a lot of reasons. And just so you don't have to hear it from me, there was a recent survey done by JournalismAI, which is from Google, and Polis, which is from the London School of Economics and Political Science; they have a think tank on this. They surveyed newsrooms across the nation, or sorry, across the world. And you will hear me reference this survey a lot, because it's really thorough and it's really recent; they published it at the end of last year. They mentioned that there were, like, three key motives for using AI. The first was to make journalists' work more efficient; about 68% of the replies said that. Then to deliver more relevant content to users; about half of the respondents said that. And also to improve business efficiency. So what I've mostly focused on is that element of making journalists' work more efficient, because there is so much that you do in journalism that can be automated. And, you know, we're developers, we love to automate. Exactly. Once you realize it, you're like, I don't have to do this for four hours by hand. Okay, we're not doing this for four hours by hand; let me do it once for two hours, and we'll never do it again. Right, exactly. So there are so many opportunities like that, especially because traditional print journalism is, you know, very thorough, very logical, but a bit slow moving in that sense. And especially with this quicker news cycle, you need to keep up. 
So there are things like, you know, being able to retrieve one of those massive data sets and comb through it and see whether or not there's a story in there fact checking basic articles, maybe organizing story ideas that are facilitated from the public, making initial rough cuts of videos or deciding what camera angle is the best. And there's just a lot of tedious tasks. Yeah, I used to be an unpaid intern. I know what those are, because I had to do them. And I think it's great, because, and I think the reason I focused on that is that it's really supplementary to human journalists, right? Yeah, none of the stuff that you said people would want to defend is like that AI is taken my job, like, I used to go by hand, and rename these columns to that columns, and then merge it over here. And I want to keep doing like nobody wants to keep doing that. Right. They want to do it to tell the stories to find the insights to do the research, not juggle and wrangle data or other things along those lines. Right, exactly. But I think when a lot of journalists are being presented the idea of AI and machine learning, they're not presented like no one says exactly what you said, they just say, oh, we're going to introduce this new bot, we're going to introduce this new tool. And I think people immediately are very afraid, you know, very unsure about, you know, you hear about robots are going to take all of our jobs. And so I think they get a bit uneasy in that sense. And I don't blame them. Because if it's not being explained in a way that like, Oh, this is supplementary to your work. And that's what most of the respondents in the survey even said they said like they see AI as something that is, you know, supplementary and additional not necessarily like transformational. Yes.
13:40 This portion of Talk Python To Me is brought to you by brilliant.org. Brilliant has digestible courses in topics from the basics of scientific thinking all the way up to high-end science, like quantum computing. And while quantum computing may sound complicated, Brilliant makes complex learning uncomplicated and fun. It's super easy to get started, and they've got so many science and math courses to choose from. I recently used Brilliant to get into rocket science for an upcoming episode, and it was a blast. The interactive courses are presented in a clean and accessible way, and you could go from knowing nothing about a topic to having a deep understanding. Put your spare time to good use and hugely improve your critical thinking skills. Go to talkpython.fm/brilliant and sign up for free. The first 200 people that use that link get 20% off the premium subscription. That's talkpython.fm/brilliant, or just click the link in the show notes.
14:33 Who knows where the future goes. I never cease to be amazed by how crazy some of the things people are coming up with are, like the fact that we have self-driving cars. That seemed like pure science fiction, you know. So who knows, maybe we'll get creepy AIs writing stuff. But for now I don't see it that way. And I remember working at a company quite a while ago where it was not a very tech-heavy company. It was a research place but
15:00 A lot of people were researchers not developer. So there were not a lot of automated systems. And every time we would say, you know, that thing used to do for, like, four hours a week, we just made that automatic and it just happens. Now they're like, Ah, this is like, that's what I used to do, this is gonna be my job. But every single time, they just got more interesting work that was less tedious. You know, and I don't remember anyone getting laid off or anything like that, because we automated stuff. It's just we could do more work and do more interesting work. Exactly. I think there'll always need to be that human component, especially in journalism, because it's so based on things like storytelling, and you know, conveying emotion, I mean, not every form of journalism, but a lot of the really human centric things. Yeah, focus on that. And I think you'll always need someone. Yeah, well, a really interesting, semi recent, I'm gonna call it recent story was people applying, like text processing and machine learning to the data that came out of the Panama Papers? Write so much about it? Yeah, I didn't read it. But I recently interviewed a guy who worked on a project that was a search engine type of thing. And as they were working on it, there was a journalist who was super interested in the tech, they're like, why is this journalist so interested in this, and it turned out that they were using it to analyze and search all the data, like, do OCR and types of stuff like that, on the data that came out of the Panama Papers to create relationships and see what and you know, like, that kind of stuff would have taken a lot more people maybe would have gotten exposed before the Panama Papers could have been, you know, analyzed fully, all sorts of stuff. So yeah, a lot of interesting ways to use technology. I think absolutely. All these are interesting. But honestly, I think maybe the biggest boost actually might be just the automate the boring stuff. 
Story, which you were touching on is like, there's all these things you got to do. A lot of them are tedious. And if you had a little programming skill, you could automate them, but because you don't, you just maybe use Excel and find and replace or something painful. And if you could just take like 20% of the tedious work away from journalists and let them focus on the story, or getting out and getting the story like that would be great. Exactly. I mean, then I think you can focus on all of those aspects that a computer cannot, like, what you
17:16 do is, obviously the computer can't go and interview people in a way that's going to ask the right questions and so on. Right, exactly. So in your talk, you mentioned a bunch of newsrooms that are actually using different software, applications, libraries, and so on. Some of these are open source. Some of these are just talked about, but I thought they were fascinating. So you want to go through some of those? Yeah, definitely. All right, let's start with the Washington Post and Heliograf. Heliograf is a bot from the Washington Post that produces content for them. I mean, it does a lot of other things, you know, like tweets, its own code snippets, identifying trends in the stock market. But at least as someone from the outside, it seems like the value that it brings is that it just rapidly generates articles. Like, the first year it came out, it made over 800 articles in a single year. Yeah, exactly. And at least from what I've seen, it's really good for those kind of short, quick reports. So for things like the Olympic Games, like in Rio, it was able to just pop out like 300 articles. Then for things like politics and the election, that's, I think, the main focus of it. They won an award for 2016, or the coverage in Tennessee. But just that volume, I mean, I don't know a human who can write, what is it, like five articles a day. And the turnaround time as well, right? Like, it seems to me this kind of thing would be good for, oh, there's a big crash during rush hour that has shut down I-5 North, details coming, or something like that. You know, just those really short little "this thing has happened" pieces. We're going to write a story eventually, but we want to get it out there because we know it's a timely sort of thing. Right, definitely. And I think it also ties into the relational issue you were talking about with the Panama Papers. 
I think that's where, based on what I've read, they're kind of going with Heliograf, especially in politics. Because, you know, for a human reporter, there are tens of thousands of elections going on throughout the year, and that's just in the US I'm talking about, let alone the entire world. And so being able to have something that can monitor all of those, and perhaps maybe even eventually find relationships between those, I think is really exciting. Yeah, absolutely. Also just alerting on, like, emerging trends. Like, this thing seems to be getting talked about on Twitter, and we've pulled it out, and maybe this hashtag is now all of a sudden trending. Hey, reporters, look at this and see if this is interesting. And you maybe would let them know that
20:00 they should pay attention sooner rather than later to this thing that's coming up. Yeah. And that's actually a really good segue to the next example that I had, which is from Forbes. So they have a CMS called Bertie. And the reason I said it's a good segue is because part of that CMS does exactly that. It has the hashtags, you know, it recommends articles you might write, or when you put an article together, it can read through it and say, like, oh, this is trending, put this hashtag on when you post it or tweet about it. And at least from what I've seen in the videos, it looks pretty much the same as any sort of, you know, WordPress, Contentful, whatever you have there. But it just has these extra features. So when you go to add an image, it'll have suggested images that are related to the article that you have. Or, you know, you'll write a headline, and it'll say, ah, okay, if you switch these around, it'll be, you know, this percentage more click-worthy. I don't know.
21:02 But that's the idea, right? It attracts more attention, it's more likely to be shared, people are more likely to click on it when they see it in some kind of feed, scrolling by, and so on. Exactly. And I mean, even topics like SEO, like accounting for what is click-worthy, but also what will get you up there. And so are these for certain topics. Yeah. And so all that sounds like, oh, we're trying to just take the content and make it more viral, give it a higher viral potential. But it also has more concrete things like reading complexity. Yes, exactly. Because I think one of the difficult parts of journalism, and I don't know, maybe I'm wrong, but I feel like a lot of people misunderstand this about journalism, is that when you're writing, especially for, you know, a newspaper that's supposed to be accessible to everyone, you need to be writing at, like, a sixth grade level. Yes. So, sixth to eighth grade; eighth grade is what I was told in university. And that's difficult if, you know, writing is what you do for a living, and you're used to trying to string together this beautiful prose. And so getting things really down to that plain language and short sentences, and having something to assist with that, I think, yeah, is a game changer. Right. Well, absolutely. And I'm sure people could look down on it, like, well, you're trying to dumb down your article. But at the same time, if you reach more people, if the message gets across to more people, that's the ultimate goal: to convey the information and get more people to read it. So it seems like a noble thing to do. Absolutely. It's making information more accessible. 
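[Editor's sketch] The "reading complexity" check discussed above can be approximated with a standard readability formula. The transcript does not say how Bertie scores text, so this is only an assumption for illustration: it uses the Flesch-Kincaid grade level, with a rough vowel-group heuristic for counting syllables. The function names are hypothetical and not part of any newsroom tool.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (but not "-le") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the US school grade level needed to read `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade level formula.
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    plain = "The cat sat. The dog ran."
    dense = ("Municipal authorities subsequently reconsidered "
             "the controversial infrastructure appropriation.")
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

An editor-facing tool could flag a draft whose score is well above the sixth-to-eighth grade target mentioned in the conversation. The syllable heuristic is crude, so the score is best read as a rough signal, not an exact grade.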
I mean, we even have this problem in the software community with technical writing, you know, people are, it's like, Oh, my, I make all my documentation, like really short sentences, it sounds so whatever, and you're like, but there are so many people who maybe English isn't their first language, or, you know, maybe they have a cognitive disability, or maybe they just, you know, it's a lot to read and take on. So making your language more simple, helps everyone, even if you're really good at reading, you're trying to juggle two things, your mind you're trying to juggle the programming ideas, and what the lesson is teaching you. So you're already kind of splitting your, your mental capacity. So it it feels to me like it should go down. And I'm always amazed and sympathetic to folks who English is their second language, because all the programming keywords are in English, which it just seems like really a little unfair, but you know, such as life, I guess. But the documentation, obviously, the easier the better in that space. I also think it's a different, you know, you have to think about it in different contexts. I mean, I think journalism can be art. No one's arguing that. But I do think that the primary purpose of most journalism is to convey information. So being able to get that as distinct and clear as possible is, I would say the goal. Yeah. Well, there's always the Mark Twain quote of Sorry, I wrote a long letter. I didn't have time to write a short one. Exactly. Right. All right. Let's talk about earthquakes. West Coast, earthquakes. Yes. So it's another bot. Actually, I'm hoping there are no earthquakes. But let's talk about reporting on earthquakes. Reporting. Apparently, there are a lot of what I learned from this, but but the LA Times has also a bot. I swear. Not every AI implementation in journalism is a bop but they are, in my opinion, some of the most interesting
24:35 and it's similar to Heliograf, but it specifically focuses on earthquakes. And it is exactly what we're talking about as far as the turnaround time being so quick. So for example, back in 2014, there was this earthquake that hit the LA area. And the LA Times was the first to report on it because, basically, the earthquake happened
25:00 and the reporter woke up, just went to his computer, reviewed the article, and published it within three minutes, like three minutes after it happened. That is incredible. A lot of people are still trying to figure out, was that an earthquake? What just happened? And they're like, I published this, this is good. Exactly. And it's because it was sitting there waiting for him, because Quakebot is connected to the US Geological Survey. And so when an earthquake comes in, the programmer who did this set different parameters: it's within this area of LA, it's above this size threshold. I think there are a few other ones, but those are the main ones that I remember. And then he was able to, like, extract the data that was sent and throw it into a pre-written template in their CMS. Right: last night, there was an earthquake at this time, of this magnitude, you know, it lasted this long. And yeah, you just go in there, review it, and go. That's pretty cool. There's also a place where you can go where the person who was working on it wrote it up and talked a little bit about how it works and so on, right? Yes, exactly. It doesn't have, like, the full working code, but there are a few code snippets so you can see what parameters were put on it. So yeah. Yeah. So I'll link to that in the show notes. Let's stick with LA. And you know, I've seen that earthquakes can kind of make people nervous. This next one definitely will make people nervous. Right? Yeah. And it makes me nervous, but it's the reality of it. So the LA Times also has a Homicide Report. And it takes all of the information that is given to them. I don't know exactly what the original data source is; I would assume the LAPD or something similar. Yeah. And it plots all of the homicides onto, like, an interactive map on their website. 
So you can sort it and filter it by, you know, the year. It has the name, the gender, and a few other fields that I'm not exactly remembering. But any information you would want on a homicide, you're going to find on this map. Yeah, and it's much like Quakebot: it just automatically receives that information and pulls it up. Okay, yeah, cool. A good service for a sad thing, I suppose. This next one is interesting; it comes out of the Guardian Australia. Yes. And this is maybe my favorite of the bots, for the sole reason that it is entirely open source, which I think is... Yeah, yeah. That's pretty cool. So it's called ReporterMate. And it pretty much does the exact same things we were just talking about, but instead of... no, actually, it pretty much does the same thing. It reports on Australian election coverage, I think, a few, like, weather-related things, the stock market. So it does very similar work, especially to Heliograf. But it is open source, and the open source tool in and of itself is pretty cool. It uses, and this is a Python library or package that I do know, pandas. Haha. Nice. Yes. So it uses that, it uses Handlebars, and a bunch of helper functions. Yeah. And it looks really cool. Like, you just pip install it and off you go. And yeah, it gives you all that automation, which is kind of an interesting contrast back to the Washington Post doing its own sort of private thing versus here's the open source equivalent that a bunch of people can jump on. And it seems to me like this would be really helpful for smaller newsrooms, you know, a small town or city that doesn't really have lots of money. They might have an IT person, but not, like, a software development team. Oh, yeah, that's a big thing when we talk about this field. That's something I definitely... I don't know if you want to talk about it right now, or we can talk about it later. Let's talk about it a little bit. 
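[Editor's sketch] The bots described above (Quakebot and ReporterMate) share one pattern: filter incoming data against a few thresholds, then pour the values into a pre-written template for a human to review. Below is a minimal sketch of that pattern in Python. It is not the LA Times code: the `Quake` fields, the magnitude threshold, the bounding box, and the template wording are all hypothetical, and it uses the standard library's `string.Template` where ReporterMate is said to use Handlebars.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from string import Template
from typing import Optional


@dataclass
class Quake:
    """One incoming event. Field names are illustrative, not a real feed schema."""
    magnitude: float
    place: str
    latitude: float
    longitude: float
    time: datetime


# Hypothetical parameters: a size threshold and a rough bounding box.
MIN_MAGNITUDE = 3.0
LAT_RANGE = (33.0, 35.0)
LON_RANGE = (-119.5, -117.0)

# Pre-written template; only the data fields change per event.
DRAFT = Template(
    "A magnitude $magnitude earthquake was reported $place at $time. "
    "This draft was generated automatically and needs editor review."
)


def is_newsworthy(quake: Quake) -> bool:
    """True if the quake is big enough and inside the coverage area."""
    in_area = (LAT_RANGE[0] <= quake.latitude <= LAT_RANGE[1]
               and LON_RANGE[0] <= quake.longitude <= LON_RANGE[1])
    return in_area and quake.magnitude >= MIN_MAGNITUDE


def draft_article(quake: Quake) -> Optional[str]:
    """Return a draft for review, or None if the event is filtered out."""
    if not is_newsworthy(quake):
        return None
    return DRAFT.substitute(
        magnitude=f"{quake.magnitude:.1f}",
        place=quake.place,
        time=quake.time.strftime("%Y-%m-%d %H:%M UTC"),
    )


if __name__ == "__main__":
    sample = Quake(4.2, "near an example neighborhood", 34.1, -118.4,
                   datetime(2020, 1, 2, 13, 25, tzinfo=timezone.utc))
    print(draft_article(sample))
```

The draft is returned rather than published, which mirrors the workflow in the conversation: the article sits in the CMS until a reporter reviews it and decides to publish.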
So there's this, I suppose there's this really big challenge between like, take the top 10 newspaper organizations, and they have paywalls. They have tech teams, they have mobile apps, they're like, they are a tech company, and in some aspect, versus a news organization for a town that's got 100,000 people like Lawrence, Kansas, where I went to college, right? Like they maybe that's a bad example. That's actually where Django came from. But
29:27 But in general, like these smaller newspapers don't necessarily have big tech teams. Yeah, no, for sure. They. So what a lot of people talk about is that AI in journalism is already a significant part. But it's really unevenly distributed. So it's focused on these big news organizations that either have their own development team or have a relationship to these big tech companies. And I find the topic really interesting because it dives it gets a little bit into
30:00 The ethics field because what a lot of people, at least according to the survey that I mentioned earlier, what a lot of people are afraid of, and I didn't even think about until reading this survey, which is unfortunate, is that people are afraid of like these big tech companies having more power, like not only fueling the power of these big tech companies like Google, for example, but also how that would potentially impact the reporting of those big tech companies, you know, it's kind of this cycle is a check on the power of those tech companies. Right? And exactly, yeah, do you really want to write a negative article about them now? Like, what's gonna happen to your other articles, if you are become too much of a negative force on them? Right, exactly. And I mean, they hold a lot of power in the situation, because I mean, a lot of these news organizations, specifically the ones that don't have their own development team, or have a very small development team, really rely on that technology to keep up and kind of keep the pace in this online, like very much online very much 2424 hour news cycles. So it's scary when you think about it like that, but it really is a little bit.
31:15 Talk Python To Me is partially supported by our training courses. How does your team keep their Python skills sharp? How do you make sure new hires get started fast and learn the Pythonic way? If the answer is a series of boring videos that don't inspire, or a subscription service you pay way too much for and use way too little, listen up. At Talk Python Training, we have enterprise tiers for all of our courses. Get just the one course you need for your team with full reporting and monitoring, or ditch that unused subscription for our course bundles, which include all the courses, and you pay about the same price as a subscription. For details, visit training.talkpython.fm/business or just email sales@talkpython.fm.
32:00 The first place that comes to mind when I think about those challenges has got to be Facebook. But I want to ask you a question not about Facebook, but about something else: Google News. So you're in Europe right now, and Europe has had a mixed, interesting relationship with Google News. I feel like, you know, I think Spain had tried to prohibit or charge Google for putting up the headlines from Spanish newspapers. I think it was Spain, somewhere in Europe. And they tried to limit how Google News could sort of use their headlines for free. And so they just stopped it. And then they're like, wait, wait, wait, where'd all our traffic go? Bring back Google News. We need Google News again. What is going on here? You know, I feel like there's something like that happening right now in Australia as well. So there always seems to be this tension of, like, oh, we hate them, they're robbing from us. Wait, we need them, they're our savior, bring them back. You know, what's your thought from being more on the inside of that world? I'll be honest, I haven't been in as many discussions about Google News specifically. But something I have talked about with my friends, and I actually saw a Twitter thread about it today, I'll send you the link, was about, I mean, this thread didn't mention Google specifically, but come on, it has to do with it, how much American media seeps into European news coverage and what people are aware of. And I could imagine, also, if you're seeking your news online, or through something like Google, naturally a lot of American politics comes up, a lot of American systems, where you're kind of fed this idea in a way that is so unique to the US. Because, at least, I grew up in the US, and I don't remember ever being like, wow, why do we have so much news on any other country? I don't want to
33:55 give it us their news. Like I just don't need results, dude, like no fixer uppers just grabbing like a random country that doesn't do that generally. Exactly. Yeah. I remember, for example, my sister studied in Japan. And before she left, she wanted, you know, to try to keep up with Japanese news. She had to literally go buy the Japanese paper from the actual Japanese store that we had. So you really have to seek out that news, versus a lot of American news is just kind of, what's the word, filtered into the everyday experience of people in Europe. And it's kind of like, why? And I mean, I don't know this personally, but I know there's even some tension between Western Europe and Eastern Europe as far as how their news is represented. How much do you see of each country? How much do you really know? Right. You're in Berlin, which is right on that line there, historically speaking. But it's definitely still in the more Western category. I would say we hear a lot more about, you know, France or the UK than we do about Bulgaria.
35:00 Yeah,
35:01 just
35:03 One of my favorite songs is Californication from the Red Hot Chili Peppers. And it's all about how America is exporting their culture, I think generally through music and Hollywood. It's only more so now, right, with the tech companies and online and so on. Interesting. I think there's this tension with the big tech companies, in that they're the aggregators of the attention, so they're controlling access to what gets attention. And then there's this tension between the big newspapers and the small newspapers, right? Because the big newspapers have software teams that can just go, yeah, yeah, PyTorch, let's use that. And other people are like, what? Is there a fire? What is this torch about? Yeah.
35:45 So yeah, very interesting. Let's keep going on some of these things. The next one that I thought was interesting that you brought up was what ProPublica was doing around analyzing not what people in the US Congress say they're interested in, but what their actions and behaviors and words say they're interested in. Yeah, so ProPublica took an analysis of thousands of press releases over the course of two years. And they trained a computer model to extract which phrases each Congress member uses most frequently. And then, under the assumption that if these are the phrases they're using most frequently, these are likely the topics that they are pushing for and care about the most. Because or else, why would you be releasing all these press releases about it, if that's not a topic? I don't have the article up right now, but I would suggest checking it out. Some of the Congress people were really in line with what their beliefs were. And for a lot of them it was like, you say this, yes, but we don't really. Yes, exactly. This is what we say we're for, and this is what we're actually for. How interesting. One of the things I was thinking of that's like a cool, automate-the-boring-stuff thing, when you're interested in those kinds of things, is speech to text. A lot of times there'll be some kind of presentation, like there's a video of the person, but it would be much better if you could just index the keywords of what they said. And you know, the ability to just take spoken word and video and turn it into written stuff that you can analyze seems like that'd be pretty interesting in journalism. Yes. I can't believe I forgot to mention that, because I think that's the most tedious part of reporting. I remember having to either, you know, have my little hand recorder and record things and then type it up myself later. 
Or just take handwritten notes and be like, I hope I got this quote right. So I think the idea that you can, I mean, there are apps, like right now I use Otter, yeah.
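As a rough illustration of the press-release analysis described above, the idea of counting which phrases each member uses most often can be sketched in a few lines of Python. This is a minimal sketch, not ProPublica's actual model: the `top_phrases` function, the stop-word list, and the sample releases are all hypothetical and made up for this example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "our", "we", "this"}


def top_phrases(releases, n=2, k=3):
    """Return the k most common n-word phrases across a list of press releases."""
    counts = Counter()
    for text in releases:
        words = re.findall(r"[a-z']+", text.lower())
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip phrases that begin or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts.most_common(k)


# Made-up sample data, not real press releases.
sample_releases = {
    "Member A": [
        "Today we introduced a bill on rural broadband access.",
        "Rural broadband funding is a priority for this office.",
        "The committee discussed rural broadband and health care.",
    ],
    "Member B": [
        "Our office supports small business tax relief.",
        "Small business owners need tax relief in this economy.",
    ],
}

for member, releases in sample_releases.items():
    print(member, top_phrases(releases, k=2))
```

Comparing each member's most frequent phrases against their stated priorities is then a manual, editorial step; the counting only narrows down where to look.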
37:43 Yeah, I've not had any practical use for Otter, but I've tried to use it for the podcasts and live. It'll transcribe multi-person conversations and attribute the spoken word to the different people and so on. Right? Yes, exactly. You still definitely need to read through it. Because, I can imagine, especially in software, where you're using a lot of lingo that isn't common speech, you have to go through it. But I know a lot of newspapers use things like that for translations. The idea that you can maybe record a conversation, and not only can it instantly transcribe it, but it can instantly translate it into multiple languages. I just think, again, coming from my little hand recorder, that is wild. That is super wild. And the Microsoft Word team just announced, I heard this just yesterday or the day before, that the web version of Word will now let you take an MP3 and upload it to your document, and then just grab paragraphs of transcribed text and drop them into your document right out of the MP3 file, which is pretty awesome. That'll help a lot of people. Yeah, that's great. And it also goes along with what we were saying about readability, where it's again an accessibility issue. Before, it was always like, ah, it's so hard to have a video have captions, or, oh, it's really hard if we recorded this interview to have, you know, a written version, because it takes so much power, or work time. And now there's almost no excuse to make things inaccessible, because the resources are there, and a lot of them are free and available. So yeah, that's super cool. So the next one that you talked about was BuzzFeed, and to me, BuzzFeed is like listicle type stuff. 
And when you think of viral headlines, they've probably got some things that really recommend headlines. That said, they do do real news reporting as well, in some interesting ways. This next one actually is pretty interesting there, right? Yeah, BuzzFeed, I love a good BuzzFeed quiz. But they also have the real BuzzFeed News, really good journalism. And this one, I remember reading about it and I couldn't believe it, because it just sounds like a sci-fi movie, but they trained a computer
40:00 model to find and track, like, the secret spy planes. So what I mean by that is the computer used a machine learning algorithm to sift for planes with flight patterns that resemble those of the FBI or the Department of Homeland Security. Like the plane that goes up and just flies in circles around the city, rather than from a city to a city, right? Something like that. Exactly. Yeah. I don't know. Yeah, I could totally imagine it's exactly like that. Something a little bit strange in that way. But it allowed them to report on a ton of different topics that, again, I just think are wild. So, for example, how US Marshals hunted down drug cartel kingpins in Mexico, or how there was a military contractor that tracks terrorists in Africa, but I guess they were flying over US cities. Right. Wait, if your job is to track, you know, military stuff in some foreign country, what are you doing in Dallas flying around? Yeah. Suspicious. And just other topics around aerial surveillance, which is, again, when I read it, I was like, you know, when I think of aerial surveillance, I think of deep conspiracy theories, like, what are they called, the jet trails? Yeah, contrails. Yeah. Yeah.
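The "flies in circles around the city" pattern mentioned above can be illustrated with a simple heuristic: add up how much a plane's heading changes along its track and flag tracks that turn through several full circles. This is only a sketch under stated assumptions, not BuzzFeed News's model, which was a trained classifier; the function names, the 720-degree threshold, and the sample tracks are all hypothetical.

```python
import math


def bearing(p, q):
    """Approximate compass bearing in degrees from point p to point q, each (lat, lon)."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360


def total_turning(track):
    """Sum the signed heading changes (in degrees) along a list of (lat, lon) points."""
    headings = [bearing(a, b) for a, b in zip(track, track[1:])]
    total = 0.0
    for h1, h2 in zip(headings, headings[1:]):
        # Wrap each change into [-180, 180) so 350 -> 10 counts as +20, not -340.
        total += (h2 - h1 + 180) % 360 - 180
    return total


def looks_like_circling(track, min_turn_degrees=720):
    """Flag a track whose net turning adds up to at least two full circles (assumed threshold)."""
    return abs(total_turning(track)) >= min_turn_degrees


# Made-up tracks: three loops around one point, and a straight-line flight.
circling_track = [
    (32.78 + 0.05 * math.cos(2 * math.pi * i / 12), -96.80 + 0.05 * math.sin(2 * math.pi * i / 12))
    for i in range(37)
]
straight_track = [(32.78 - 0.1 * i, -96.80 + 0.05 * i) for i in range(30)]

print(looks_like_circling(circling_track))
print(looks_like_circling(straight_track))
```

A single hand-written rule like this would miss a lot and flag innocent holding patterns, which is one reason to train a model on many features of known surveillance flights instead.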
41:22 So that's always what I thought of. So then when I saw this report, I was like, oh my God, they did it. Yeah, super cool. This is really interesting, and well done there. So I guess the last one I want to talk about is probably the most far out one, which is, what if we could have a drone fly a robot in to walk around dangerous places like war zones and investigate, like a humanoid robot, but not a person? Yeah, that's from Al Jazeera. And they mentioned it at one of their Future Media Leaders' Summits in 2018. From what I can tell, it's still in development and a bit far out. It is far out. But I think it's really interesting. If you look up drones and war zones, that's a common practice, to have some sort of drone that goes in, and what it does, you know, depends on who's using it.
42:20 But the idea with this is that it would deploy a robot. Yeah, that can take video it can surveil what's going on, it can, you know, record the sounds that are happening? They want it to be able to dodge sniper attacks and assess the situation. And what I think is really interesting about that is that I mean, there is like a lot of human journalists, one aren't trained for that type of environment. And it would be very difficult to train someone in that there are journalists embedded within military. So like the US military has journalists. But again, it goes into the issue of how much of it is true, or how much of it is influenced by their employer, right? You're certainly getting one perspective, if you're like with those folks, I, I know the folks try to be objective, and but they're there for your safety. And you're only you can only go and be with them where they are. Right. So even, like no matter whether right or wrong, like you're getting a somewhat influenced perspective from that. Right, exactly. You can't just walk around like, Well, let me go talk to those guys over there. See what they think. No, they're shooting me for that.
43:32 Exactly. And I mean, there's also the ethics of sending, you know, civilian human journalists into these kinds of hostile spaces and being like, here, report on it. I mean, there are crisis reporters, and you hear about journalists being captured in places like Yemen. But yeah, again, it's an ethics debate of when should that be required, and when should it not. So yeah, for sure. So yeah, this is a really interesting idea of legitimately taking a humanoid drone, or robot, and having it walk around in a war zone. They actually have a video on YouTube where you can see a little animation of how it works. I guess there are a couple of other things we could talk about. A few tools that we talked about are open source, and people can use them. A lot of this is kind of how news organizations are using this technology on their platforms. But there are also some tools that people can use, like Quartz AI Studio and Google News Initiative and stuff like that. You want to give us the quick rundown of some things that people use? Yeah, definitely. So as you mentioned, there's Quartz AI Studio. It's from the Knight Foundation, which is an esteemed organization in journalism. And they help train journalists to use machine learning in their reporting, and can also provide support and tools. And I think that's great, because it makes these practices more accessible to smaller news organizations or even freelance journalists. I don't know the exact requirements for what it takes to, like, get their
45:00 support. But again, the fact that they're even offering this access to these smaller organizations, I think, is great. And Google, we've mentioned a few times. I mean, they're the ones from the survey I keep quoting, and they do a lot of interesting research on this topic. So for example, they have Facets, which is a machine learning data visualization tool. And it's open source, so you can play with data within it and create visualizations of the information. And then finally, there's Google News Initiative, and more specifically, Journalism AI. And with Google News Initiative, everything is basically a circle. It goes back to what we were talking about earlier in regards to these big tech companies, and who provides access to these tools, but also who teaches journalists how to use these tools. So the biggest resource that I know of for training journalists on what machine learning is, how to use it in reporting, what the different tools available are, here's an introduction, is Google News Initiative. They have about 40 courses that are available to journalists. But again, of course, it leans towards, hey, use our Google products to do these things. And it also leads to, I'm trying to figure out a way to word this, not a one-way information direction, but kind of a funnel of, okay, this is your only information about how to, you know, think about data, how to analyze algorithms. But I'm a big believer that tech is pretty much always biased. It's always political. So that influence from Google has to be apparent in there somewhere. So right, it might not necessarily be overtly intentional, right? It could just be, yeah, the people who built it generally share one way of viewing the world. And so that's probably going to show up in there somehow. 
100%. Algorithm bias especially is something that also came up in this survey, as something that people are afraid of and something that people, meaning reporters, are nervous about implementing in their work. Because, you know, if you don't know how to analyze an algorithm, and know where it's getting its data and how that data is being prioritized, then it's difficult to know, am I presenting data that is really reputable and as unbiased as it can be? Or if it is biased, what are the biases, right? And so they don't want to be publishing blatantly biased reporting, unless they do, but I think a lot of people don't ever intend to have that happen, right? There are different levels, right? It could be that you're using an algorithm, and it's giving you information, and then you're writing something based on that influenced or directed or biased information. Or it could be something as simple as the bot that tells me what's trending is always interested more in this other part of society than maybe what actually is more important to most of society. It could be that it really cares about people in New York and their financial behaviors. Or it could be that it cares about the challenges of Middle America, or it could be racially biased. There are all sorts of things that it could be. And it's not like the algorithm is so incredibly biased. It just says, hey, you should pay attention to this aspect of life rather than that, right? That could be really subtle, I think, and challenging. Definitely. I mean, for a lot of the information I was talking about, where people get these big data sets and search through them, that data set that you get could be biased if you don't know how it was collected. And for a lot of people, getting a big data set like that is really exciting. 
It's like, okay, I don't have to go through the work of surveying thousands of people. And so it's really appealing, I think, as a reporter, to want to act on that. Yeah. And yeah, just so we don't go down a total rabbit hole, there's also Mozilla. Although, given the recent news about Mozilla, I'm not sure how much of this is still in effect, but they have a history of partnering with journalists and news organizations. So there's an organization called OpenNews. It's a network of developers, journalists, designers, and editors, and they collaborate on open technologies and processes within journalism. That is its own organization now, but it was originally incubated within Mozilla. And they also have other ones, like the Mozilla Information Trust Initiative, which is a collection of comprehensive efforts to keep the internet credible and healthy and fight misinformation. And also, they just announced, like a few days ago, I'm fact checking myself as we speak, the Mozilla Foundation announced that there was going to be a new fund
50:00 for Black artists that examines the relationship between AI and racial justice. Okay, yeah, very cool. So maybe that isn't directly related to reporting. But I think a lot of those outcomes can directly influence reporting, especially news coverage around topics like Black Lives Matter and racial injustice and the justice system in general in the US. Yeah, for sure. I think Mozilla is definitely a pretty positive force for these things, compared to a lot of tech companies. That's great. All right. Well, there are so many more things I would like to ask you, and that I have on my list to talk to you about, but at the same time, we're running short on time. So let me ask you just one question about what you're up to these days, to bring it full circle. I know that you're looking for maybe your next project, your next thing to be working on and doing. You want to tell people what you're interested in, if they've got an opportunity for you out there? Yeah, I'm looking for a new role. So this Google Docs program runs until December. And then after that, I'm hoping to start something new. I'm a front end developer by trade, and I've been a front end developer for about two years. So I've been mostly looking at roles in that. But of course, I would love to get back into journalism and tech reporting, or even go in from the engineering side. So if anyone knows anything, my inbox is fully open for you. Awesome. And I'll be sure to put your contact information in the show notes so people can get in touch with you. Great. Yeah, very cool. All right. Now, before we get out of here, I'm gonna ask you the final two questions. If you're gonna write some code, what code editor do you use? I use VS Code, because I do a lot of TypeScript.
51:48 It has the best TypeScript support, in my opinion. Yeah. Cool. Cool. And you know, it's written in TypeScript, so it better.
51:55 Awesome. You basically just follow the squiggly line until you find there. Yeah, perfect. And then I always like to bring up some interesting Python library or package for folks out there. So what do you got for us this week? Got one that's interesting, too? Yes, there's a package that I was very, very recently introduced to, newspaper3k, and it's an article scraping and curation package. So you can take a URL from an article that is somewhere on the interweb, and it will scrape it and try to find information from it. Like, for example, the author, the publishing date, some of the text, top images, etc. Oh my gosh, yeah, this thing is super cool. I've heard of this before, but I think it's the perfect fit for what we're talking about today. Its features include a multi-threaded article download framework, news URL.
52:50 And I think it'll do things like, you point it at a landing page, like the homepage of a newspaper, and it'll find all the sub-articles and stuff. Yeah, it's super cool. So if you're into researching news and you want to do web scraping, you might not have to start from low-level programming with Beautiful Soup. You could just get more of the direct data here. Yeah, great one. All right, final call to action. You're speaking to the folks who work somehow with the journalism industry. They want to get code and technology more into what they're doing. What would you tell them? I would tell them that it's really a great opportunity to look into. It's something that I really believe is the future of the industry, the future of information and reporting. But I think I would definitely approach it with caution. So if you are someone who is either going to be building these algorithms or going to be using them, make sure that you're asking the right questions about where the data comes from and how it is being prioritized, a lot of the things that we've already discussed. And in general, beyond people who work with this day to day, I would say for anyone, it's really important when you're consuming news and consuming information to ask questions and be skeptical. And just be aware that the story that you're reading might be generated by a bot or an algorithm. Yeah. Good advice. Well, Carolyn, thank you so much for being on the show. It's fascinating to look inside the journalism industry and tech intersection. Yeah. Thank you so much for having me. I love this topic. So yeah, it's very interesting. You bet. Bye, bye. This has been another episode of Talk Python To Me. Our guest in this episode was Carolyn Stransky, and it's been brought to you by brilliant.org and Talk Python Training. Brilliant.org encourages you to level up your analytical skills and knowledge. 
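For anyone who wants to try the package mentioned above, a minimal sketch of pulling the author, publishing date, text, and top image out of one article might look like this. It assumes newspaper3k is installed (`pip install newspaper3k`) and that network access is available; the helper names `fetch_article` and `summarize_article` and the placeholder URL are made up for this example.

```python
def fetch_article(url):
    """Download and parse one article with newspaper3k (needs the package and network access)."""
    # Imported here so summarize_article below can be used without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article


def summarize_article(article, preview_chars=200):
    """Collect the fields mentioned in the episode into a plain dict."""
    return {
        "title": article.title,
        "authors": list(article.authors),
        "publish_date": article.publish_date,
        "top_image": article.top_image,
        "preview": article.text[:preview_chars],
    }


if __name__ == "__main__":
    # Placeholder URL; swap in a real article address before running.
    parsed = fetch_article("https://example.com/some-news-article")
    print(summarize_article(parsed))
```

For the landing-page case, the package also provides `newspaper.build(url)`, which returns a source object whose `articles` list can be fed through the same steps. Scraped fields such as authors and dates are best-effort guesses, so they are worth spot-checking before they end up in a story.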
Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every day. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course, or if you're looking for something more advanced, check out our new async course that digs into all the different types
55:00 of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Get out there and write some Python code.