WEBVTT

00:00:00.020 --> 00:00:05.200
<v Michael Kennedy>Digital humanities sounds niche until you realize that it can mean a searchable archive of U.S.

00:00:05.420 --> 00:00:11.580
<v Michael Kennedy>amendment proposals, Irish folklore, or pigment science in ancient art. Today I'm talking with

00:00:11.720 --> 00:00:17.780
<v Michael Kennedy>David Flood from Harvard's DARTH team about an unglamorous problem. What happens when the grant

00:00:18.060 --> 00:00:24.180
<v Michael Kennedy>ends? But the website can't. His answer? Static sites, client-side search, and sneaky Python.

00:00:24.600 --> 00:00:30.680
<v Michael Kennedy>Let's dive in. This is Talk Python To Me, episode 538, recorded January 22nd, 2026.

00:00:48.540 --> 00:00:52.880
<v Michael Kennedy>Welcome to Talk Python To Me, the number one Python podcast for developers and data scientists.

00:00:53.180 --> 00:00:54.780
<v Michael Kennedy>This is your host, Michael Kennedy.

00:00:55.130 --> 00:00:58.720
<v Michael Kennedy>I'm a PSF fellow who's been coding for over 25 years.

00:00:59.360 --> 00:01:00.420
<v Michael Kennedy>Let's connect on social media.

00:01:00.800 --> 00:01:03.880
<v Michael Kennedy>You'll find me and Talk Python on Mastodon, Bluesky, and X.

00:01:04.239 --> 00:01:06.060
<v Michael Kennedy>The social links are all in your show notes.

00:01:06.820 --> 00:01:10.340
<v Michael Kennedy>You can find over 10 years of past episodes at talkpython.fm.

00:01:10.520 --> 00:01:13.720
<v Michael Kennedy>And if you want to be part of the show, you can join our recording live streams.

00:01:14.080 --> 00:01:14.540
<v Michael Kennedy>That's right.

00:01:14.730 --> 00:01:18.000
<v Michael Kennedy>We live stream the raw uncut version of each episode on YouTube.

00:01:18.580 --> 00:01:23.020
<v Michael Kennedy>Just visit talkpython.fm/youtube to see the schedule of upcoming events.

00:01:23.180 --> 00:01:26.880
<v Michael Kennedy>Be sure to subscribe there and press the bell so you'll get notified anytime we're recording.

00:01:27.820 --> 00:01:29.480
<v Michael Kennedy>This episode is brought to you by Sentry.

00:01:29.800 --> 00:01:31.040
<v Michael Kennedy>Don't let those errors go unnoticed.

00:01:31.230 --> 00:01:32.840
<v Michael Kennedy>Use Sentry like we do here at Talk Python.

00:01:33.340 --> 00:01:36.200
<v Michael Kennedy>Sign up at talkpython.fm/sentry.

00:01:37.040 --> 00:01:42.320
<v Michael Kennedy>And it's brought to you by CommandBook, a native macOS app that I built that gives long-running

00:01:42.520 --> 00:01:43.980
<v Michael Kennedy>terminal commands a permanent home.

00:01:44.360 --> 00:01:46.360
<v Michael Kennedy>No more juggling six terminal tabs every morning.

00:01:46.820 --> 00:01:51.180
<v Michael Kennedy>Carefully craft a command once, run it forever with auto-restart, URL detection, and a full

00:01:51.300 --> 00:01:51.620
<v Michael Kennedy>CLI.

00:01:51.960 --> 00:01:55.100
<v Michael Kennedy>Download it for free at talkpython.fm/commandbookapp.

00:01:56.040 --> 00:01:59.200
<v Michael Kennedy>Hello, David. Welcome to Talk Python To Me. Amazing to have you here.

00:01:59.760 --> 00:02:03.500
<v David Flood>I'm glad to be here. Talk Python has been part of my story up to this point.

00:02:03.760 --> 00:02:09.220
<v Michael Kennedy>Has it? Okay. Well, you are about to write the next chapter in the story. So that's pretty excellent.

00:02:10.020 --> 00:02:14.800
<v Michael Kennedy>I have a sense of what's coming. We planned out what we're going to talk about and that sort of thing.

00:02:15.200 --> 00:02:20.420
<v Michael Kennedy>And I'm really excited about this topic. So it's going to be a good one.

00:02:21.060 --> 00:02:34.860
<v Michael Kennedy>Honestly, I think one of the real powers of the Python community and the reason the language has such staying power is there's such a diversity of use cases, technology, like technology standpoints, right?

00:02:34.870 --> 00:02:44.740
<v Michael Kennedy>Like I build software for this group or I build these types of apps and it's not just, you know, like Ruby on Rails, which, you know, it's been very popular, but it's, it's for websites, right?

00:02:44.770 --> 00:02:45.280
<v Michael Kennedy>You know what I mean?

00:02:45.920 --> 00:02:46.780
<v David Flood>Yeah, absolutely.

00:02:47.280 --> 00:02:57.260
<v David Flood>I mean, web development has dominated my use of it, but my entry into it, which I suppose I'll mention in a moment, was through all those little tools.

00:02:57.660 --> 00:02:58.320
<v Michael Kennedy>Let's hear it.

00:02:58.930 --> 00:03:00.440
<v Michael Kennedy>Who are you, David Flood?

00:03:00.490 --> 00:03:03.760
<v Michael Kennedy>Tell us, introduce yourself real quick and tell us about how you got into it.

00:03:04.480 --> 00:03:09.140
<v David Flood>So my background is in music and the humanities.

00:03:09.630 --> 00:03:14.900
<v David Flood>I mean, in 2019, I didn't know what Python was or the name of any programming language.

00:03:16.180 --> 00:03:22.280
<v David Flood>And I've been doing textual criticism, which is, you know, there's lots of criticisms in the academy.

00:03:22.860 --> 00:03:26.520
<v David Flood>This is the one where if you have lots and lots of versions of the same text,

00:03:27.100 --> 00:03:33.280
<v David Flood>you are comparing them to work out what the initial text was and like how it changed over time.

00:03:33.760 --> 00:03:35.160
<v Michael Kennedy>Okay, give us an example.

00:03:35.700 --> 00:03:39.800
<v David Flood>Okay, so one of the famous examples, hope I can remember it off the top of my head,

00:03:40.220 --> 00:03:41.600
<v David Flood>is from Shakespeare.

00:03:42.460 --> 00:03:44.500
<v David Flood>We're all familiar with the line, "To be or not to be,

00:03:45.020 --> 00:03:52.520
<v David Flood>that is the question." Well, there's a variant of it. One of the early copies

00:03:53.820 --> 00:03:58.900
<v David Flood>written by Shakespeare himself has... Somebody's going to be able to type into the chat exactly

00:03:58.960 --> 00:04:03.780
<v David Flood>what it is. They'll know this anecdote. But it's something more like, "To be or not to be, I."

00:04:04.080 --> 00:04:09.940
<v David Flood>That's the question. And so, which one is the original one? Why did he change it? That's kind

00:04:09.900 --> 00:04:14.900
<v David Flood>of one example i work mainly in the in the new testament which is especially complicated because

00:04:15.360 --> 00:04:23.300
<v David Flood>no other corpus from ancient history has as many copies of the same text as that corpus does. So it's

00:04:23.340 --> 00:04:29.320
<v David Flood>quite complicated, and our techniques have grown because of that and perhaps

00:04:29.560 --> 00:04:37.639
<v Michael Kennedy>become more advanced than... Now, I mean, that many variations over that huge span of time, over

00:04:37.660 --> 00:04:42.740
<v Michael Kennedy>different groups with different, maybe not intentions, but certainly colored by different

00:04:43.060 --> 00:04:47.620
<v Michael Kennedy>worldviews and philosophies and so on. And yeah, I see the trouble.

00:04:47.920 --> 00:04:53.940
<v David Flood>No, yeah. And they were people of the book. So copying it is something that happened a lot. And

00:04:54.160 --> 00:05:01.260
<v David Flood>the monks, like the medieval monks, copied everything. They copied our Greek classics.

00:05:01.900 --> 00:05:06.840
<v David Flood>So that's what I was interested in. And because of the wealth of data that we have,

00:05:07.200 --> 00:05:10.700
<v David Flood>computer tools are more and more important in that field.

00:05:11.020 --> 00:05:17.200
<v David Flood>So when I started my PhD in 2019, I knew that I wanted to use some of these cutting-edge tools.

00:05:17.660 --> 00:05:19.260
<v David Flood>Some of them may be surprising.

00:05:19.860 --> 00:05:24.100
<v David Flood>For example, we've been using phylogenetic software.

00:05:24.480 --> 00:05:35.440
<v David Flood>This is software that evolutionary biologists are using or computational biologists are using to track, for example, how COVID strains mutate over time.

00:05:35.680 --> 00:05:36.420
<v Michael Kennedy>Oh, interesting.

00:05:36.440 --> 00:05:39.800
<v David Flood>What they're comparing are the DNA letters.

00:05:40.320 --> 00:05:43.740
<v David Flood>And so you have the sequence of letters and you're comparing how those change over time.

00:05:44.000 --> 00:05:48.160
<v David Flood>Well, you can swap in textual variants for DNA letters.

00:05:48.600 --> 00:05:55.380
<v David Flood>And now we can track how texts change over time and group them into families, things like that.

00:05:56.120 --> 00:05:59.580
<v Michael Kennedy>It's like a time series, but of words or letters or something.

00:05:59.720 --> 00:06:06.400
<v David Flood>Yeah, I mean, yeah, there's lots of important algorithms for comparing

00:06:06.400 --> 00:06:11.720
<v David Flood>sequences of things. And so if we can just swap in Greek words and Greek text instead,

00:06:12.480 --> 00:06:16.420
<v David Flood>then we can maybe apply it to textual criticism. So I was pretty interested in those things. That

00:06:16.600 --> 00:06:21.380
<v David Flood>wasn't actually the method that brought me into it, but something like that, kind of computer

00:06:21.620 --> 00:06:27.760
<v David Flood>intensive tools. What I learned is that these tools weren't actually available to me. They

00:06:27.940 --> 00:06:36.380
<v David Flood>weren't desktop applications. And for the most part, they weren't public web applications. They

00:06:36.400 --> 00:06:38.000
<v David Flood>PyPI or something like that, right?

00:06:38.420 --> 00:06:39.060
<v David Flood>Yeah, exactly.

00:06:39.360 --> 00:06:39.460
<v David Flood>Exactly.

00:06:39.540 --> 00:06:40.040
<v David Flood>Or Java.

00:06:41.180 --> 00:06:43.380
<v David Flood>And I needed to glue them together.

00:06:43.780 --> 00:06:49.660
<v David Flood>So the long story short on that is during the first year of my PhD, I was picking up Python,

00:06:50.000 --> 00:06:51.880
<v David Flood>watching YouTube videos while I was doing the dishes.

00:06:52.800 --> 00:06:57.220
<v David Flood>And then the pandemic hit while I was living in Edinburgh in Scotland, probably not far

00:06:57.460 --> 00:06:58.160
<v David Flood>from Will McGugan.

00:06:59.220 --> 00:07:06.360
<v David Flood>And so the pandemic gave me the excuse to spend even a few more hours each day picking up these

00:07:06.380 --> 00:07:12.900
<v David Flood>new technical skills. And so I did it. I was able to use these advanced tools in my

00:07:13.100 --> 00:07:17.440
<v David Flood>work. But what was really important to me was sharing, like making that available to my colleagues,

00:07:18.120 --> 00:07:23.860
<v David Flood>so I had to move from writing these bad top-to-bottom Python scripts to things that

00:07:23.860 --> 00:07:29.500
<v David Flood>could be reused by other people. And that led me into the web, because the web is how

00:07:29.500 --> 00:07:35.740
<v Michael Kennedy>I can share with anybody. It's really wild how much the web is kind of the last bastion of

00:07:36.640 --> 00:07:42.480
<v Michael Kennedy>app freedom. It's so bizarre because, you know, I've many times told the stories of the insane

00:07:42.900 --> 00:07:48.500
<v Michael Kennedy>battles of just getting our apps that just play back video of content that's already on the web

00:07:48.860 --> 00:07:54.740
<v Michael Kennedy>into the app store. I mean, weeks of fighting about the weirdest, most nonsensical things with

00:07:54.860 --> 00:08:01.819
<v Michael Kennedy>both Google and Apple. But we also now have the Mac platform and the Windows platform very

00:08:01.840 --> 00:08:07.780
<v Michael Kennedy>aggressively looking for digital code certificates and all sorts of signing and other kinds of proof

00:08:07.920 --> 00:08:12.840
<v Michael Kennedy>like, you can't even just send somebody an executable anymore. It won't run. It's crazy.

00:08:13.120 --> 00:08:18.940
<v David Flood>It's down to, like, okay, put it on the web, I guess. That's right. I played the game of

00:08:19.080 --> 00:08:24.540
<v David Flood>distributing desktop apps. That's how I initially distributed things.

00:08:25.140 --> 00:08:30.860
<v David Flood>And at this point I just require people to install Python and then install my desktop app from PyPI

00:08:30.880 --> 00:08:33.400
<v David Flood>because it's too hard otherwise for me.

00:08:33.820 --> 00:08:36.479
<v David Flood>I mean, I could pay for the code signing from Apple

00:08:36.890 --> 00:08:37.599
<v David Flood>and do all of that,

00:08:37.740 --> 00:08:40.320
<v David Flood>but it's just, it's too much work for the time that I have.

00:08:40.500 --> 00:08:42.140
<v Michael Kennedy>Yeah, I'm about to do another round of it.

00:08:42.200 --> 00:08:42.979
<v Michael Kennedy>I'm working on an app

00:08:44.060 --> 00:08:45.680
<v Michael Kennedy>and my developer account is still active.

00:08:45.880 --> 00:08:47.680
<v Michael Kennedy>So we might have a fresh round of fun.

00:08:47.820 --> 00:08:49.260
<v Michael Kennedy>Hopefully it goes through this time.

00:08:50.320 --> 00:08:52.160
<v Michael Kennedy>Anyway, I do think it's such a challenge.

00:08:52.380 --> 00:08:53.520
<v Michael Kennedy>And are you leveraging?

00:08:53.940 --> 00:08:55.180
<v Michael Kennedy>I don't know if the timing was right.

00:08:55.300 --> 00:08:56.300
<v Michael Kennedy>Like maybe this was too early,

00:08:56.780 --> 00:08:59.740
<v Michael Kennedy>but these days, are you leveraging things like uvx

00:09:00.060 --> 00:09:03.640
<v Michael Kennedy>to run, or are you just pip install this thing and then run it?

00:09:04.100 --> 00:09:08.200
<v David Flood>Yeah, I haven't updated the README in a while, so I think it just asks for pip.

00:09:08.740 --> 00:09:14.220
<v David Flood>But certainly, if somebody asked me today, I would say, yeah, just install this with uv.

00:09:14.920 --> 00:09:16.260
<v David Flood>Because then they don't even need Python.

00:09:16.700 --> 00:09:17.100
<v David Flood>Exactly.

00:09:17.420 --> 00:09:17.860
<v Michael Kennedy>And that's brilliant.

00:09:18.440 --> 00:09:22.900
<v Michael Kennedy>And that's a really, it is another barrier reduced in distributing these applications,

00:09:23.160 --> 00:09:23.220
<v Michael Kennedy>right?

00:09:23.300 --> 00:09:28.600
<v Michael Kennedy>Like, if you can get uv installed on a machine, then you don't even have to say install, just

00:09:28.560 --> 00:09:32.960
<v Michael Kennedy>The way you run it is uvx my thing and it's all transparent to you, right?

00:09:33.020 --> 00:09:33.520
<v Michael Kennedy>Which is beautiful.

00:09:33.900 --> 00:09:34.880
<v Michael Kennedy>So what was it like?

00:09:35.100 --> 00:09:35.320
<v Michael Kennedy>Yeah.

00:09:35.790 --> 00:09:42.340
<v Michael Kennedy>So what was it like coming from what sounds like a not super screen-focused, super

00:09:43.020 --> 00:09:47.300
<v Michael Kennedy>techie aspect and having to dive into this world? And some days you're probably

00:09:47.420 --> 00:09:49.900
<v Michael Kennedy>like, how is it that I'm publishing stuff to PyPI?

00:09:49.990 --> 00:09:50.740
<v Michael Kennedy>What has happened to me?

00:09:51.300 --> 00:09:51.720
<v David Flood>Yeah.

00:09:51.970 --> 00:09:56.259
<v David Flood>Well, yeah, I remember when I first signed up for GitHub, because

00:09:56.320 --> 00:10:02.160
<v David Flood>you know, whatever YouTube tutorial I was working through at the time, you know, said that I needed

00:10:02.160 --> 00:10:08.720
<v David Flood>to do that. You know, I think it all started making a lot of sense. I didn't have any technical

00:10:08.980 --> 00:10:16.880
<v David Flood>background, but the world of open source software, it just kind of made sense. It felt

00:10:17.020 --> 00:10:23.820
<v David Flood>like it fit really well into my academic, you know, circle. I think a lot of the attitudes are

00:10:23.840 --> 00:10:27.980
<v Michael Kennedy>similar. I agree. I think they are actually. And I think that's, I think that's a pretty neat thing.

00:10:28.480 --> 00:10:34.240
<v Michael Kennedy>Yeah. Very cool. All right. Well, let's talk about what you're doing with digital humanities.

00:10:34.760 --> 00:10:40.400
<v Michael Kennedy>You're actually at a really interesting project or organization, I guess, that does many projects,

00:10:40.600 --> 00:10:45.020
<v David Flood>right? Yeah. So, fast-forwarding, I finished my PhD in the humanities.

00:10:45.200 --> 00:10:50.780
<v David Flood>Sorry. I had so much fun. No, that's fine. That's fine. I had so much fun writing like these tools

00:10:50.860 --> 00:10:54.120
<v David Flood>and then just solving the distribution problem

00:10:54.450 --> 00:10:55.760
<v David Flood>to share them with other scholars.

00:10:56.740 --> 00:11:00.120
<v David Flood>That was so fun that I was open to this kind of opportunity

00:11:00.770 --> 00:11:01.980
<v David Flood>where now I'm doing this full time.

00:11:02.570 --> 00:11:04.200
<v David Flood>And so, yes, so I'm on the,

00:11:04.500 --> 00:11:06.720
<v David Flood>we call it affectionately DARTH,

00:11:07.440 --> 00:11:10.240
<v David Flood>which is Digital Arts and Humanities at Harvard.

00:11:11.200 --> 00:11:14.160
<v Michael Kennedy>There has to be a lot of Star Wars memes and references,

00:11:14.310 --> 00:11:14.680
<v Michael Kennedy>I'm sure.

00:11:14.980 --> 00:11:16.400
<v David Flood>If you can pull up a 404,

00:11:16.770 --> 00:11:19.060
<v David Flood>I think there will be a Darth Vader reference.

00:11:19.470 --> 00:11:20.660
<v Michael Kennedy>Seriously, I'm here for it.

00:11:22.360 --> 00:11:25.680
<v Michael Kennedy>Yes, page not found. I find your lack of nav disturbing.

00:11:27.660 --> 00:11:33.020
<v Michael Kennedy>You know what? I think that is beautiful. And I really, I really think that people should embrace

00:11:33.580 --> 00:11:40.860
<v Michael Kennedy>the 404, the fun 404 page, you know, more, right? There should really be something going on that

00:11:40.900 --> 00:11:45.560
<v Michael Kennedy>like makes it, you know, something hasn't worked out, but you can just, you can make people laugh.

00:11:46.280 --> 00:11:47.340
<v Michael Kennedy>Yeah. I appreciate that.

00:11:48.560 --> 00:11:50.360
<v David Flood>I've heard people push back against it.

00:11:50.490 --> 00:11:57.900
<v David Flood>Like if you're on a, if you're on like your medical website and you're maybe about to get bad news and then you get like a picture of a kitten.

00:12:00.160 --> 00:12:01.880
<v Michael Kennedy>Dr. Kitten doesn't know where your results went.

00:12:02.020 --> 00:12:02.760
<v Michael Kennedy>Like I get that.

00:12:02.800 --> 00:12:03.300
<v Michael Kennedy>That's not funny.

00:12:04.060 --> 00:12:05.780
<v Michael Kennedy>But I mean, most things are not that serious.

00:12:06.560 --> 00:12:06.780
<v David Flood>Yeah.

00:12:07.540 --> 00:12:07.920
<v Michael Kennedy>Mostly.

00:12:08.780 --> 00:12:08.920
<v Michael Kennedy>Okay.

00:12:09.180 --> 00:12:11.740
<v Michael Kennedy>So what kind of things does DARTH do?

00:12:12.080 --> 00:12:16.680
<v Michael Kennedy>You've described this as kind of a web or tech agency within Harvard.

00:12:17.280 --> 00:12:18.200
<v Michael Kennedy>Yeah, it is very much.

00:12:18.420 --> 00:12:21.740
<v David Flood>So, you know, Harvard has a gigantic IT group.

00:12:21.890 --> 00:12:28.080
<v David Flood>I don't know how many hundreds of people work, but more than 500 people in IT.

00:12:28.840 --> 00:12:33.000
<v David Flood>We are a small team and we operate very much like a small agency.

00:12:33.540 --> 00:12:41.500
<v David Flood>So usually what happens is a faculty member has a funded research project that's going to last for an amount of time.

00:12:42.210 --> 00:12:44.640
<v David Flood>And then we consult with them to build it.

00:12:44.880 --> 00:12:53.160
<v David Flood>And most of the time, I kind of have these different categories of projects that I think of.

00:12:54.070 --> 00:12:56.060
<v David Flood>I lost in my notes what I call them.

00:12:56.170 --> 00:12:57.240
<v David Flood>But they are there.

00:12:57.390 --> 00:13:01.160
<v David Flood>One is like a virtual research environment.

00:13:01.450 --> 00:13:07.400
<v David Flood>So the focus is that this is a platform that we're building for the research to be done on.

00:13:07.720 --> 00:13:17.000
<v David Flood>Like the reason the research should be done in like a web app would be because you have access to visualization, to Postgres, to Pandas.

00:13:17.170 --> 00:13:23.420
<v David Flood>So we can kind of build up this platform to do the actual research on and some of the data entry.

00:13:23.700 --> 00:13:26.060
<v Michael Kennedy>So like a full on research application.

00:13:26.660 --> 00:13:27.040
<v Michael Kennedy>Yeah, exactly.

00:13:27.580 --> 00:13:36.040
<v Michael Kennedy>I guess you can also kind of see your work through the different stages of research projects and academic research and so on.

00:13:36.220 --> 00:13:41.900
<v Michael Kennedy>And we'll get to maybe end of life in a sense further down in the conversation.

00:13:42.470 --> 00:13:48.680
<v Michael Kennedy>But so this would be we have a grant or we just work here and we're going to work on some form of research.

00:13:49.210 --> 00:13:49.960
<v Michael Kennedy>What do you give them?

00:13:50.480 --> 00:13:58.540
<v Michael Kennedy>Right. And I think that's a super interesting challenge because one of the real common answers would be Jupyter, JupyterLab, Marimo, whatever.

00:13:59.130 --> 00:14:05.380
<v Michael Kennedy>But that's still pretty code heavy for people who are possibly philosophers or something, you know.

00:14:05.800 --> 00:14:12.820
<v David Flood>Oh, exactly. That's why in digital humanities, I won't even, maybe I won't even attempt to define

00:14:13.710 --> 00:14:19.820
<v David Flood>it in any narrow sense, because I'll get in trouble with somebody. But you have two groups

00:14:20.370 --> 00:14:26.580
<v David Flood>that are interfacing with each other. And one is digital humanities as a field, like as a subfield,

00:14:26.800 --> 00:14:31.320
<v David Flood>all of its own. And these are people who have humanities domain knowledge

00:14:31.860 --> 00:14:36.680
<v David Flood>and technical skills, and they're bringing them together. And in a lot of cases, the audience for

00:14:36.840 --> 00:14:42.480
<v David Flood>that kind of work is other people working in the digital humanities. But far more common,

00:14:42.780 --> 00:14:49.220
<v David Flood>and this is what we work with, is people who have humanities domain expertise, and they want to

00:14:49.600 --> 00:14:55.560
<v David Flood>publish or do research or share with other people who have that same humanities domain expertise,

00:14:55.640 --> 00:14:59.640
<v David Flood>and they are now interested in adding a technical component to it.

00:14:59.960 --> 00:15:02.000
<v David Flood>How can we supercharge what they have?

00:15:03.500 --> 00:15:06.120
<v Michael Kennedy>This portion of Talk Python is brought to you by Sentry.

00:15:06.580 --> 00:15:09.680
<v Michael Kennedy>I've been using Sentry personally on almost every application

00:15:10.060 --> 00:15:12.480
<v Michael Kennedy>and API that I've built for Talk Python and beyond

00:15:13.260 --> 00:15:14.260
<v Michael Kennedy>over the last few years.

00:15:14.580 --> 00:15:17.460
<v Michael Kennedy>They're a core building block for keeping my infrastructure solid.

00:15:18.060 --> 00:15:19.360
<v Michael Kennedy>They should be for yours as well.

00:15:19.640 --> 00:15:20.020
<v Michael Kennedy>Here's why.

00:15:20.680 --> 00:15:22.100
<v Michael Kennedy>Sentry doesn't just catch errors.

00:15:22.200 --> 00:15:24.900
<v Michael Kennedy>It catches all the stuff that makes your app feel broken,

00:15:25.280 --> 00:15:27.560
<v Michael Kennedy>the random slowdown, the freeze you can't reproduce,

00:15:28.260 --> 00:15:30.620
<v Michael Kennedy>that bug that only shows up once real users hit it.

00:15:30.960 --> 00:15:31.820
<v Michael Kennedy>And when something goes wrong,

00:15:32.180 --> 00:15:34.500
<v Michael Kennedy>Sentry gives you the whole chain of events in one place.

00:15:34.720 --> 00:15:37.700
<v Michael Kennedy>Errors, traces, replays, logs, dots connected.

00:15:38.080 --> 00:15:39.900
<v Michael Kennedy>You can see what led to the issue

00:15:40.040 --> 00:15:41.880
<v Michael Kennedy>without digging through five different dashboards.

00:15:42.700 --> 00:15:44.720
<v Michael Kennedy>Seer, Sentry's AI debugging agent,

00:15:45.200 --> 00:15:47.180
<v Michael Kennedy>builds on this data, taking the full context,

00:15:47.840 --> 00:15:49.820
<v Michael Kennedy>explaining why the issue happened,

00:15:50.400 --> 00:15:52.780
<v Michael Kennedy>pointing to the code responsible, drafting a fix,

00:15:52.880 --> 00:15:55.840
<v Michael Kennedy>and even flagging if your PR is about to introduce a new problem.

00:15:56.680 --> 00:15:57.720
<v Michael Kennedy>The workflow stays simple.

00:15:58.160 --> 00:15:59.900
<v Michael Kennedy>Something breaks, Sentry alerts you,

00:16:00.080 --> 00:16:01.880
<v Michael Kennedy>the dashboard shows you the full context,

00:16:02.220 --> 00:16:05.360
<v Michael Kennedy>Seer helps you fix it and catch new issues before they ship.

00:16:06.080 --> 00:16:08.920
<v Michael Kennedy>It's totally reasonable to go from an error occurred

00:16:09.080 --> 00:16:11.140
<v Michael Kennedy>to fixed in production in just 10 minutes.

00:16:12.200 --> 00:16:14.960
<v Michael Kennedy>I truly appreciate the support that Sentry has given me

00:16:15.060 --> 00:16:17.520
<v Michael Kennedy>to help solve my bugs and issues in my apps,

00:16:18.160 --> 00:16:20.740
<v Michael Kennedy>especially those tricky ones that only appear in production.

00:16:21.100 --> 00:16:22.580
<v Michael Kennedy>I know you will too if you try them out.

00:16:22.880 --> 00:16:24.520
<v Michael Kennedy>So get started today with Sentry.

00:16:24.700 --> 00:16:29.720
<v Michael Kennedy>Just visit talkpython.fm/sentry and get $100 in Sentry credits.

00:16:30.240 --> 00:16:30.960
<v Michael Kennedy>Please use that link.

00:16:31.060 --> 00:16:32.320
<v Michael Kennedy>It's in your podcast player show notes.

00:16:32.420 --> 00:16:37.760
<v Michael Kennedy>If you're signing up some other way, you can use our code talkpython26, all one word,

00:16:38.340 --> 00:16:40.900
<v Michael Kennedy>talkpython26, to get $100 in credits.

00:16:41.680 --> 00:16:43.360
<v Michael Kennedy>Thank you to Sentry for supporting the show.

00:16:44.500 --> 00:16:49.319
<v Michael Kennedy>Maybe just take a moment and speak to, maybe, I don't know if this venue will actually speak

00:16:49.340 --> 00:16:54.200
<v Michael Kennedy>directly to anybody who I was imagining here, but people who work with folks, what would you tell

00:16:54.340 --> 00:16:58.720
<v Michael Kennedy>somebody who works with a group who have some technical skill, who could create some of these

00:16:58.880 --> 00:17:02.280
<v Michael Kennedy>things that we're going to talk about, but the people they've created them for don't necessarily

00:17:02.540 --> 00:17:09.420
<v Michael Kennedy>think they need it or know that they need it. I've gone often on rants about how programming is a

00:17:09.680 --> 00:17:15.260
<v Michael Kennedy>superpower, not a replacement for your job, right? Yeah. That's a problem for a lot of people,

00:17:15.360 --> 00:17:20.500
<v David Flood>especially because you might use some new computer tools to supercharge your research.

00:17:20.980 --> 00:17:25.740
<v David Flood>But the article that you publish or the research output of that, the audience, they may not

00:17:25.860 --> 00:17:27.660
<v David Flood>be interested in hearing about that at all.

00:17:28.040 --> 00:17:32.760
<v David Flood>And so for most people who are working in this space, the tools, you have to use them

00:17:33.000 --> 00:17:37.720
<v David Flood>in such a way that you can talk about the research output without talking about the

00:17:37.860 --> 00:17:38.000
<v David Flood>tool.

00:17:38.260 --> 00:17:42.979
<v David Flood>And we have other venues to talk about the tools themselves, like the Journal of Open

00:17:43.000 --> 00:17:48.660
<v David Flood>Source Software, and you can kind of get some of it out there. But that's the significant

00:17:48.880 --> 00:17:53.020
<v David Flood>challenge is convincing people that it could be useful and then convincing the audience

00:17:53.230 --> 00:17:57.800
<v David Flood>that they should be interested in kind of the methods behind how some of the new research comes

00:17:57.800 --> 00:18:02.940
<v Michael Kennedy>up. Also, I think I'm a big believer that presenting stuff in the right order is really,

00:18:03.130 --> 00:18:07.620
<v Michael Kennedy>really important. If you present your research and it's beautiful and powerful and oh, look,

00:18:07.760 --> 00:18:12.500
<v Michael Kennedy>we've also, by the way, covered a hundred times more data than any prior research. Surprise,

00:18:12.760 --> 00:18:13.520
<v Michael Kennedy>I wonder how I did that.

00:18:14.160 --> 00:18:15.400
<v Michael Kennedy>And then people are like, this is amazing.

00:18:16.580 --> 00:18:19.580
<v Michael Kennedy>Then after you kind of hook them with the inspiration and what's possible,

00:18:19.680 --> 00:18:21.480
<v Michael Kennedy>then you're like, let me tell you about the tool.

00:18:21.600 --> 00:18:22.860
<v Michael Kennedy>And all of a sudden you're like, that's a cool tool, right?

00:18:22.920 --> 00:18:26.100
<v Michael Kennedy>This is not just like geekery, like programmer, you know,

00:18:26.440 --> 00:18:28.120
<v Michael Kennedy>Charlie Brown speak, wah, wah, wah, wah, wah.

00:18:28.300 --> 00:18:29.660
<v Michael Kennedy>You know, it's like, no, I'm listening.

00:18:29.880 --> 00:18:30.460
<v Michael Kennedy>Tell me now.

00:18:30.820 --> 00:18:31.260
<v David Flood>Yeah, exactly.

00:18:31.620 --> 00:18:34.960
<v David Flood>I mean, one of the things I think that really opens people's eyes

00:18:35.300 --> 00:18:37.720
<v David Flood>is a really powerful search interface.

00:18:38.260 --> 00:18:39.740
<v David Flood>You have all of this research data.

00:18:40.120 --> 00:18:45.020
<v David Flood>Just put it behind Elasticsearch with some really good filtering on it. And all of a sudden you have

00:18:45.180 --> 00:18:50.740
<v David Flood>fast, rapid access to the data in a way you never had before. Like you were never scrolling through

00:18:51.140 --> 00:18:55.160
<v David Flood>the Excel spreadsheets and finding exactly what you wanted, like you were with this new search

00:18:55.400 --> 00:19:00.360
<v David Flood>interface. And that by itself is like so simple. We're so used to that in web development that

00:19:00.480 --> 00:19:05.099
<v David Flood>like everything needs to have a fantastic search now. But so many people have their data locked

00:19:05.120 --> 00:19:07.500
<v David Flood>behind, you know, a terrible search interface.

00:19:07.960 --> 00:19:10.400
<v Michael Kennedy>Yeah, just a few things to sort of expose that.

00:19:10.500 --> 00:19:14.880
<v Michael Kennedy>So give us a sense of what these data exploration web apps might look like.

00:19:14.940 --> 00:19:20.060
<v Michael Kennedy>These are probably kind of mostly stuck to the inside, kind of internal to the research

00:19:20.540 --> 00:19:22.820
<v Michael Kennedy>lab research team groups and so on.

00:19:22.960 --> 00:19:24.720
<v Michael Kennedy>These are probably not that public facing, right?

00:19:24.980 --> 00:19:28.580
<v David Flood>Almost everything we work on does end up having a public facing component.

00:19:28.940 --> 00:19:33.960
<v David Flood>So maybe the research itself is done, locked behind a user login.

00:19:34.300 --> 00:19:35.440
<v David Flood>That's just for the researchers.

00:19:36.290 --> 00:19:38.880
<v David Flood>But then they expose that research to the public,

00:19:39.520 --> 00:19:41.080
<v David Flood>usually with a good search interface

00:19:41.640 --> 00:19:44.840
<v David Flood>and different pages for exploring their data

00:19:45.020 --> 00:19:47.200
<v David Flood>and visualizations and things like that.

00:19:47.380 --> 00:19:49.360
<v David Flood>So yeah, everything we do ends up becoming

00:19:49.850 --> 00:19:52.560
<v David Flood>a production public web app in the end.

00:19:52.760 --> 00:19:54.740
<v Michael Kennedy>And then another one of your categories,

00:19:54.830 --> 00:19:57.000
<v Michael Kennedy>you put it was virtual research environments

00:19:57.260 --> 00:19:59.740
<v Michael Kennedy>like data entry, publishing, authoring, collaboration.

00:20:00.050 --> 00:20:00.540
<v Michael Kennedy>Tell us about that.

00:20:01.280 --> 00:20:03.139
<v David Flood>Yeah, so a good example of this maybe

00:20:03.160 --> 00:20:08.820
<v David Flood>is one of the projects that... Well, actually, the best example of it is the project I worked on

00:20:08.930 --> 00:20:16.380
<v David Flood>during my PhD. It's called Apatosaurus. The short story behind the name is that it sounds like

00:20:16.540 --> 00:20:24.280
<v David Flood>apparatus. In textual criticism, when you are displaying and visualizing variant readings to

00:20:24.710 --> 00:20:31.959
<v David Flood>a base text, that form of visualizing it is a critical apparatus. A critical apparatus is a

00:20:32.040 --> 00:20:37.500
<v David Flood>pretty boring website name, but Apatosaurus dinosaurs might make textual criticism sound fun.

00:20:37.720 --> 00:20:43.180
<v Michael Kennedy>Yeah, I do love dinosaurs. No, that's really cool. So this, this comes out as a web app. And I know

00:20:43.180 --> 00:20:46.160
<v Michael Kennedy>you also have some, you talked about some desktop apps as well.

00:20:46.640 --> 00:20:50.460
<v David Flood>Yep, that's right. So people upload their

00:20:50.550 --> 00:20:54.960
<v David Flood>collation to this and then they can visualize it. And there's a public

00:20:55.440 --> 00:21:00.320
<v David Flood>component of this as well, but really the backend is editing a collation,

00:21:00.500 --> 00:21:03.160
<v David Flood>and adding notes to all of the different readings and stuff.

00:21:03.560 --> 00:21:07.060
<v David Flood>So I could show what the backend looks like,

00:21:07.280 --> 00:21:08.200
<v David Flood>but we can also move on.

00:21:08.440 --> 00:21:11.420
<v Michael Kennedy>Let's move on just because most people

00:21:11.620 --> 00:21:14.540
<v Michael Kennedy>will not totally hear, but just give us a sense of like,

00:21:14.880 --> 00:21:18.740
<v Michael Kennedy>like what do people, what do you create for people

00:21:18.900 --> 00:21:21.660
<v Michael Kennedy>so that they're like, yeah, I can use this app, right?

00:21:21.760 --> 00:21:23.440
<v Michael Kennedy>Like give us a sense of some of the features,

00:21:23.660 --> 00:21:24.960
<v Michael Kennedy>I guess is what I'm getting to.

00:21:25.260 --> 00:21:29.319
<v David Flood>Yeah, so another good example is we have a project

00:21:29.360 --> 00:21:32.380
<v David Flood>at Harvard called Mapping Color in History.

00:21:33.150 --> 00:21:36.980
<v David Flood>And this is a collaboration with a lab.

00:21:37.150 --> 00:21:38.900
<v David Flood>This lab brings in pieces of artwork

00:21:39.480 --> 00:21:42.560
<v David Flood>and they do spectral analysis on the pigments

00:21:42.590 --> 00:21:45.160
<v David Flood>so they can identify what was used

00:21:45.160 --> 00:21:48.360
<v David Flood>to make a particular color of this red

00:21:48.550 --> 00:21:50.860
<v David Flood>or what was used to make this color of blue.

00:21:51.420 --> 00:21:53.640
<v David Flood>And then the idea is tracking

00:21:54.000 --> 00:21:56.880
<v David Flood>how did people make those pigments over time,

00:21:57.280 --> 00:22:01.720
<v David Flood>and specifically in Asian art.

00:22:02.260 --> 00:22:04.380
<v Michael Kennedy>Is this the Dharmra, Puna, Puna?

00:22:05.440 --> 00:22:07.960
<v David Flood>No, this is Mapping Color in History.

00:22:08.030 --> 00:22:09.640
<v David Flood>I don't think it's up here.

00:22:09.770 --> 00:22:10.280
<v David Flood>Sorry about that.

00:22:10.420 --> 00:22:10.680
<v Michael Kennedy>Somewhere.

00:22:10.940 --> 00:22:11.300
<v Michael Kennedy>That's all right.

00:22:11.330 --> 00:22:11.840
<v Michael Kennedy>I'll find it.

00:22:12.030 --> 00:22:12.440
<v Michael Kennedy>Keep talking.

00:22:13.680 --> 00:22:13.840
<v David Flood>Okay.

00:22:14.050 --> 00:22:16.300
<v David Flood>So the front end is great.

00:22:16.510 --> 00:22:18.040
<v David Flood>You know, like the public end,

00:22:18.180 --> 00:22:21.000
<v David Flood>this is where people can explore by pigments

00:22:21.280 --> 00:22:24.440
<v David Flood>and then see the images that contain those pigments.

00:22:24.560 --> 00:22:30.680
<v David Flood>Now in the back end, what the researchers will be able to do is correlate exactly which

00:22:30.960 --> 00:22:34.260
<v David Flood>point of a painting the analysis was done on.

00:22:34.490 --> 00:22:38.640
<v David Flood>So they have this deep zoom image viewer where they'll zoom in and they'll select the point

00:22:39.390 --> 00:22:40.280
<v David Flood>where that was taken from.

00:22:41.090 --> 00:22:47.640
<v David Flood>So how else would you do that other than a digital interface to indicate on an image of

00:22:47.950 --> 00:22:52.060
<v David Flood>a painting where that spectral analysis was performed?

00:22:52.380 --> 00:22:55.020
<v Michael Kennedy>Sounds almost like astronomy in a weird way.

00:22:55.050 --> 00:22:55.300
<v Michael Kennedy>Oh, yeah.

00:22:55.840 --> 00:23:04.580
<v Michael Kennedy>We zoomed into here and we took a different spectrum of the painting and we realized that it's actually identical to this, you know, something crazy like that, right?

00:23:04.900 --> 00:23:06.140
<v David Flood>Yeah, yeah, yeah, that's right.

00:23:06.200 --> 00:23:08.560
<v David Flood>Yeah, so it's essentially a pigments, like a pigments database.

00:23:10.100 --> 00:23:17.940
<v Michael Kennedy>So the third category of these digital humanities projects that you put down was like data extraction, transformation.

00:23:19.260 --> 00:23:29.540
<v Michael Kennedy>In data science, they often say, you know, 80% of the work is the data wrangling, which is like cleaning, organization, just getting it so you could possibly start asking questions about it.

00:23:29.820 --> 00:23:30.960
<v Michael Kennedy>I'm sure you all do a lot of that.

00:23:31.180 --> 00:23:31.440
<v David Flood>Absolutely.

00:23:32.560 --> 00:23:40.880
<v David Flood>So often, the very beginning of a project might be an Excel sheet or several spreadsheets.

00:23:41.800 --> 00:23:46.020
<v David Flood>And the first task is to ingest these into, you know, a proper database.

00:23:46.640 --> 00:23:48.600
<v David Flood>Not so much MongoDB for us.

00:23:48.760 --> 00:23:49.840
<v David Flood>It's going into Postgres.

00:23:50.340 --> 00:23:51.420
<v David Flood>We're a Django shop.

00:23:51.680 --> 00:23:52.480
<v David Flood>We're a Django shop.

00:23:52.630 --> 00:23:53.760
<v David Flood>So it's going into Postgres.

00:23:55.090 --> 00:24:06.560
<v David Flood>And yeah, no, that is probably the number one challenge of the early stage is figuring out what the right data model is, what the right relationships are to model the data.

00:24:07.300 --> 00:24:17.020
<v David Flood>Doing that work is advantageous to everybody because, you know, it helps both the researchers who brought the data to think about it in a more organized way.

00:24:17.410 --> 00:24:18.540
<v David Flood>I mean, they've been trying to do that.

00:24:18.720 --> 00:24:19.680
<v David Flood>And they have the spreadsheets.

00:24:20.160 --> 00:24:27.800
<v David Flood>But now we're modeling out the data so that we can add it to database tables and then use it later.

00:24:27.880 --> 00:24:29.340
<v David Flood>So that works out well for everybody.

00:24:30.000 --> 00:24:30.720
<v David Flood>And yeah, absolutely.

00:24:31.100 --> 00:24:45.760
<v David Flood>Cleaning the data, getting dates, working with fuzzy dates, being able to parse July of 2020 or summer of 2020 and handling kind of all of those cases so that we do get dates in the end.
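
NOTE
Editor's sketch, not the DARTH team's code: one way to handle the fuzzy dates
mentioned above is to turn each loose string into an (earliest, latest) range.
The name parse_fuzzy_date and the season table are assumptions made purely for
illustration; winter is left out because it spans two calendar years.
```python
import calendar
import re
from datetime import date
#
# Month names mapped to month numbers (1-12).
MONTH_NAMES = ("january february march april may june july "
               "august september october november december").split()
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}
# Hypothetical season table: Northern Hemisphere meteorological seasons (an assumption).
SEASONS = {"spring": (3, 5), "summer": (6, 8), "autumn": (9, 11), "fall": (9, 11)}
# Optional leading word ("July", "summer"), optional "of", then a four-digit year.
PATTERN = re.compile(r"\s*(?:(\w+)\s+(?:of\s+)?)?(\d{4})\s*")
#
def parse_fuzzy_date(text):
    """Return an (earliest, latest) pair of dates for a loose date string, or None."""
    match = PATTERN.fullmatch(text.lower())
    if match is None:
        return None
    word, year = match.group(1), int(match.group(2))
    if word is None:
        first_month, last_month = 1, 12      # bare year: the whole year
    elif word in MONTHS:
        first_month = last_month = MONTHS[word]
    elif word in SEASONS:
        first_month, last_month = SEASONS[word]
    else:
        return None                          # unknown qualifier: leave it for a human
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, first_month, 1), date(year, last_month, last_day)
```
Calling parse_fuzzy_date("July of 2020") yields the first and last day of July 2020.
Storing a range rather than a single day keeps the uncertainty visible in the data.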

00:24:45.780 --> 00:24:55.980
<v Michael Kennedy>One of the crazy stories from data parsing history is one of the, I can't remember exactly what it was, you talked about biology tools or genetics tools earlier.

00:24:56.100 --> 00:25:03.780
<v Michael Kennedy>One of the groups that names genes had to change the name of a gene because it kept getting parsed by Excel into a date.

00:25:04.880 --> 00:25:05.520
<v Michael Kennedy>Yeah, I remember that.

00:25:05.590 --> 00:25:06.200
<v Michael Kennedy>I remember that.

00:25:06.320 --> 00:25:06.600
<v Michael Kennedy>That's right.

00:25:07.260 --> 00:25:07.580
<v Michael Kennedy>Yes.

00:25:08.100 --> 00:25:10.580
<v Michael Kennedy>So these are the weird edge cases I'm sure you run into.

00:25:11.940 --> 00:25:13.120
<v Michael Kennedy>Like it's not even supposed to be a date.

00:25:13.220 --> 00:25:13.940
<v Michael Kennedy>Why is this a date?

00:25:13.990 --> 00:25:14.780
<v Michael Kennedy>I don't know.

00:25:14.940 --> 00:25:16.240
<v Michael Kennedy>Why is it helping out here?

00:25:16.920 --> 00:25:17.820
<v Michael Kennedy>The code keeps crashing.

00:25:18.000 --> 00:25:20.480
<v Michael Kennedy>Like pandas parsed it as a date and it's not or whatever.

00:25:21.220 --> 00:25:21.540
<v David Flood>Absolutely.

00:25:21.980 --> 00:25:22.060
<v David Flood>Yeah.

00:25:22.140 --> 00:25:22.300
<v David Flood>Yeah.

00:25:22.340 --> 00:25:27.320
<v David Flood>So yeah, usually lots of test suites around that ingest process until we've got it.
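
NOTE
Editor's sketch of the kind of ingest check discussed above, under stated
assumptions: the function name find_date_like_cells, the column name "gene",
and the sample rows in the usage line are all hypothetical. It reads a CSV with
the standard library, which keeps every cell as text, and flags values that look
like a spreadsheet already turned a symbol such as SEPT2 into a date.
```python
import csv
import io
import re
#
# Values such as "2-Sep" or "1-Mar" are what a spreadsheet typically leaves behind
# after coercing a symbol like SEPT2 or MARCH1 into a date.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$", re.IGNORECASE
)
#
def find_date_like_cells(csv_text, column):
    """Return (row_number, value) pairs whose value in `column` looks like a coerced date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    suspicious = []
    # Row 1 is the header, so data rows are numbered from 2 as in a spreadsheet.
    for row_number, row in enumerate(reader, start=2):
        value = row[column].strip()
        if DATE_LIKE.match(value):
            suspicious.append((row_number, value))
    return suspicious
```
For example, find_date_like_cells("gene,score\n2-Sep,0.9\n", "gene") returns [(2, "2-Sep")].
A check like this can sit in the ingest test suite so damaged cells are caught before they reach the database.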

00:25:27.640 --> 00:25:32.220
<v David Flood>Now, once we've got it in, usually the research is ongoing and then we're able to provide

00:25:32.420 --> 00:25:38.140
<v David Flood>them now a new cleaned interface to do the additional data entry as the project is going.

00:25:38.420 --> 00:25:39.780
<v David Flood>And that's usually a win-win for everybody.

00:25:40.180 --> 00:25:40.340
<v Michael Kennedy>Sure.

00:25:40.620 --> 00:25:45.780
<v Michael Kennedy>And so this sort of ETL ingestion side of everything is it's like, don't worry,

00:25:46.420 --> 00:25:47.460
<v Michael Kennedy>DARTH has got it for you.

00:25:47.760 --> 00:25:51.180
<v Michael Kennedy>And then we'll provide you like a database connection to start working.

00:25:51.480 --> 00:25:54.700
<v Michael Kennedy>Or do you give them the tools and then they kind of iterate on them?

00:25:54.940 --> 00:26:00.220
<v Michael Kennedy>And how much is this you and how much is this you providing like CLI tools and stuff

00:26:00.460 --> 00:26:01.540
<v Michael Kennedy>or notebooks over to people?

00:26:03.560 --> 00:26:08.520
<v David Flood>I'd say most of the people that we're working with are aware of the technical tools,

00:26:08.640 --> 00:26:10.380
<v David Flood>but they don't want a database connection.

00:26:10.800 --> 00:26:16.520
<v David Flood>So we are giving them, we're doing the ingest and then building a platform where they can begin interacting with their data.

00:26:17.240 --> 00:26:18.720
<v Michael Kennedy>Yeah, I'm sure they don't want one.

00:26:20.140 --> 00:26:22.600
<v Michael Kennedy>Maybe you give them an app though, right?

00:26:22.820 --> 00:26:24.940
<v Michael Kennedy>With like Elasticsearch and other things that they can.

00:26:25.120 --> 00:26:25.400
<v David Flood>No, absolutely.

00:26:25.680 --> 00:26:26.400
<v David Flood>Yeah, that's what we do.

00:26:26.720 --> 00:26:27.040
<v David Flood>Yeah, okay.

00:26:27.140 --> 00:26:32.520
<v David Flood>Yeah, we give them a web platform to begin exploring, to begin publishing.

00:26:34.320 --> 00:26:38.760
<v Michael Kennedy>So I was thinking that you said you're a Django shop, which is cool.

00:26:38.840 --> 00:26:43.280
<v Michael Kennedy>It sounds, though, to me like describing what you're doing, just imagining how this is.

00:26:43.640 --> 00:26:46.000
<v Michael Kennedy>You're probably creating these projects often.

00:26:46.520 --> 00:26:49.440
<v Michael Kennedy>How often does one of these projects actually last?

00:26:49.980 --> 00:26:51.900
<v Michael Kennedy>Or how many of them do you iterate?

00:26:53.180 --> 00:26:53.700
<v Michael Kennedy>I'm trying to get a sense.

00:26:53.920 --> 00:26:56.960
<v Michael Kennedy>Do you work on stuff for a year or is it like every two weeks we're on a new project?

00:26:58.180 --> 00:26:59.980
<v David Flood>It's why I think of us as like an agency.

00:27:00.900 --> 00:27:04.240
<v David Flood>Because we get to work on greenfield projects fairly often, like you're imagining.

00:27:04.700 --> 00:27:08.880
<v David Flood>Which would not be the case normally at a big university IT department.

00:27:09.960 --> 00:27:15.040
<v David Flood>So, you know, maybe two or three projects a year, two or three big ones a year.

00:27:15.500 --> 00:27:18.640
<v David Flood>And then we have to put to bed a few a year as well.

00:27:18.740 --> 00:27:21.260
<v David Flood>Because these things, they're funded with grant money.

00:27:21.600 --> 00:27:24.060
<v David Flood>And then the grant money runs out and it's time.

00:27:24.200 --> 00:27:26.320
<v David Flood>And then we have to figure out what do we do with it now?

00:27:26.380 --> 00:27:31.060
<v David Flood>We don't want to lose the data and this way of presenting it.

00:27:31.100 --> 00:27:33.140
<v David Flood>But we can't keep paying for Elasticsearch.

00:27:33.520 --> 00:27:34.120
<v Michael Kennedy>Yeah, of course.

00:27:34.380 --> 00:27:37.620
<v Michael Kennedy>I'm certainly, we're going to dive into that because that is, but let's save that for the

00:27:37.740 --> 00:27:37.780
<v Michael Kennedy>end.

00:27:37.800 --> 00:27:40.920
<v Michael Kennedy>It seems like that's the arc of the story of these things.

00:27:40.960 --> 00:27:44.700
<v Michael Kennedy>But I certainly think it's something that you don't think about that much, right?

00:27:44.980 --> 00:27:47.360
<v Michael Kennedy>Like you said, it was only a hundred dollars a month for this.

00:27:47.440 --> 00:27:48.280
<v Michael Kennedy>And we got a big grant.

00:27:48.400 --> 00:27:49.200
<v Michael Kennedy>There's a bunch of, no big deal.

00:27:49.280 --> 00:27:52.880
<v Michael Kennedy>But like when the grant's out, who's on the hook for a hundred dollars a month and making

00:27:53.040 --> 00:27:55.920
<v Michael Kennedy>sure it survives upgrades and all that kind of business.

00:27:56.400 --> 00:27:56.780
<v David Flood>No, that's right.

00:27:57.080 --> 00:27:57.240
<v Michael Kennedy>Yeah.

00:27:57.360 --> 00:28:02.720
<v Michael Kennedy>So my original question when I started on this path was thinking like, do you, how do you

00:28:02.780 --> 00:28:03.400
<v Michael Kennedy>get started on these?

00:28:03.460 --> 00:28:07.640
<v Michael Kennedy>Do you have like a big framework or a cookie cutter sort of thing or something like this

00:28:07.760 --> 00:28:11.740
<v Michael Kennedy>is how we do it because it plugs into all this other automation and tools we built for

00:28:11.800 --> 00:28:12.680
<v Michael Kennedy>the last 10 projects.

00:28:13.220 --> 00:28:14.400
<v Michael Kennedy>You know, that's kind of a unique position.

00:28:14.920 --> 00:28:19.020
<v Michael Kennedy>A lot of companies build one website for themselves and that's their app or they're

00:28:19.020 --> 00:28:21.480
<v Michael Kennedy>an agency that goes across so many, so much variation.

00:28:21.660 --> 00:28:22.600
<v Michael Kennedy>They can't do that kind of stuff.

00:28:22.680 --> 00:28:22.820
<v Michael Kennedy>Right.

00:28:23.220 --> 00:28:23.820
<v David Flood>That's right.

00:28:24.080 --> 00:28:24.360
<v David Flood>That's right.

00:28:25.300 --> 00:28:25.920
<v David Flood>That's a good question.

00:28:26.320 --> 00:28:28.840
<v David Flood>We have things that we reuse.

00:28:29.000 --> 00:28:35.400
<v David Flood>Some of them are open source, different search components and things that we maintain that

00:28:36.110 --> 00:28:37.420
<v David Flood>we'll use across projects.

00:28:37.930 --> 00:28:41.100
<v David Flood>And we have tried to do the cookie cutter Django project.

00:28:41.640 --> 00:28:47.240
<v David Flood>The truth is, each project is different enough that really we like to evaluate it from first

00:28:47.520 --> 00:28:54.120
<v David Flood>principles as we're evaluating it and thinking, what is the best technology to use?

00:28:55.370 --> 00:28:55.520
<v David Flood>Yeah.

00:28:55.750 --> 00:28:55.920
<v David Flood>Yeah.

00:28:56.020 --> 00:28:59.000
<v David Flood>So yeah, we don't have a cookie cutter.

00:28:59.050 --> 00:29:04.200
<v David Flood>We don't have a kind of a meta framework for bootstrapping them because they're sufficiently

00:29:04.450 --> 00:29:05.780
<v David Flood>different from each other that we...

00:29:05.960 --> 00:29:06.680
<v Michael Kennedy>I find that too.

00:29:07.030 --> 00:29:07.640
<v Michael Kennedy>I find that too.

00:29:08.120 --> 00:29:12.520
<v Michael Kennedy>The idea of how we could just grab this Cookiecutter or Copier.

00:29:12.610 --> 00:29:13.600
<v Michael Kennedy>Are you familiar with Copier?

00:29:14.080 --> 00:29:15.480
<v Michael Kennedy>People out there might be familiar with that.

00:29:15.600 --> 00:29:20.740
<v Michael Kennedy>It's a little bit like Cookiecutter with the bonus that you can update it later if you

00:29:21.020 --> 00:29:24.499
<v Michael Kennedy>change your mind about something, like actually change this project to use Postgres rather

00:29:24.520 --> 00:29:29.960
<v Michael Kennedy>than SQLite or something, which is pretty cool. But every time that I do, every time I try to work

00:29:30.040 --> 00:29:33.680
<v Michael Kennedy>with one of those projects, even ones that I've created for myself, I'm not hating on anyone.

00:29:34.160 --> 00:29:39.400
<v Michael Kennedy>I'm like, oh, it's like 75% awesome and 25%. I just got to take this stuff out. You know,

00:29:39.940 --> 00:29:43.700
<v Michael Kennedy>I'll just, I'll just do it from scratch. It's not, how hard is this? I'll just create a few folders

00:29:43.860 --> 00:29:48.200
<v Michael Kennedy>and put a few things in there and I'll copy the one, like the pyproject.toml or like the one thing

00:29:48.300 --> 00:29:52.260
<v Michael Kennedy>that's like, how do I do this again? I'll just copy that and we're good to go. Yeah. I mean,

00:29:52.540 --> 00:29:53.160
<v David Flood>That's what I find.

00:29:53.460 --> 00:29:53.920
<v David Flood>That's what I find.

00:29:53.930 --> 00:29:56.660
<v David Flood>I find it, it seems like a really brilliant idea,

00:29:56.920 --> 00:30:00.280
<v David Flood>but in practice, it hasn't saved us time yet.

00:30:00.880 --> 00:30:02.320
<v Michael Kennedy>No, I mean, maybe it's a case study.

00:30:02.460 --> 00:30:04.320
<v Michael Kennedy>Like, okay, let's see what they're doing for this one.

00:30:04.320 --> 00:30:05.160
<v Michael Kennedy>Oh, that is interesting

00:30:05.270 --> 00:30:07.460
<v Michael Kennedy>how they're integrating this other thing maybe,

00:30:07.640 --> 00:30:10.820
<v Michael Kennedy>but as a true foundation, I find it in theory awesome.

00:30:11.280 --> 00:30:13.840
<v Michael Kennedy>In practice, I just end up not doing it for various reasons.

00:30:14.200 --> 00:30:14.560
<v Michael Kennedy>Don't know why.

00:30:14.840 --> 00:30:15.840
<v Michael Kennedy>I'm gonna save this for later.

00:30:16.700 --> 00:30:17.880
<v Michael Kennedy>Because the question I'm about to ask you

00:30:17.900 --> 00:30:20.860
<v Michael Kennedy>is gonna send us just down a rat hole.

00:30:21.260 --> 00:30:26.960
<v Michael Kennedy>So instead, before we go down the rat hole, maybe we could, not that one, maybe we could

00:30:27.020 --> 00:30:32.300
<v Michael Kennedy>talk about, I mean, you talked about some, but let's maybe just feature some of the projects

00:30:32.300 --> 00:30:34.580
<v Michael Kennedy>that are maybe more well-known that you guys have done.

00:30:35.059 --> 00:30:35.460
<v David Flood>Sure.

00:30:35.780 --> 00:30:36.120
<v David Flood>Yeah, good.

00:30:36.580 --> 00:30:40.220
<v David Flood>So yeah, one of them is called the Amendments Project.

00:30:40.900 --> 00:30:46.119
<v David Flood>And this is, I didn't know this until I started working on this project, that there are, there

00:30:46.140 --> 00:30:52.740
<v David Flood>have been thousands of, I think it's 22, at least 22,000 proposed amendments to

00:30:52.740 --> 00:30:55.800
<v David Flood>the United States Constitution that never went anywhere.

00:30:56.230 --> 00:31:01.500
<v David Flood>And so kind of the goal of this project is to show that there have been lots of attempts

00:31:02.140 --> 00:31:06.400
<v David Flood>to amend the Constitution, but actually the Constitution is frozen.

00:31:06.690 --> 00:31:11.480
<v David Flood>I mean, it's not actually amendable anymore, at least not in the politics of any time recently.

00:31:12.440 --> 00:31:13.580
<v David Flood>So this is a database.

00:31:14.180 --> 00:31:19.040
<v Michael Kennedy>I cannot imagine a situation where the U.S. Constitution gets amended.

00:31:19.310 --> 00:31:21.440
<v Michael Kennedy>It has to be unanimous across all the states, right?

00:31:21.820 --> 00:31:22.140
<v Michael Kennedy>Is that right?

00:31:22.250 --> 00:31:22.740
<v Michael Kennedy>I can't remember.

00:31:23.440 --> 00:31:23.840
<v Michael Kennedy>I don't know.

00:31:23.940 --> 00:31:25.720
<v David Flood>I don't remember off the top of my head if it has to be unanimous,

00:31:25.730 --> 00:31:27.900
<v David Flood>but it certainly has to be across party lines.

00:31:28.520 --> 00:31:30.940
<v Michael Kennedy>Yeah, it's got to be pretty darn close if it's not at all.

00:31:32.020 --> 00:31:36.200
<v Michael Kennedy>It's like time travel or travel at the speed of light.

00:31:36.590 --> 00:31:37.640
<v Michael Kennedy>Could be theoretically possible.

00:31:38.180 --> 00:31:38.920
<v Michael Kennedy>Probably not going to happen.

00:31:40.560 --> 00:31:41.340
<v David Flood>No, it's hard to see.

00:31:41.600 --> 00:31:42.140
<v David Flood>It's hard to see.

00:31:42.310 --> 00:31:42.400
<v David Flood>Yeah.

00:31:42.620 --> 00:31:46.160
<v David Flood>So this is from a historian at Harvard.

00:31:46.680 --> 00:31:53.120
<v David Flood>And so it's a database of all and the full text from all of these amendments.

00:31:53.340 --> 00:32:07.600
<v David Flood>And, you know, it's from the public's point of view, it's a Postgres full text vector search interface for finding and filtering through on all of the different amendments that have been proposed.
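
NOTE
Editor's sketch, not the project's implementation: the site described above uses
Postgres full-text search, which also handles stemming and ranking. The toy
below only illustrates the underlying idea of "find, then filter" with a tiny
in-memory inverted index. The class name TinyIndex, the facet name "decade",
and the record in the usage line are invented placeholders.
```python
import re
from collections import defaultdict
#
def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())
#
class TinyIndex:
    """A minimal inverted index: token mapped to the set of record ids containing it."""
    def __init__(self):
        self.postings = defaultdict(set)
        self.facets = {}
    def add(self, record_id, text, **facets):
        """Index the text of one record and remember its filterable facets."""
        self.facets[record_id] = facets
        for token in tokenize(text):
            self.postings[token].add(record_id)
    def search(self, query, **filters):
        """Return sorted ids of records containing every query token and matching every filter."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(token, set()) for token in tokens))
        return sorted(
            record_id
            for record_id in hits
            if all(self.facets[record_id].get(key) == value for key, value in filters.items())
        )
```
Usage: after index.add(1, "Proposal regarding term limits", decade=1990), the call
index.search("term limits", decade=1990) returns [1]. The text match narrows the
candidates first and the facet filter is then applied to that smaller set.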

00:32:08.180 --> 00:32:08.660
<v David Flood>I love it.

00:32:08.980 --> 00:32:10.020
<v Michael Kennedy>Yeah, this is a nice looking site.

00:32:10.520 --> 00:32:11.560
<v David Flood>We work with a designer.

00:32:12.080 --> 00:32:16.920
<v Michael Kennedy>She's very good. Yeah, of course, like an agency would, right? Yep, yep. Nice. So we'll

00:32:17.020 --> 00:32:22.060
<v Michael Kennedy>get a really pretty, rich search interface, and then off you go. I have no idea even

00:32:22.100 --> 00:32:24.900
<v David Flood>what I would search for, but yeah. Well, you can always search for something

00:32:25.180 --> 00:32:28.720
<v David Flood>religious, something abortion related. There's gonna be lots of things there. I

00:32:29.080 --> 00:32:31.920
<v Michael Kennedy>thought all those, also like guns, but like, I don't want to go down. I'm not sure I

00:32:32.020 --> 00:32:37.200
<v Michael Kennedy>even want to go down there, right? Awesome, though. This looks super useful. Maybe

00:32:37.380 --> 00:32:41.039
<v Michael Kennedy>someday we'll have a functional government again. We'll see. Let's, let's

00:32:41.060 --> 00:32:45.820
<v Michael Kennedy>change it. Or maybe we'll go down and it's folklore, like, look at you. So, all right. So yeah, so another

00:32:45.980 --> 00:32:51.680
<v David Flood>really great project, at least from a content point of view, that's interesting, the research

00:32:51.840 --> 00:33:00.680
<v David Flood>that it's doing, is the Fionn Folklore Database. So in Celtic storytelling, you know,

00:33:01.820 --> 00:33:07.919
<v David Flood>moms have been telling stories to daughters, and people have been

00:33:08.000 --> 00:33:14.660
<v David Flood>telling stories for a very long time, hundreds or a thousand years, about Fionn mac Cumhaill, who is a

00:33:14.910 --> 00:33:21.040
<v David Flood>hero from Irish mythology. Some of it is based in, you know, historical events, but it

00:33:21.120 --> 00:33:28.940
<v David Flood>goes back so far. So there are many hundreds or thousands of these

00:33:29.300 --> 00:33:33.300
<v David Flood>stories that have been spread, and versions of these stories that have been told.

00:33:33.160 --> 00:33:47.440
<v David Flood>And so some of them are audio recordings where somebody like some researcher has gone out to an island off the coast of Scotland and recorded somebody telling their version of the hero of Finn and his band of heroes.

00:33:47.640 --> 00:33:53.080
<v David Flood>You know, they defend Scotland and Ireland from invaders and attackers.

00:33:53.880 --> 00:33:57.960
<v David Flood>Very exciting stories and stuff and a team of characters.

00:33:59.020 --> 00:34:04.880
<v David Flood>So there's audio recordings and then there's documents, like written documents that contain

00:34:05.120 --> 00:34:05.180
<v David Flood>these.

00:34:05.180 --> 00:34:11.139
<v David Flood>And so this is a database of kind of all of those all in one place with, on the public

00:34:11.340 --> 00:34:17.820
<v David Flood>side, a nice search interface for discovering them, you know, either using the map view or

00:34:18.040 --> 00:34:18.139
<v David Flood>searching.

00:34:18.480 --> 00:34:19.040
<v Michael Kennedy>Yeah, that's cool.

00:34:19.350 --> 00:34:22.159
<v Michael Kennedy>I got my map view for some random thing I searched about here.

00:34:22.600 --> 00:34:22.919
<v Michael Kennedy>Amazing.

00:34:23.260 --> 00:34:26.300
<v Michael Kennedy>But this is pretty interesting, all these different tellings and stuff.

00:34:26.720 --> 00:34:33.200
<v David Flood>Oh, and yeah, one of the big challenges with this project is that it's fully internationalized.

00:34:33.450 --> 00:34:35.120
<v David Flood>So it's available in English.

00:34:35.379 --> 00:34:40.560
<v David Flood>Everything is available in English, Scottish Gaelic, and Irish Gaelic, but that extends

00:34:40.730 --> 00:34:41.419
<v David Flood>into the database.

00:34:41.790 --> 00:34:45.360
<v David Flood>So usually people have multiple names recorded for them.

00:34:45.850 --> 00:34:50.919
<v David Flood>And so, yeah, you may have one person with any number of names in different languages,

00:34:51.040 --> 00:34:53.879
<v David Flood>sometimes more than one Scottish name, that kind of thing.

00:34:54.020 --> 00:34:59.540
<v David Flood>And so the data model on this one is quite messy, but sensible.

00:35:00.300 --> 00:35:03.020
<v David Flood>But yeah, it's quite a lot of different kinds of data to wrangle.

00:35:03.240 --> 00:35:05.180
<v David Flood>And then with all of the translations for each thing.

00:35:05.320 --> 00:35:05.960
<v Michael Kennedy>Yeah, that's wild.

00:35:06.100 --> 00:35:11.380
<v Michael Kennedy>It's not just, we need the user interface of this thing to translate about.

00:35:12.640 --> 00:35:13.420
<v Michael Kennedy>That's way more, right?

00:35:13.720 --> 00:35:14.820
<v David Flood>Yeah, yeah, it is that.

00:35:14.960 --> 00:35:15.540
<v David Flood>It is that.

00:35:15.740 --> 00:35:20.360
<v David Flood>And then it is also, yes, all the items in the database have a translation or can.
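
NOTE
Editor's sketch of the data-model idea described above, where one person can
carry any number of names in different languages. This is not the project's
Django schema; the class names Person and Name, the fallback rule, and the use
of ISO 639-1 codes ("en", "gd" for Scottish Gaelic, "ga" for Irish) are
assumptions for illustration only.
```python
from dataclasses import dataclass, field
#
@dataclass(frozen=True)
class Name:
    """One recorded form of a person's name in one language."""
    text: str
    language: str  # assumed ISO 639-1 code, e.g. "en", "gd", "ga"
#
@dataclass
class Person:
    """A person with zero or more names, possibly several per language."""
    names: list = field(default_factory=list)
    def add_name(self, text, language):
        self.names.append(Name(text, language))
    def names_in(self, language):
        """All recorded names in one language, in the order they were added."""
        return [name.text for name in self.names if name.language == language]
    def display_name(self, language, fallback="en"):
        """Prefer the requested language, then the fallback, then any name at all."""
        for candidate in (language, fallback):
            found = self.names_in(candidate)
            if found:
                return found[0]
        return self.names[0].text if self.names else ""
```
Keeping names in their own records, rather than in one column per language, is
what lets a single person hold more than one name in the same language.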

00:35:22.420 --> 00:35:25.160
<v Michael Kennedy>This portion of Talk Python To Me is brought to you by us.

00:35:25.500 --> 00:35:30.020
<v Michael Kennedy>I'm thrilled to announce a brand new app built for developers created by yours truly.

00:35:30.480 --> 00:35:31.640
<v Michael Kennedy>It's called Command Book.

00:35:32.440 --> 00:35:33.700
<v Michael Kennedy>You know that thing you do every morning?

00:35:34.200 --> 00:35:38.520
<v Michael Kennedy>Open up six terminal tabs, CD into this directory, activate that virtual environment,

00:35:39.000 --> 00:35:40.460
<v Michael Kennedy>run the server with --reload.

00:35:40.760 --> 00:35:44.720
<v Michael Kennedy>Now, CD somewhere else, start the background worker, another tab for Docker,

00:35:45.080 --> 00:35:46.440
<v Michael Kennedy>another one to tail production logs.

00:35:46.900 --> 00:35:49.620
<v Michael Kennedy>Every tab just says Python, Python, Python, Docker tail.

00:35:50.380 --> 00:35:51.660
<v Michael Kennedy>and you're clicking through them going,

00:35:52.120 --> 00:35:53.320
<v Michael Kennedy>which Python was that again?

00:35:53.840 --> 00:35:54.540
<v Michael Kennedy>Where my app is running?

00:35:55.200 --> 00:35:58.060
<v Michael Kennedy>Then sometime later, your dev server silently dies

00:35:58.260 --> 00:35:59.260
<v Michael Kennedy>because it tried to reload

00:35:59.480 --> 00:36:00.840
<v Michael Kennedy>while you're in the middle of a code edit,

00:36:01.500 --> 00:36:04.080
<v Michael Kennedy>unmatched brace, a half-written import or something.

00:36:04.820 --> 00:36:05.840
<v Michael Kennedy>Now you're hunting through tabs

00:36:05.880 --> 00:36:07.380
<v Michael Kennedy>to figure out which process crashed

00:36:07.440 --> 00:36:08.340
<v Michael Kennedy>and how to restart it.

00:36:08.800 --> 00:36:09.680
<v Michael Kennedy>My app, Command Book,

00:36:10.000 --> 00:36:13.320
<v Michael Kennedy>gives all of these long-running commands a permanent home.

00:36:13.880 --> 00:36:15.120
<v Michael Kennedy>You save a command once,

00:36:15.500 --> 00:36:16.680
<v Michael Kennedy>the working directory, the environment,

00:36:17.040 --> 00:36:18.140
<v Michael Kennedy>pre-commands like git pull,

00:36:18.480 --> 00:36:20.120
<v Michael Kennedy>and from then on, you just click run.

00:36:20.680 --> 00:36:22.040
<v Michael Kennedy>You can even group commands together

00:36:22.340 --> 00:36:24.140
<v Michael Kennedy>to start and stop everything for a project

00:36:24.460 --> 00:36:25.140
<v Michael Kennedy>with a single click.

00:36:25.560 --> 00:36:27.540
<v Michael Kennedy>It also has what I call honey badger mode,

00:36:27.750 --> 00:36:29.140
<v Michael Kennedy>auto restart on crash.

00:36:29.720 --> 00:36:32.140
<v Michael Kennedy>So when your dev server goes down mid-reload,

00:36:32.700 --> 00:36:34.580
<v Michael Kennedy>CommandBook just brings it right back up

00:36:34.840 --> 00:36:36.900
<v Michael Kennedy>and does so over and over until the code is fixed.

00:36:37.520 --> 00:36:39.320
<v Michael Kennedy>It also detects URLs from your output

00:36:39.470 --> 00:36:41.900
<v Michael Kennedy>so you're never scrolling through thousands of lines of logs

00:36:42.080 --> 00:36:44.040
<v Michael Kennedy>just to figure out how to reopen your web app.

00:36:44.580 --> 00:36:46.280
<v Michael Kennedy>And it shows you uptime, memory usage,

00:36:46.480 --> 00:36:48.460
<v Michael Kennedy>and all sorts of cool things about your process.

00:36:49.160 --> 00:36:51.140
<v Michael Kennedy>The whole thing is a native macOS app.

00:36:51.360 --> 00:36:53.680
<v Michael Kennedy>No Electron, no Chromium, just 21 megs.

00:36:54.160 --> 00:36:55.460
<v Michael Kennedy>And it comes with a full CLI.

00:36:55.700 --> 00:36:57.540
<v Michael Kennedy>So anything you've configured in the UI,

00:36:57.960 --> 00:36:59.220
<v Michael Kennedy>you can fire off from your terminal

00:36:59.380 --> 00:37:00.400
<v Michael Kennedy>with just a single command.

00:37:00.860 --> 00:37:02.700
<v Michael Kennedy>Right now it's macOS only,

00:37:03.220 --> 00:37:04.080
<v Michael Kennedy>but if there's enough interest,

00:37:04.320 --> 00:37:05.380
<v Michael Kennedy>I'll build a Windows version too.

00:37:05.620 --> 00:37:06.340
<v Michael Kennedy>So let me know.

00:37:07.180 --> 00:37:09.300
<v Michael Kennedy>Please check it out at talkpython.fm

00:37:09.560 --> 00:37:11.160
<v Michael Kennedy>slash command book app.

00:37:11.640 --> 00:37:12.400
<v Michael Kennedy>Download it for free,

00:37:12.980 --> 00:37:14.140
<v Michael Kennedy>level up your developer workflow.

00:37:14.600 --> 00:37:16.220
<v Michael Kennedy>The link is in your podcast player show notes.

00:37:16.840 --> 00:37:18.780
<v Michael Kennedy>That's talkpython.fm/command book.

00:37:19.210 --> 00:37:21.000
<v Michael Kennedy>I really hope you enjoy this new app that I built.

00:37:22.640 --> 00:37:26.660
<v Michael Kennedy>You want to work in the native language of the people who did that part of the folklore

00:37:26.930 --> 00:37:27.340
<v Michael Kennedy>or whatever, right?

00:37:27.680 --> 00:37:30.280
<v David Flood>Yeah, well, and people are still speaking those languages.

00:37:30.520 --> 00:37:34.580
<v David Flood>So people who would use this to, you know, like somebody may have heard a story from

00:37:34.630 --> 00:37:37.960
<v David Flood>their mom or dad and would now like to find other versions of that story.

00:37:38.350 --> 00:37:41.940
<v David Flood>And they live in a part of Scotland where they speak Scottish Gaelic as their first language.

00:37:42.520 --> 00:37:43.680
<v David Flood>They can still access the site.

00:37:43.880 --> 00:37:48.700
<v Michael Kennedy>And then that Mapping Color in History one, that's another one of the public ones that you said is pretty major.

00:37:49.520 --> 00:37:50.080
<v David Flood>Yeah, that's right.

00:37:50.280 --> 00:37:50.420
<v David Flood>Yeah.

00:37:50.720 --> 00:37:53.140
<v David Flood>So, yeah, that's a pigments database.

00:37:53.360 --> 00:38:04.160
<v David Flood>You can search by either English color names like blue and find all of these Asian paintings that have blue or a particular kind of pigment of how they made the blue.

00:38:04.440 --> 00:38:04.960
<v Michael Kennedy>Yeah, nice.

00:38:05.360 --> 00:38:07.700
<v Michael Kennedy>So what's the open source story?

00:38:08.300 --> 00:38:11.240
<v Michael Kennedy>You're creating all these apps, maybe some of these frameworks.

00:38:11.420 --> 00:38:12.280
<v Michael Kennedy>There's got to be some tools.

00:38:12.680 --> 00:38:25.900
<v Michael Kennedy>Is there a big desire or already an effort to have a lot of these things open source, or is it too niche, or is it just like, this is the advantage Harvard has and other universities don't get this?

00:38:27.300 --> 00:38:29.680
<v David Flood>No, it's something we talk about quite a bit.

00:38:30.820 --> 00:38:35.140
<v David Flood>Usually these things start, usually they start closed source during development.

00:38:35.380 --> 00:38:44.920
<v David Flood>And then we work with the faculty and we talk about how we can take, you know, like the repo for the web app, how we can take that public.

00:38:45.450 --> 00:38:48.100
<v David Flood>And so we've done that for a number of projects.

00:38:48.320 --> 00:38:49.100
<v David Flood>Not all of them are.

00:38:50.000 --> 00:38:55.880
<v David Flood>But the ideal is that they all make their way into the open, and especially when they become archived.

00:38:56.160 --> 00:38:56.440
<v Michael Kennedy>Sure.

00:38:56.770 --> 00:38:58.840
<v Michael Kennedy>Yeah, that's a good way to help them live on.

00:38:58.960 --> 00:39:03.900
<v Michael Kennedy>And they might even go into GitHub's Arctic Vault, which is crazy.

00:39:03.980 --> 00:39:14.620
<v Michael Kennedy>I don't know if people know about that out there, but GitHub has, quite a while ago, started taking copies of all of the repos and backing them up and storing them in the Arctic Vault.

00:39:14.810 --> 00:39:15.240
<v Michael Kennedy>It's kind of cool.

00:39:15.810 --> 00:39:18.660
<v Michael Kennedy>I really, really, really hope we never need that, but it's kind of neat.

00:39:18.820 --> 00:39:19.340
<v David Flood>Yeah, me too.

00:39:20.319 --> 00:39:30.300
<v David Flood>Usually universities have their own archival system, so any important research data is usually part of that system as well.

00:39:30.560 --> 00:39:30.840
<v Michael Kennedy>I see.

00:39:30.990 --> 00:39:31.080
<v Michael Kennedy>Okay.

00:39:31.440 --> 00:39:31.520
<v Michael Kennedy>Yeah.

00:39:32.160 --> 00:39:32.780
<v Michael Kennedy>Obviously, right?

00:39:32.840 --> 00:39:34.940
<v Michael Kennedy>Like I'm just, I can't remember where it was.

00:39:34.940 --> 00:39:39.760
<v Michael Kennedy>It was somewhere, I think it was South Korea or Taiwan where like seven years of government

00:39:40.060 --> 00:39:41.820
<v Michael Kennedy>data got lost or something like that.

00:39:41.820 --> 00:39:43.200
<v Michael Kennedy>It was really, really bad recently.

00:39:43.500 --> 00:39:46.980
<v Michael Kennedy>There was a fire and I think they had backups, but maybe just into the building, you know,

00:39:47.040 --> 00:39:48.000
<v Michael Kennedy>like we'll put that out.

00:39:48.340 --> 00:39:49.800
<v Michael Kennedy>We'll back it up to the hard drive over here.

00:39:50.180 --> 00:39:50.480
<v Michael Kennedy>Not good.

00:39:51.000 --> 00:39:52.920
<v Michael Kennedy>No, not good.

00:39:52.920 --> 00:39:54.340
<v Michael Kennedy>You definitely want this stuff to survive.

00:39:54.360 --> 00:40:00.080
<v Michael Kennedy>I mean, academia has this history of like tomes that have survived the past and really,

00:40:00.260 --> 00:40:02.340
<v Michael Kennedy>really long lived information.

00:40:02.640 --> 00:40:02.700
<v Michael Kennedy>Right.

00:40:02.760 --> 00:40:05.400
<v Michael Kennedy>besides the Library of Alexandria or something like that, maybe.

00:40:05.680 --> 00:40:06.380
<v David Flood>That's what we want.

00:40:06.620 --> 00:40:07.080
<v David Flood>That's what we want.

00:40:07.080 --> 00:40:08.860
<v Michael Kennedy>We want it to, yeah, we want it to last.

00:40:09.560 --> 00:40:09.860
<v Michael Kennedy>Absolutely.

00:40:10.180 --> 00:40:14.540
<v Michael Kennedy>So maybe that's a good time to sort of talk about the trailing end.

00:40:14.540 --> 00:40:17.020
<v Michael Kennedy>I think there's a lot of interesting things going on here.

00:40:18.360 --> 00:40:22.740
<v Michael Kennedy>Just like you've run out of money, not because you actually run out of money.

00:40:23.260 --> 00:40:26.520
<v Michael Kennedy>The grant is done and you've either spent or given back or whatever

00:40:26.820 --> 00:40:28.460
<v Michael Kennedy>with the remaining little bits of money.

00:40:28.780 --> 00:40:30.100
<v Michael Kennedy>It's always a weird balance with research.

00:40:30.600 --> 00:40:33.740
<v Michael Kennedy>It's like, oh, we got $3,000 left on this research grant.

00:40:33.790 --> 00:40:34.700
<v Michael Kennedy>What are we going to do with it?

00:40:34.780 --> 00:40:35.820
<v Michael Kennedy>It's not like, oh, we're going to give it back.

00:40:35.930 --> 00:40:36.580
<v Michael Kennedy>We just didn't need it.

00:40:36.940 --> 00:40:41.300
<v Michael Kennedy>It's like, we're going to find a way to like fund a student to do a little more work or

00:40:41.460 --> 00:40:41.520
<v Michael Kennedy>whatever.

00:40:41.720 --> 00:40:43.400
<v Michael Kennedy>But eventually the grant is over.

00:40:43.940 --> 00:40:44.240
<v David Flood>That's right.

00:40:44.660 --> 00:40:48.560
<v Michael Kennedy>You've got some expensive app access to a big database because it needs a big search or

00:40:49.260 --> 00:40:50.200
<v Michael Kennedy>a lot of compute or something.

00:40:50.780 --> 00:40:51.040
<v David Flood>That's right.

00:40:52.130 --> 00:40:56.000
<v David Flood>Everything during, like, I mean, anything, anything that's a, that's a Django app.

00:40:56.640 --> 00:41:04.820
<v David Flood>We deploy to AWS using containers, which isn't the cheapest way to host anything.

00:41:05.760 --> 00:41:09.100
<v David Flood>But that's for the most part the Harvard way.

00:41:10.200 --> 00:41:12.240
<v David Flood>And it is robust and is reliable.

00:41:12.800 --> 00:41:22.320
<v David Flood>And we don't have a DevOps person on call on the weekend to rescue one of these apps.

00:41:22.480 --> 00:41:24.980
<v David Flood>So having them reliable is good.

00:41:25.400 --> 00:41:29.940
<v David Flood>Okay, so it's on AWS and paying for the containers,

00:41:30.180 --> 00:41:32.500
<v David Flood>paying for that Elasticsearch cluster,

00:41:33.180 --> 00:41:36.280
<v David Flood>the RDS Postgres database.

00:41:36.890 --> 00:41:38.940
<v David Flood>Okay, well, even if somebody wants to start paying

00:41:38.940 --> 00:41:39.860
<v David Flood>for that out-of-pocket,

00:41:40.100 --> 00:41:40.980
<v David Flood>all of those little services,

00:41:41.090 --> 00:41:44.080
<v David Flood>they add up to enough that we need to do something

00:41:44.530 --> 00:41:46.340
<v David Flood>when the project hits end of life.

00:41:46.650 --> 00:41:50.359
<v David Flood>And so our gold standard that we've developed so far

00:41:50.380 --> 00:41:55.340
<v David Flood>is asking, can this become a static website?

00:41:55.860 --> 00:41:58.600
<v David Flood>Can we bake this out into all HTML files

00:41:59.200 --> 00:42:01.800
<v David Flood>and acknowledge that there will be some trade-offs?

00:42:01.960 --> 00:42:04.440
<v David Flood>We will trade off some searching.

00:42:04.880 --> 00:42:06.540
<v David Flood>You know, it's not gonna have Elasticsearch.

00:42:06.840 --> 00:42:08.320
<v David Flood>Doesn't mean that it won't have any search though.

00:42:08.620 --> 00:42:10.160
<v David Flood>So we'll trade out Elasticsearch

00:42:10.560 --> 00:42:12.840
<v David Flood>and it'll be very difficult to add new data,

00:42:13.340 --> 00:42:15.020
<v David Flood>but that's okay because it's being archived.

00:42:15.340 --> 00:42:16.940
<v David Flood>So can we get it into a static site?

00:42:18.040 --> 00:42:20.940
<v David Flood>And that's challenging depending on how you've set it up.

00:42:20.980 --> 00:42:26.460
<v David Flood>So we now have projects where we set them up from the beginning to be archivable like this.

00:42:26.460 --> 00:42:28.960
<v David Flood>And one of them is called Water Stories.

00:42:29.520 --> 00:42:35.960
<v David Flood>And it was a companion to an art installation at the Radcliffe Institute on the Harvard campus.

00:42:36.700 --> 00:42:45.380
<v David Flood>And so this was this live site during the duration of the art installation where people could come in and add stories that they had about water onto an iPad.

00:42:46.380 --> 00:42:47.660
<v David Flood>And then those went up to our database.

00:42:49.000 --> 00:42:54.340
<v David Flood>We built that with something called Django Bakery, which, if you opt in and you use all of their

00:42:54.440 --> 00:43:00.480
<v David Flood>class-based views the way that they're meant to be used, then you can bake this out into static files

00:43:00.480 --> 00:43:05.660
<v David Flood>when you're done. Very low effort. That was perfect.

00:43:05.760 --> 00:43:11.440
<v Michael Kennedy>That is such a cool idea, and mad props to them for ASCII art logos. Come on now. I feel like that should be in the view source if it's not. But

00:43:11.800 --> 00:43:17.260
<v Michael Kennedy>this is such a cool idea because you can just take a working site. You guys are a Django

00:43:17.260 --> 00:43:22.360
<v Michael Kennedy>shop, so a lot of your sites are written in Django, and you just go make it static, right?

00:43:22.860 --> 00:43:27.220
<v David Flood>Essentially. Yes. And, and what's, what's, what's really great about it is if they wanted to make

00:43:27.240 --> 00:43:31.400
<v David Flood>a change and they have, they have asked since we, since we made it static, they've asked for a

00:43:31.480 --> 00:43:37.020
<v David Flood>couple of changes. So locally, I just Docker compose up this whole application, make the change

00:43:37.120 --> 00:43:42.420
<v David Flood>in the Django admin and rebake the site. And so it can still be updated.

00:43:42.600 --> 00:43:46.800
<v Michael Kennedy>Something, if you've never tried this, like something like, Hey, can we just add one more menu item?

00:43:47.140 --> 00:43:50.100
<v Michael Kennedy>And you're like, no, no, no, we're not adding the menu item because you want that.

00:43:50.140 --> 00:43:55.980
<v Michael Kennedy>That means we're changing 7,300 pages because they all bake in the whole HTML.

00:43:56.400 --> 00:43:56.460
<v Michael Kennedy>Right?

00:43:56.700 --> 00:43:57.020
<v David Flood>Exactly.

00:43:57.560 --> 00:43:57.940
<v David Flood>Yeah, exactly.

00:43:58.190 --> 00:44:02.740
<v David Flood>But if that's in my, in my Django database and my SQLite file, then no problem at

00:44:02.840 --> 00:44:04.340
<v David Flood>all because then I just rebake it.
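
NOTE
A minimal sketch of the rebake idea described above, assuming a tiny SQLite
table of stories and a shared navigation menu. It uses only the Python standard
library and is not django-bakery's API; every name in it (bake, PAGE, the
stories table) is hypothetical and chosen for illustration.
```python
# Illustrative sketch only: this is NOT django-bakery's API.
# All names (bake, PAGE, the stories table) are hypothetical.
import html
import sqlite3
import tempfile
from pathlib import Path
from string import Template
#
PAGE = Template("<html><body><nav>$nav</nav><h1>$title</h1><p>$body</p></body></html>")
#
def bake(conn: sqlite3.Connection, nav_items: list[str], out_dir: Path) -> int:
    """Render every row of the stories table to its own static HTML file."""
    if not nav_items:
        raise ValueError("nav_items must not be empty")
    nav = " | ".join(html.escape(item) for item in nav_items)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for story_id, title, body in conn.execute("SELECT id, title, body FROM stories"):
        page = PAGE.substitute(nav=nav, title=html.escape(title), body=html.escape(body))
        (out_dir / f"story-{story_id}.html").write_text(page, encoding="utf-8")
        count += 1
    return count
#
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO stories (title, body) VALUES (?, ?)",
    [("The River", "It flooded in spring."), ("The Well", "We drew water daily.")],
)
out_dir = Path(tempfile.mkdtemp())
bake(conn, ["Home", "Stories"], out_dir)
# Adding a menu item changes every page, but it is one data edit plus a rebake.
pages_written = bake(conn, ["Home", "Stories", "About"], out_dir)
```
Because the menu lives in data rather than in thousands of hand-edited HTML
files, the second bake call regenerates every page with the new item.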

00:44:04.620 --> 00:44:05.400
<v Michael Kennedy>Yeah, yeah, exactly.

00:44:05.600 --> 00:44:06.100
<v Michael Kennedy>Absolutely.

00:44:06.859 --> 00:44:09.480
<v Michael Kennedy>So I think this is super neat.

00:44:09.560 --> 00:44:12.920
<v Michael Kennedy>There's also Frozen, Frozen-Flask.

00:44:13.520 --> 00:44:16.740
<v Michael Kennedy>If I could get rid of all the ads, I do not need a Yeti thing, whatever that is.

00:44:17.200 --> 00:44:24.900
<v Michael Kennedy>The glass, not the mythical thing. But Frozen-Flask, which does a similar thing for Flask

00:44:25.300 --> 00:44:30.180
<v Michael Kennedy>apps, if you're a Flask person. It probably would work with Quart. Don't know for sure, but probably.

00:44:30.520 --> 00:44:36.680
<v Michael Kennedy>So that's a pretty interesting idea as well. Throw that in there. But also, what else?

00:44:37.460 --> 00:44:45.380
<v Michael Kennedy>Also, you talked about search, right? That can be such a problem. And I'm a huge fan of your

00:44:45.320 --> 00:44:50.960
<v Michael Kennedy>recommendation here with PageFind. Tell us about PageFind.

00:44:50.960 --> 00:44:56.920
<v David Flood>So this has been, I think it's been a bit of a game changer in how functional one of these archived sites can remain. So we're actually

00:44:56.920 --> 00:45:03.360
<v David Flood>in the process of that amendments website that searches across 22,000 full texts of amendments.

00:45:04.080 --> 00:45:09.580
<v David Flood>We are in the process of sunsetting that, and that will become a static site. And for that search,

00:45:09.740 --> 00:45:16.020
<v David Flood>we already have an internal demo that proves that we can replace that Postgres full-text search with PageFind.

00:45:16.760 --> 00:45:22.280
<v Michael Kennedy>You lose vector search. Yeah. You've kind of got to get really

00:45:22.960 --> 00:45:27.340
<v Michael Kennedy>true keyword matching.

00:45:27.360 --> 00:45:34.019
<v David Flood>Yeah. Yeah, that's right. But you still get filtering. I mean, really, faceting and filtering, when it comes to discovery of things, I mean, I find

00:45:34.040 --> 00:45:40.360
<v David Flood>that's really what's useful. So filtering these amendments by state or by the Congress that was

00:45:40.500 --> 00:45:50.160
<v David Flood>active at the time or by the person who co-wrote it. All of those are totally great in PageFind.

00:45:50.380 --> 00:45:55.240
<v David Flood>And the keyword search is just fine in PageFind. One of the things I really like about it is that

00:45:55.540 --> 00:46:00.939
<v David Flood>it takes your index and it chops it up into lots of little files that can just fly across the

00:46:00.960 --> 00:46:06.640
<v David Flood>network. So it's a very fast search. It's not a huge network load, even if your index is

00:46:07.260 --> 00:46:13.360
<v David Flood>initially very large. And it essentially cuts it up somewhat alphabetically. So if your search

00:46:14.070 --> 00:46:20.800
<v David Flood>starts with T, or I should say a better word for audio, if it starts with W, then it will load up

00:46:20.810 --> 00:46:26.000
<v David Flood>the index for words that start with W and fly that over the network instead of the whole thing.
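
NOTE
A rough sketch of the "chop the index into small files" idea described above,
assuming a toy inverted index split by first letter. PageFind's real index
format and chunking strategy differ; this only illustrates why a query needs
one small file rather than the whole index. All names and sample records here
(build_shards, search, DOCS) are hypothetical.
```python
# Illustrative sketch only: PageFind's real index format and chunking differ.
# All names and sample data (build_shards, search, DOCS) are hypothetical.
import json
import tempfile
from collections import defaultdict
from pathlib import Path
#
DOCS = {
    "/amendments/1": "water rights amendment",
    "/amendments/2": "women suffrage amendment",
    "/amendments/3": "term limits",
}
#
def build_shards(docs: dict[str, str], out_dir: Path) -> None:
    """Write one small JSON file per first letter, mapping each word to URLs."""
    shards: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for url, text in docs.items():
        for word in set(text.lower().split()):
            shards[word[0]].setdefault(word, []).append(url)
    for letter, words in shards.items():
        (out_dir / f"{letter}.json").write_text(json.dumps(words), encoding="utf-8")
#
def search(prefix: str, out_dir: Path) -> list[str]:
    """Load only the shard for the query's first letter, then prefix-match."""
    prefix = prefix.lower()
    if not prefix:
        return []
    shard_path = out_dir / f"{prefix[0]}.json"
    if not shard_path.exists():
        return []
    words = json.loads(shard_path.read_text(encoding="utf-8"))
    hits = {url for word, urls in words.items() if word.startswith(prefix) for url in urls}
    return sorted(hits)
#
index_dir = Path(tempfile.mkdtemp())
build_shards(DOCS, index_dir)
# Only w.json is read for this query; the other shards stay on disk.
results = search("w", index_dir)
```
In a browser the same idea means fetching one small file per query prefix,
which is why a large index does not translate into a large download.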

00:46:26.120 --> 00:46:29.220
<v David Flood>So it's pretty slick and it has a great Python API.

00:46:29.760 --> 00:46:33.320
<v David Flood>So to do the proof of concept for the amendments search,

00:46:33.950 --> 00:46:40.000
<v David Flood>I just took a database dump and then manually indexed with a Python script into PageFind.

00:46:40.180 --> 00:46:42.980
<v Michael Kennedy>Wait, there's a Python API for PageFind?

00:46:43.360 --> 00:46:47.800
<v David Flood>Yeah. So the way PageFind works, I should have said that, is the way most people will use it

00:46:48.140 --> 00:46:55.160
<v David Flood>is by normally PageFind consumes HTML. So you give it access to your dist folder.

00:46:56.040 --> 00:46:56.660
<v Michael Kennedy>Oh, okay.

00:46:57.700 --> 00:47:00.360
<v David Flood>And then it crawls through all of your HTML files.

00:47:00.580 --> 00:47:05.840
<v David Flood>And you can do great things like adding little HTML tags that are just for PageFind,

00:47:05.940 --> 00:47:09.340
<v David Flood>that give it the filtering ability, or that you want to sort by something.

00:47:09.620 --> 00:47:10.780
<v David Flood>And so that's great.

00:47:11.380 --> 00:47:18.280
<v David Flood>Or you can just call PageFind from Python or from TypeScript and just build that index manually.

00:47:18.660 --> 00:47:19.660
<v Michael Kennedy>Well, thanks a lot, David.

00:47:19.800 --> 00:47:21.160
<v Michael Kennedy>I have another thing I've got to go research.

00:47:21.360 --> 00:47:21.840
<v Michael Kennedy>This is awesome.

00:47:22.560 --> 00:47:24.520
<v Michael Kennedy>I'm a huge fan of PageFind, as I said.

00:47:24.540 --> 00:47:27.040
<v Michael Kennedy>My personal website, mkennedy.codes,

00:47:27.440 --> 00:47:29.040
<v Michael Kennedy>is just a pure static site.

00:47:29.090 --> 00:47:31.520
<v Michael Kennedy>It starts in Markdown and ends up in HTML.

00:47:31.940 --> 00:47:34.600
<v Michael Kennedy>But if you add PageFind in, you get a super rich,

00:47:34.780 --> 00:47:36.240
<v Michael Kennedy>if you want to just know, you want to talk about,

00:47:36.360 --> 00:47:37.120
<v Michael Kennedy>like what was about Docker,

00:47:37.590 --> 00:47:39.960
<v Michael Kennedy>it shows you really nice results,

00:47:40.500 --> 00:47:42.060
<v Michael Kennedy>pulling out the different parts of the page

00:47:42.300 --> 00:47:43.480
<v Michael Kennedy>and sections that talk about it,

00:47:43.560 --> 00:47:45.540
<v Michael Kennedy>like the headers and then what is said.

00:47:45.670 --> 00:47:48.520
<v Michael Kennedy>And it even does like sub, sub word,

00:47:48.920 --> 00:47:50.280
<v Michael Kennedy>you know, like you just type doc,

00:47:50.620 --> 00:47:51.980
<v Michael Kennedy>it finds all the words that match that.

00:47:52.160 --> 00:47:54.500
<v Michael Kennedy>And what I really like about it is a couple of things

00:47:54.520 --> 00:47:59.660
<v Michael Kennedy>it's instant. It basically is like nearly instant. If you type a few things, it gets way faster

00:47:59.720 --> 00:48:04.600
<v Michael Kennedy>because it's pulling down. And if you go and look in the network console here and you type

00:48:05.220 --> 00:48:10.540
<v Michael Kennedy>something, you can see that it's actually pulling in these little tiny fragments, which this one's

00:48:10.640 --> 00:48:16.480
<v Michael Kennedy>coming off disk cache in three milliseconds, right? But it breaks your index into a bunch of very small

00:48:16.980 --> 00:48:22.000
<v Michael Kennedy>PageFind fragments that I think it's like, it starts with anything that starts with the word

00:48:21.980 --> 00:48:24.860
<v Michael Kennedy>DO. These are all the prebuilt results and stuff like that. Right.

00:48:25.080 --> 00:48:26.220
<v David Flood>That's right. That's right.

00:48:26.440 --> 00:48:27.440
<v Michael Kennedy>Yeah. That's super cool.

00:48:27.940 --> 00:48:34.440
<v David Flood>Yeah. One of our open source projects that we maintain is a

00:48:34.440 --> 00:48:39.780
<v David Flood>Vue.js component library for PageFind so that we can style it and reuse it

00:48:39.810 --> 00:48:40.680
<v David Flood>across different projects.

00:48:41.040 --> 00:48:42.460
<v Michael Kennedy>Oh, that's awesome. I love it.

00:48:42.780 --> 00:48:44.180
<v Michael Kennedy>Yeah. I think this really unlocks it.

00:48:44.180 --> 00:48:48.800
<v Michael Kennedy>And I mean, you go to so many, so many sites, like their documentation or just

00:48:48.850 --> 00:48:51.440
<v Michael Kennedy>their web app, and the search is so bad.

00:48:51.640 --> 00:48:56.680
<v Michael Kennedy>You type something and it's like thinking, spinning, spinning, spinning, spinning.

00:48:57.040 --> 00:49:00.280
<v Michael Kennedy>And then like five seconds later, it gives you kind of janky results.

00:49:00.700 --> 00:49:04.680
<v Michael Kennedy>And if you just like throw PageFind in there, you can't type fast enough to

00:49:05.100 --> 00:49:05.760
<v Michael Kennedy>outrun the results.

00:49:05.820 --> 00:49:06.220
<v Michael Kennedy>You know what I mean?

00:49:06.520 --> 00:49:07.100
<v David Flood>No, that's right.

00:49:07.180 --> 00:49:07.260
<v David Flood>Yeah.

00:49:07.880 --> 00:49:12.980
<v David Flood>Too many static site search solutions, they use like a, like a JSON blob that you, that

00:49:12.980 --> 00:49:15.280
<v David Flood>you have to pull down and, and then iterate through.

00:49:15.940 --> 00:49:16.580
<v Michael Kennedy>You know, what's worse.

00:49:16.680 --> 00:49:21.240
<v Michael Kennedy>and I see this a lot, would be if you go to google.com

00:49:21.820 --> 00:49:24.540
<v Michael Kennedy>and then you would say effectively site colon whatever

00:49:24.790 --> 00:49:26.260
<v Michael Kennedy>and then you search Docker, right?

00:49:26.350 --> 00:49:28.120
<v Michael Kennedy>They basically pull that.

00:49:29.000 --> 00:49:30.520
<v Michael Kennedy>You know, they just say search this

00:49:30.760 --> 00:49:33.560
<v Michael Kennedy>and you just get Google results for your site.

00:49:33.650 --> 00:49:36.460
<v Michael Kennedy>And obviously it's, I mean, Google's fine, but it's just.

00:49:36.600 --> 00:49:38.360
<v David Flood>No, I find that unusable, really.

00:49:38.460 --> 00:49:38.820
<v Michael Kennedy>I do too.

00:49:38.950 --> 00:49:40.260
<v Michael Kennedy>It really, you're like, ah, geez.

00:49:41.140 --> 00:49:43.220
<v Michael Kennedy>But now I'm super excited to realize

00:49:43.370 --> 00:49:46.220
<v Michael Kennedy>I can do that from my dynamic content as well.

00:49:46.640 --> 00:49:48.460
<v Michael Kennedy>So with the Python integration.

00:49:48.880 --> 00:49:49.760
<v Michael Kennedy>OK, nice.

00:49:51.360 --> 00:49:53.480
<v Michael Kennedy>What about something truly static?

00:49:53.600 --> 00:49:56.440
<v Michael Kennedy>Have you looked at Hugo and some of the other type of things?

00:49:56.880 --> 00:49:57.160
<v David Flood>Sure.

00:49:57.390 --> 00:50:01.960
<v David Flood>So when I see you've even got the tab up for the Tsumeb project,

00:50:02.300 --> 00:50:08.680
<v David Flood>which is-- that's essentially a database of many, many specimens

00:50:09.200 --> 00:50:10.440
<v David Flood>taken from the Tsumeb mine.

00:50:11.460 --> 00:50:12.320
<v David Flood>So in the--

00:50:12.530 --> 00:50:13.040
<v David Flood>Oh, it is.

00:50:13.040 --> 00:50:13.740
<v David Flood>Yeah, yeah, it is.

00:50:13.900 --> 00:50:15.640
<v David Flood>So if you click on Minerals database,

00:50:16.180 --> 00:50:19.620
<v David Flood>you open up that search interface and that's powered by PageFind.

00:50:19.760 --> 00:50:20.660
<v Michael Kennedy>Oh, this is?

00:50:21.200 --> 00:50:21.300
<v David Flood>Yes.

00:50:22.520 --> 00:50:23.740
<v David Flood>I forget what I was...

00:50:23.930 --> 00:50:24.260
<v Michael Kennedy>I see.

00:50:24.530 --> 00:50:26.640
<v Michael Kennedy>You guys even hooked into...

00:50:26.700 --> 00:50:29.880
<v Michael Kennedy>I was thinking just like pure static, like Hugo, like...

00:50:30.080 --> 00:50:31.540
<v David Flood>Oh, yes. Yes. Yes.

00:50:31.840 --> 00:50:32.980
<v David Flood>So this is an Astro site.

00:50:33.360 --> 00:50:37.540
<v David Flood>So for this website, we have this as an Astro site so that we have a little...

00:50:37.600 --> 00:50:41.520
<v David Flood>Because with Astro, they make it so easy to pull in like Vue components.

00:50:42.100 --> 00:50:47.720
<v David Flood>So like our PageFind is a custom Vue.js component library. With Astro,

00:50:47.730 --> 00:50:52.620
<v David Flood>you can use React components, you can use the Vue components, but what it does is it's just a static site generator.

00:50:52.620 --> 00:50:56.980
<v Michael Kennedy>Fantastic. So a little bit more designable

00:50:57.460 --> 00:51:00.120
<v Michael Kennedy>than like Hugo or something. Here's your markdown file. Good luck with that.

00:51:00.220 --> 00:51:05.020
<v David Flood>Yeah. I love Hugo though. Yeah. I use Hugo for different personal sites here and there,

00:51:05.070 --> 00:51:08.420
<v David Flood>and it's just so fast and easy to get up and running. But yeah, it's great.

00:51:08.440 --> 00:51:09.400
<v Michael Kennedy>Great, great when it's a good friend.

00:51:09.400 --> 00:51:10.740
<v Michael Kennedy>That's what my website's written in, it's in Hugo.

00:51:12.239 --> 00:51:14.280
<v Michael Kennedy>But if I'm integrating with anything else,

00:51:14.400 --> 00:51:15.740
<v Michael Kennedy>I used to kind of like split it up,

00:51:15.790 --> 00:51:17.920
<v Michael Kennedy>like this part's Hugo and this part's like a Python app.

00:51:17.920 --> 00:51:20.000
<v Michael Kennedy>And it's pretty easy to get something

00:51:20.140 --> 00:51:21.620
<v Michael Kennedy>that'll take a bunch of markdown files

00:51:21.820 --> 00:51:23.200
<v Michael Kennedy>and just turn them into HTML

00:51:23.700 --> 00:51:25.400
<v Michael Kennedy>and just put a page template around that.

00:51:25.580 --> 00:51:29.000
<v Michael Kennedy>So I've kind of stepped away from mixing and matching that

00:51:29.140 --> 00:51:29.960
<v Michael Kennedy>as much as I used to.

00:51:30.230 --> 00:51:32.940
<v Michael Kennedy>So now if I got a static section of a dynamic site,

00:51:33.400 --> 00:51:34.000
<v Michael Kennedy>but that doesn't address,

00:51:34.140 --> 00:51:37.780
<v Michael Kennedy>has nothing to do with the archival side of things, right?

00:51:38.440 --> 00:51:41.840
<v Michael Kennedy>Because the idea is that the thing that I'm describing is gone on purpose.

00:51:42.180 --> 00:51:42.600
<v David Flood>That's right.

00:51:42.840 --> 00:51:45.980
<v Michael Kennedy>So you've got some, we've got Django Bakery.

00:51:46.440 --> 00:51:52.580
<v Michael Kennedy>I threw out Frozen Flask, and I'm sure there's a ton more that neither of us are aware of at the moment.

00:51:52.800 --> 00:51:56.380
<v David Flood>So Django Bakery was really good for that purpose.

00:51:56.640 --> 00:52:00.600
<v David Flood>And we're keeping our eyes open for projects that it's a good fit for.

00:52:01.560 --> 00:52:03.420
<v David Flood>But that was a pretty simple website.

00:52:03.620 --> 00:52:06.260
<v David Flood>It needed a dynamic backend, but it was quite straightforward.

00:52:06.960 --> 00:52:09.860
<v David Flood>And for Django Bakery, you have to opt into inheriting

00:52:10.080 --> 00:52:11.520
<v David Flood>from their class-based views.

00:52:11.580 --> 00:52:11.840
<v Michael Kennedy>I see.

00:52:12.700 --> 00:52:13.800
<v David Flood>So if you're doing, for example--

00:52:13.800 --> 00:52:14.880
<v Michael Kennedy>You've got to dig ahead of it, yeah.

00:52:15.260 --> 00:52:16.680
<v David Flood>Yeah, yeah, yeah, absolutely.

00:52:17.000 --> 00:52:18.640
<v David Flood>Yeah, hard to add retroactively.

00:52:18.780 --> 00:52:19.380
<v David Flood>Probably impossible.

00:52:20.340 --> 00:52:23.120
<v David Flood>Now, our other websites, like the Fionn example

00:52:23.380 --> 00:52:27.060
<v David Flood>and the Mapping Color example, those are APIs.

00:52:27.500 --> 00:52:29.800
<v David Flood>That's a Django API, Django REST framework for one,

00:52:30.700 --> 00:52:31.920
<v David Flood>GraphQL for the other.

00:52:32.540 --> 00:52:34.560
<v David Flood>One has a Vue front end, one has a React front end.

00:52:34.900 --> 00:52:36.920
<v David Flood>OK, well, Django Bakery just

00:52:36.940 --> 00:52:39.580
<v David Flood>isn't going to work very well for like serializing JSON.

00:52:39.760 --> 00:52:40.680
<v Michael Kennedy>Yeah, it's like awesome.

00:52:40.940 --> 00:52:44.080
<v Michael Kennedy>Here's your unrendered JavaScript front end code

00:52:44.180 --> 00:52:45.560
<v Michael Kennedy>and it's just going to look empty or something.

00:52:45.980 --> 00:52:46.060
<v David Flood>Yeah.

00:52:46.400 --> 00:52:48.800
<v David Flood>So it is a good reason to consider using

00:52:49.680 --> 00:52:51.460
<v David Flood>like vanilla Django templates when possible,

00:52:52.440 --> 00:52:53.220
<v David Flood>like for that reason.

00:52:53.440 --> 00:52:57.880
<v David Flood>But those were, those were inherited from the vendors,

00:52:58.880 --> 00:52:59.420
<v David Flood>those two sites.

00:52:59.440 --> 00:53:00.960
<v David Flood>And we've made a lot of progress on those.

00:53:01.520 --> 00:53:04.740
<v David Flood>So, you know, what, what to do in that,

00:53:05.000 --> 00:53:10.360
<v David Flood>like in that situation, Django Bakery isn't an option. And those projects are not end of life

00:53:10.600 --> 00:53:14.960
<v David Flood>yet. So we have some time, but we're, we're, we're, so what we're doing is strategizing, okay,

00:53:15.280 --> 00:53:20.720
<v David Flood>how will we rescue them? How will we keep them alive once, once somebody needs to stop paying

00:53:20.880 --> 00:53:25.620
<v David Flood>for hosting? And we have, we have ideas. We have, I think there's, there's clever, interesting things out there. We'll have to keep looking into it.

00:53:26.060 --> 00:53:34.900
<v Michael Kennedy>There are some pretty interesting ideas. And

00:53:34.920 --> 00:53:41.020
<v Michael Kennedy>that ran in a container, you could just have WebAssembly, but still have it go, right?

00:53:41.140 --> 00:53:42.780
<v Michael Kennedy>Sort of a local loopback type of thing.

00:53:43.000 --> 00:53:50.640
<v David Flood>Yeah, I'm really interested in this one because it enables essentially the full functionality

00:53:51.140 --> 00:53:54.960
<v David Flood>of the live site to exist as what is just a static site.

00:53:55.640 --> 00:54:03.160
<v David Flood>So because of Pyodide and projects like PyScript, we can run Python in the browser and we can

00:54:03.120 --> 00:54:09.220
<v David Flood>run SQLite in the browser. And now we can even run Postgres in the browser with PGlite. So if

00:54:09.300 --> 00:54:15.320
<v David Flood>we can run all those things in the browser, then couldn't we have Django hosted right in the browser?

00:54:15.880 --> 00:54:22.320
<v David Flood>And you can. So there's a proof of concept that proves it's possible called Django WebAssembly.

00:54:23.360 --> 00:54:29.940
<v David Flood>And if you load this up, it'll let you log in to the Django admin. And you're not logging into

00:54:29.960 --> 00:54:36.380
<v David Flood>anybody's backend, you're logging into your own browser where this is running in a service worker.

00:54:36.680 --> 00:54:40.280
<v Michael Kennedy>Awesome. Look at that. Oh, hold on. It told me what the password was. Very secure.

00:54:40.860 --> 00:54:41.940
<v Michael Kennedy>Matt, password.

00:54:42.220 --> 00:54:47.000
<v David Flood>Well, it can be entirely insecure because, yeah, you're just, it's running right in your own browser.

00:54:47.300 --> 00:54:50.080
<v Michael Kennedy>Yeah, that's awesome. And here we are, Django admin. Incredible.

00:54:50.480 --> 00:54:55.020
<v David Flood>Yeah, so I'm pretty interested in this. You've got to convert an RDS Postgres database

00:54:55.640 --> 00:54:59.640
<v David Flood>into either SQLite or something like PGlite, but I think that's all doable.

00:54:59.980 --> 00:55:01.920
<v David Flood>So I think it's an exciting possibility.

00:55:02.340 --> 00:55:02.940
<v Michael Kennedy>Yeah, for sure.

00:55:03.010 --> 00:55:06.860
<v Michael Kennedy>I do think, so maybe you have a rich query system

00:55:07.030 --> 00:55:08.140
<v Michael Kennedy>that you're powering by your database

00:55:08.480 --> 00:55:09.040
<v Michael Kennedy>that's really heavy.

00:55:09.480 --> 00:55:09.840
<v David Flood>Exactly.

00:55:10.120 --> 00:55:11.680
<v Michael Kennedy>And it's got a bunch of data that's like,

00:55:11.720 --> 00:55:13.500
<v Michael Kennedy>here's all of our working data

00:55:13.620 --> 00:55:14.740
<v Michael Kennedy>that you might ask questions about.

00:55:15.060 --> 00:55:16.920
<v Michael Kennedy>Maybe you just convert that to Pagefind

00:55:17.580 --> 00:55:18.540
<v Michael Kennedy>to help you find the pieces

00:55:18.960 --> 00:55:20.500
<v Michael Kennedy>and then just keep the operational data

00:55:20.720 --> 00:55:23.300
<v Michael Kennedy>and maybe like even a SQLite with like the Django ORM,

00:55:23.300 --> 00:55:25.600
<v Michael Kennedy>you can just switch the connection, keep talking to it.

00:55:25.750 --> 00:55:26.900
<v Michael Kennedy>I mean, there's possibilities

00:55:27.050 --> 00:55:28.900
<v Michael Kennedy>to just get something not too terrible

00:55:28.920 --> 00:55:30.740
<v Michael Kennedy>Well, it's not the same, but not that far off.
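
NOTE
Editor's sketch of the "switch the connection" idea, not the DARTH team's actual configuration.
It assumes a Django project whose data has already been copied into a SQLite file (Django's
dumpdata and loaddata management commands are one possible route). The helper name
sqlite_databases and the path archive/db.sqlite3 are hypothetical.
```python
from pathlib import Path
# Hypothetical helper: build a Django DATABASES setting that points at a SQLite file.
def sqlite_databases(db_path):
    """Return a DATABASES dict that uses Django's built-in SQLite backend."""
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",  # backend that ships with Django
            "NAME": str(Path(db_path)),  # location of the archived database file
        }
    }
# In settings.py this would stand in for the Postgres entry (path is an assumption).
DATABASES = sqlite_databases("archive/db.sqlite3")
```
Because the ORM hides the backend, models and queries can usually stay as they are, though
Postgres-only fields or raw SQL would still need attention.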

00:55:31.080 --> 00:55:31.680
<v David Flood>Yeah, exactly.

00:55:32.190 --> 00:55:35.420
<v David Flood>And then it goes on GitHub Pages and it can live hopefully forever.

00:55:35.700 --> 00:55:40.300
<v David Flood>I mean, it feels like GitHub will last forever, but it'll last longer than funding will anyways.

00:55:41.120 --> 00:55:48.380
<v Michael Kennedy>It's definitely going to last longer than just something that we can't pay for anymore, right?

00:55:48.520 --> 00:55:53.900
<v Michael Kennedy>I don't know how long GitHub's going to be around for, I think a while, but you never know, right?

00:55:53.960 --> 00:55:57.400
<v Michael Kennedy>It seems like stuff's going to last forever, then it gets changed.

00:55:57.520 --> 00:55:58.180
<v Michael Kennedy>We had Subversion.

00:55:59.000 --> 00:56:00.480
<v Michael Kennedy>Now it's completely gone, right?

00:56:00.800 --> 00:56:04.780
<v Michael Kennedy>Just 20 years, 15 years later, but still, I think 100% there.

00:56:05.020 --> 00:56:05.260
<v David Flood>Yeah.

00:56:05.580 --> 00:56:09.520
<v David Flood>But if somebody can, if something ever happened, somebody just needs to copy that,

00:56:09.750 --> 00:56:15.800
<v David Flood>that folder of HTML, CSS and JavaScript files and dump it into an S3 bucket or somewhere else.

00:56:15.950 --> 00:56:17.360
<v David Flood>And then it can continue living there.

00:56:17.860 --> 00:56:18.800
<v David Flood>So it's a good option.

00:56:19.440 --> 00:56:20.020
<v Michael Kennedy>It's a great option.

00:56:20.320 --> 00:56:21.400
<v Michael Kennedy>It's a really, really good option.

00:56:21.660 --> 00:56:30.940
<v Michael Kennedy>I mean, I guess one of the long-term concerns might be what if the WebAssembly standard changes so much that it's not supported anymore?

00:56:31.520 --> 00:56:36.860
<v Michael Kennedy>But you could probably bytewise convert it if you had to, you know, like somebody would probably be able to create one.

00:56:37.240 --> 00:56:38.560
<v David Flood>Yeah, that would be unfortunate.

00:56:39.060 --> 00:56:48.860
<v David Flood>So I suppose if that happens, I mean, if that happens, yeah, booting up one of these projects is like booting up an emulator for some old DOS game.

00:56:49.060 --> 00:56:49.540
<v Michael Kennedy>Right, right.

00:56:49.720 --> 00:56:52.320
<v Michael Kennedy>Well, I mean, I guess let's think about this for a second.

00:56:52.840 --> 00:56:55.460
<v Michael Kennedy>Somebody got, oh gosh, what was the chain?

00:56:55.510 --> 00:57:03.180
<v Michael Kennedy>This is the whole, JavaScript, the PyCon talk where somebody got like Firefox

00:57:04.280 --> 00:57:10.080
<v Michael Kennedy>compiled into, not WASM, into asm.js or something like that.

00:57:10.250 --> 00:57:14.300
<v Michael Kennedy>So it was run like Chrome was running Firefox, which was running, I think

00:57:14.620 --> 00:57:17.060
<v Michael Kennedy>Doom, which was also asm.js.

00:57:17.940 --> 00:57:21.800
<v Michael Kennedy>If we can do that, we could get something that would run, that would read old

00:57:22.000 --> 00:57:24.540
<v Michael Kennedy>WebAssembly into new WebAssembly if it really mattered to the world.

00:57:24.860 --> 00:57:25.180
<v Michael Kennedy>Absolutely.

00:57:25.800 --> 00:57:25.980
<v Michael Kennedy>Yeah.

00:57:26.240 --> 00:57:30.380
<v David Flood>Especially if it's in a public repo that people who care about the data can,

00:57:30.680 --> 00:57:31.560
<v David Flood>can rescue it somehow.

00:57:31.980 --> 00:57:32.080
<v Michael Kennedy>Yeah.

00:57:32.420 --> 00:57:34.040
<v Michael Kennedy>What about like a virtual machine?

00:57:34.500 --> 00:57:35.140
<v Michael Kennedy>You know, I agree.

00:57:35.220 --> 00:57:35.640
<v Michael Kennedy>Yeah, absolutely.

00:57:36.440 --> 00:57:42.220
<v Michael Kennedy>Could have saved me some, take a snapshot of Ubuntu LTS, some version,

00:57:42.420 --> 00:57:43.600
<v Michael Kennedy>and just what are we going to do?

00:57:44.200 --> 00:57:46.000
<v David Flood>Everything we do is Dockerized.

00:57:46.400 --> 00:57:47.320
<v David Flood>Everything is in a container.

00:57:47.780 --> 00:57:51.900
<v David Flood>So in the worst case scenario, we could give somebody the image, and they could run it if

00:57:51.910 --> 00:57:52.420
<v David Flood>they have Docker.

00:57:53.310 --> 00:57:57.780
<v David Flood>I think that's a nice peace of mind to know that no matter what, something will be able

00:57:57.790 --> 00:57:59.040
<v David Flood>to run this container.

00:57:59.440 --> 00:58:03.000
<v David Flood>And even in, I don't know if you've used GitHub, what is it called, Codespaces.

00:58:05.319 --> 00:58:06.680
<v David Flood>I archived one project.

00:58:07.570 --> 00:58:12.740
<v David Flood>It was kind of dramatic and sudden that it needed to be archived, so without much time

00:58:12.850 --> 00:58:13.320
<v David Flood>to do anything.

00:58:13.500 --> 00:58:15.460
<v David Flood>And it was a Ruby on Rails project.

00:58:15.680 --> 00:58:18.220
<v David Flood>And I'm not a Rails developer, but I

00:58:18.260 --> 00:58:19.600
<v David Flood>was able to get it archived in a way

00:58:19.780 --> 00:58:22.620
<v David Flood>that anybody could, with one command,

00:58:23.300 --> 00:58:27.040
<v David Flood>go to the repo on GitHub and boot it up in Codespaces

00:58:27.440 --> 00:58:30.540
<v David Flood>and then have it live running from their Codespace.

00:58:30.540 --> 00:58:31.800
<v Michael Kennedy>And so that works too.

00:58:32.040 --> 00:58:32.380
<v Michael Kennedy>Very cool.

00:58:32.600 --> 00:58:35.120
<v Michael Kennedy>I think as WebAssembly grows, there'll

00:58:35.120 --> 00:58:38.200
<v Michael Kennedy>be more possibilities for these types of things.

00:58:38.600 --> 00:58:39.300
<v Michael Kennedy>Yeah, amazing.

00:58:39.660 --> 00:58:42.640
<v Michael Kennedy>I'm pretty excited about Pagefind having a Python API.

00:58:42.900 --> 00:58:46.440
<v Michael Kennedy>Didn't realize that. So I'm going to be doing something with that for sure. So what else?

00:58:46.960 --> 00:58:51.180
<v Michael Kennedy>Let me ask you one more thing before I kind of let you wrap up with some final thoughts here.

00:58:51.620 --> 00:58:58.300
<v David Flood>What about AI? Oh, that's a good question. So AI, I mean, there's like, in my story,

00:58:58.660 --> 00:59:04.580
<v David Flood>there's like one interesting part of AI, which is that I got started and self-learned everything I

00:59:04.660 --> 00:59:10.840
<v David Flood>needed to about software development to begin doing this right before ChatGPT really came on

00:59:10.860 --> 00:59:17.240
<v Michael Kennedy>was able to do real programming. Yeah, you're like four years of legit programming before, right? So I

00:59:17.380 --> 00:59:21.320
<v David Flood>think, I mean, so I was thinking, when I was thinking about how I got into it, I thought,

00:59:21.660 --> 00:59:28.500
<v David Flood>what if I was four years later starting my PhD and wanting to do these tools? Um, I would have been

00:59:28.570 --> 00:59:34.580
<v David Flood>able to accomplish what I needed to for my research without acquiring the technical skills. And that

00:59:34.610 --> 00:59:38.140
<v Michael Kennedy>would have been, that's a good thing. I'm not sure if that's good about it. It could be both. I would

00:59:37.980 --> 00:59:43.220
<v David Flood>have thought it was a good thing. I would have thought it's a good thing. But in my hands

00:59:43.740 --> 00:59:52.220
<v David Flood>now, like a software engineer, AI is more powerful in my hands now than it would have been then.

00:59:52.610 --> 00:59:57.560
<v David Flood>So I can make it work for me. Yeah, I can make it work for me in a way that I couldn't have been

00:59:57.610 --> 01:00:01.980
<v David Flood>able to then. So I'm thankful for that, but it's something I think of. I don't want to say it's

01:00:02.800 --> 01:00:07.940
<v David Flood>necessarily a bad thing, but it definitely marks a difference, a difference in time between other

01:00:07.960 --> 01:00:13.120
<v David Flood>people who are maybe wanting to get into digital humanities, they're humanities researchers. They

01:00:13.140 --> 01:00:17.740
<v David Flood>want to add some digital tools. You know, I think this will kind of, this will probably knock people

01:00:18.040 --> 01:00:22.280
<v David Flood>off of the more technical path because it's not needed. I think it will too. And I think that that

01:00:22.460 --> 01:00:27.640
<v Michael Kennedy>might be a negative. When you were telling me your story originally, I was thinking kind of like,

01:00:27.760 --> 01:00:32.740
<v Michael Kennedy>how neat is it that you didn't sign up for, and the people you're working with probably didn't

01:00:32.760 --> 01:00:36.300
<v Michael Kennedy>intend to sign you up for learning true software development.

01:00:36.820 --> 01:00:41.000
<v Michael Kennedy>But look at this cool and interesting job that you now have that you never

01:00:41.160 --> 01:00:41.880
<v Michael Kennedy>would have imagined.

01:00:42.000 --> 01:00:44.400
<v Michael Kennedy>I'm sure when you signed up for your PhD, you're like, you know what I'm

01:00:44.400 --> 01:00:47.320
<v Michael Kennedy>going to do when I get my PhD, I'm going to go X, Y, like, I'm going to

01:00:47.400 --> 01:00:48.020
<v Michael Kennedy>join the DARTH program.

01:00:48.120 --> 01:00:49.780
<v Michael Kennedy>Like, no, probably not.

01:00:49.900 --> 01:00:50.000
<v Michael Kennedy>Right.

01:00:50.120 --> 01:00:50.760
<v Michael Kennedy>But here you are.

01:00:51.380 --> 01:00:54.880
<v Michael Kennedy>And I think that's actually a really interesting knock-on effect for a lot

01:00:54.960 --> 01:00:59.040
<v Michael Kennedy>of researchers and people in grad schools, they're kind of put into this

01:00:59.660 --> 01:01:01.020
<v Michael Kennedy>programming-adjacent type of thing.

01:01:01.400 --> 01:01:04.740
<v Michael Kennedy>You know, and a lot of folks sort of are like, actually, that's pretty interesting.

01:01:04.940 --> 01:01:06.160
<v Michael Kennedy>I'm going to kind of lean into that.

01:01:06.490 --> 01:01:10.300
<v Michael Kennedy>And I think AI might knock, like you said, knock people off that path to some degree.

01:01:11.100 --> 01:01:11.720
<v David Flood>Yeah, yeah, definitely.

01:01:12.210 --> 01:01:14.700
<v David Flood>So that's just like one part of the AI story.

01:01:15.050 --> 01:01:17.900
<v David Flood>The other one is that, like how we use it.

01:01:18.840 --> 01:01:25.540
<v David Flood>It's great for data extraction, pulling data out of different, you know, to make these

01:01:25.890 --> 01:01:30.100
<v David Flood>search interfaces more powerful, to extract different data from them.

01:01:30.540 --> 01:01:33.000
<v David Flood>That's just one example where it's been handy.

01:01:33.800 --> 01:01:38.180
<v David Flood>We're looking for ways that it can really empower faculty.

01:01:39.160 --> 01:01:47.460
<v David Flood>We're still very much in the exploration phase of how we can use it and provide it to faculty as a digital humanities tool.

01:01:48.220 --> 01:01:52.240
<v Michael Kennedy>Sure. I was thinking pretty much when I asked the question of it, it's just like two parts.

01:01:52.400 --> 01:01:56.300
<v Michael Kennedy>One, how is it? Are you guys using it to help take projects?

01:01:56.440 --> 01:01:58.320
<v Michael Kennedy>Well, that would have been a month. No, actually, it's three days.

01:01:58.820 --> 01:01:59.260
<v Michael Kennedy>You know what I mean?

01:02:00.300 --> 01:02:05.840
<v Michael Kennedy>that. And then if people are asking, you know, a professor comes along and says, and we want our

01:02:05.930 --> 01:02:12.880
<v Michael Kennedy>own custom AI thing, or we're using Harvard's internal one that we're allowed to use, but we

01:02:13.040 --> 01:02:17.600
<v David Flood>won't be able to use it once the grant runs out. You know what I mean? Yeah. Yeah. I think one,

01:02:17.820 --> 01:02:23.280
<v David Flood>one good example of this type of thing is that what we're starting to get is faculty who are

01:02:23.780 --> 01:02:28.180
<v David Flood>vibe coding and now, and we are going to teach them. We're going to teach them how to do it.

01:02:28.540 --> 01:02:30.780
<v David Flood>You know, instead of having them.

01:02:31.200 --> 01:02:32.500
<v David Flood>Yeah, it's absolutely a skill.

01:02:32.900 --> 01:02:33.500
<v David Flood>Yeah, no, it is.

01:02:33.720 --> 01:02:34.040
<v David Flood>It is.

01:02:34.800 --> 01:02:43.200
<v David Flood>Instead of copy and pasting from ChatGPT into VS Code, having them learn Copilot, maybe even having them download Cursor.

01:02:43.600 --> 01:02:48.320
<v David Flood>Download some real dedicated tools to get this done to make them more productive.

01:02:48.780 --> 01:02:52.860
<v David Flood>So, yeah, educating about how to do it is one thing.

01:02:53.200 --> 01:02:54.240
<v David Flood>You asked if we're using it.

01:02:54.900 --> 01:02:58.000
<v David Flood>We have access to Copilot.

01:02:58.980 --> 01:03:04.140
<v David Flood>and that's great. I can't say that we've shipped anything in three days instead of a month yet,

01:03:04.780 --> 01:03:13.440
<v David Flood>but one anecdote is that right now I'm doing some really interesting processing of music audio files,

01:03:13.940 --> 01:03:19.500
<v David Flood>and somebody asked to have a beatboxer if I could chop that file up so that all of the individual

01:03:19.820 --> 01:03:26.440
<v David Flood>sounds that the beatboxer makes are identified in a file. And so I'm using some music libraries,

01:03:26.840 --> 01:03:32.000
<v David Flood>Python library called Librosa. There's some complicated math in there. It's a little bit

01:03:32.040 --> 01:03:36.160
<v David Flood>too much for me. It's no problem for Claude. Claude knows how to do that math. And then,

01:03:36.720 --> 01:03:39.580
<v David Flood>and I use my expertise to string it together to get a good output.
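
NOTE
Editor's sketch of one way to chop a recording into individual sounds with librosa. The episode
does not describe David's actual pipeline, so onset detection is an assumption here, and the
function names onset_segments and split_beatbox are hypothetical. The calls librosa.load and
librosa.onset.onset_detect are part of librosa's documented API.
```python
def onset_segments(onsets, total_samples):
    """Turn onset positions (in samples) into (start, end) slices, one per sound."""
    # Audio before the first detected onset is dropped in this sketch.
    bounds = sorted({int(o) for o in onsets if 0 <= o < total_samples})
    bounds.append(total_samples)  # the last sound runs to the end of the file
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]
def split_beatbox(path):
    """Load an audio file and return (list of per-sound arrays, sample rate)."""
    import librosa  # third-party; imported lazily so onset_segments works without it
    y, sr = librosa.load(path, sr=None)  # sr=None keeps the file's native sample rate
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples", backtrack=True)
    return [y[a:b] for a, b in onset_segments(onsets, len(y))], sr
```
Each returned array could then be written to its own file or labeled by sound type; that step is
left out because the transcript gives no detail about it.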

01:03:39.940 --> 01:03:44.500
<v Michael Kennedy>Yeah. Awesome. You got time for one more quick question before we wrap things up?

01:03:44.500 --> 01:03:44.660
<v Michael Kennedy>For sure.

01:03:45.300 --> 01:03:51.160
<v Michael Kennedy>Raymond out there, Raymond Yees asks, it says, it'd be good to hear how Harvard uses containers on AWS

01:03:51.840 --> 01:03:56.060
<v Michael Kennedy>and its reliability. It's reliable, not cheapest way to host things. Are you thinking about moving

01:03:56.380 --> 01:04:02.480
<v Michael Kennedy>that or is it not that much? Okay, I'll tell you about a failed experiment.

01:04:03.520 --> 01:04:11.180
<v David Flood>We were using ECS and we're still using ECS. So that's AWS's main, you know, it's not Kubernetes,

01:04:11.560 --> 01:04:17.840
<v David Flood>but it's one step down with their horizontal scaling container clusters. And I wanted to move

01:04:17.840 --> 01:04:23.580
<v David Flood>us onto a single EC2 instance because our projects are popular, but they're not so popular that we

01:04:23.500 --> 01:04:25.580
<v David Flood>actually have to worry about horizontal scaling.

01:04:25.860 --> 01:04:26.120
<v Michael Kennedy>Right.

01:04:26.220 --> 01:04:29.760
<v Michael Kennedy>It's not like it's front page in New York Times.

01:04:30.280 --> 01:04:31.300
<v Michael Kennedy>I guess it probably could be.

01:04:31.460 --> 01:04:34.300
<v Michael Kennedy>But even so, for the static sites, they probably still can take it.

01:04:35.300 --> 01:04:35.380
<v David Flood>Yeah.

01:04:35.640 --> 01:04:42.180
<v David Flood>So I priced it out and I got an example deployed, an example project deployed, and was able

01:04:42.180 --> 01:04:44.860
<v David Flood>to confirm that it would indeed be much cheaper.

01:04:45.940 --> 01:04:48.780
<v David Flood>And it was deployed in a similar way using AWS CDK.

01:04:49.020 --> 01:04:51.540
<v David Flood>So it's all infrastructure as code all the way down.

01:04:52.080 --> 01:04:54.680
<v David Flood>But it turns out there's all kinds of compliance.

01:04:54.970 --> 01:04:58.300
<v David Flood>When you are in charge of the VM at like a big university,

01:04:58.630 --> 01:05:00.580
<v David Flood>or I'm sure any corporate setting,

01:05:00.980 --> 01:05:03.920
<v David Flood>if you are in charge of the VM and the OS on it,

01:05:04.220 --> 01:05:07.260
<v David Flood>then you have to know that you have the latest patches in.

01:05:07.460 --> 01:05:08.920
<v David Flood>You have to know that you have latest Ubuntu.

01:05:09.490 --> 01:05:10.960
<v David Flood>And then there's other things,

01:05:12.460 --> 01:05:13.860
<v David Flood>different observability things

01:05:13.860 --> 01:05:14.740
<v David Flood>that you have to have in place

01:05:15.900 --> 01:05:17.600
<v David Flood>that are not usually required

01:05:17.880 --> 01:05:20.700
<v David Flood>if you're running in a container cluster like ECS.

01:05:21.480 --> 01:05:27.700
<v David Flood>So it ends up being a lot less work and much easier to achieve compliance if we run containers

01:05:28.120 --> 01:05:31.120
<v David Flood>or some other serverless thing.

01:05:31.440 --> 01:05:37.160
<v David Flood>If I run all my personal projects, they all run in a single virtual machine, but we're

01:05:37.280 --> 01:05:37.800
<v David Flood>running in containers.

01:05:38.340 --> 01:05:38.560
<v Michael Kennedy>Yeah.

01:05:38.560 --> 01:05:38.660
<v Michael Kennedy>Yeah.

01:05:39.300 --> 01:05:42.260
<v Michael Kennedy>And you've got all the SOC 2 stuff and all those different things, right?

01:05:42.320 --> 01:05:43.380
<v Michael Kennedy>Like there's layers.

01:05:43.940 --> 01:05:44.440
<v Michael Kennedy>Yeah, that's right.

01:05:44.740 --> 01:05:44.800
<v David Flood>Yeah.

01:05:44.920 --> 01:05:50.300
<v David Flood>I mean, I'll mention that, but what I didn't say is that in that 2019, when I started learning

01:05:50.520 --> 01:05:55.520
<v David Flood>Python. I discovered Talk Python almost immediately. And one of the first episodes that I listened to

01:05:55.520 --> 01:06:01.060
<v Michael Kennedy>was the other digital humanities. Cornelis van Lit. He was an awesome guest.

01:06:01.260 --> 01:06:06.220
<v David Flood>That's right. Yeah. And I thought that was great. And that was also a bit about manuscripts,

01:06:06.820 --> 01:06:11.760
<v David Flood>a little bit more on the image side than the text side. And I didn't understand everything

01:06:11.790 --> 01:06:15.880
<v David Flood>that everybody was saying, but I just, I kept tuning in. And I think because of that,

01:06:16.120 --> 01:06:21.660
<v David Flood>Because Talk Python was like this, you know, I've been remote working for most of my time.

01:06:22.400 --> 01:06:27.000
<v David Flood>And Talk Python has been kind of like that conversation with the open source community

01:06:27.700 --> 01:06:28.920
<v David Flood>that's been always in my ear.

01:06:28.920 --> 01:06:33.340
<v David Flood>And I think that made, you know, a difference, making me feel like I understood the software

01:06:34.060 --> 01:06:37.420
<v David Flood>landscape and like the developer culture and what was going on.

01:06:37.640 --> 01:06:40.900
<v David Flood>And then the different Python libraries and what was possible.

01:06:41.640 --> 01:06:47.280
<v David Flood>So to people who are interested in taking things in a more technical direction, I think

01:06:47.280 --> 01:06:52.560
<v David Flood>it's helpful just to find a few things like that, that give you an insight into that world.

01:06:53.020 --> 01:06:59.060
<v David Flood>And the more you listen to it, the more you start to hear the same acronyms and the same

01:06:59.360 --> 01:07:02.640
<v David Flood>things said enough that you start to feel like, okay, now you're part of the club.

01:07:03.000 --> 01:07:04.360
<v Michael Kennedy>I really appreciate that.

01:07:05.180 --> 01:07:05.580
<v Michael Kennedy>That's cool.

01:07:06.080 --> 01:07:09.780
<v Michael Kennedy>I've certainly had people reach out to me and say things that at first didn't make any

01:07:09.940 --> 01:07:10.240
<v Michael Kennedy>sense to me.

01:07:10.360 --> 01:07:12.200
<v Michael Kennedy>Like I've been listening for six weeks now

01:07:12.400 --> 01:07:14.540
<v Michael Kennedy>and it's starting to make sense what you're talking about.

01:07:14.540 --> 01:07:15.980
<v Michael Kennedy>Like, why have you been listening for six months

01:07:16.030 --> 01:07:16.800
<v Michael Kennedy>when it made no sense?

01:07:16.940 --> 01:07:17.420
<v Michael Kennedy>That's insane.

01:07:17.680 --> 01:07:20.880
<v Michael Kennedy>But a lot of people use listening to the podcast,

01:07:21.070 --> 01:07:24.500
<v Michael Kennedy>be it mine or others, as language immersion, right?

01:07:24.640 --> 01:07:28.380
<v Michael Kennedy>Like I could get Duolingo and I could learn Portuguese

01:07:28.720 --> 01:07:30.580
<v Michael Kennedy>or I could move to Brazil for a month.

01:07:30.830 --> 01:07:31.380
<v Michael Kennedy>You know what I mean?

01:07:31.580 --> 01:07:32.200
<v Michael Kennedy>And then I would really learn.

01:07:32.200 --> 01:07:32.480
<v Michael Kennedy>Yeah, exactly.

01:07:33.160 --> 01:07:33.460
<v Michael Kennedy>Right.

01:07:34.000 --> 01:07:34.140
<v David Flood>Exactly.

01:07:34.270 --> 01:07:36.040
<v David Flood>No, I think there's truth to that.

01:07:36.260 --> 01:07:38.660
<v David Flood>And some of the things I did was, you know,

01:07:38.820 --> 01:07:42.920
<v David Flood>search through, like search the word deployment, because I'm trying to get my head around how to

01:07:43.020 --> 01:07:47.000
<v David Flood>deploy for the first time. And I just want to hear people talk about it. Like I could read about it.

01:07:47.000 --> 01:07:52.120
<v David Flood>I could read the tutorial, but I just want to hear people talk about deployment to get a sense of what

01:07:52.300 --> 01:07:56.480
<v Michael Kennedy>actual deployment sounds like. There's something really different when you're learning or trying,

01:07:57.240 --> 01:08:01.380
<v Michael Kennedy>even you're maybe an experienced programmer, but not in this particular area to hear a human

01:08:01.840 --> 01:08:08.500
<v Michael Kennedy>side of it, not just the docs, not a sterile. These are the four steps, but like, I love it.

01:08:08.700 --> 01:08:10.080
<v Michael Kennedy>I mean, it's probably why I created the show.

01:08:10.280 --> 01:08:11.680
<v Michael Kennedy>It's because I didn't hear those stories.

01:08:11.780 --> 01:08:12.940
<v Michael Kennedy>We got to tell those stories.

01:08:13.440 --> 01:08:13.540
<v Michael Kennedy>Awesome.

01:08:13.860 --> 01:08:14.660
<v Michael Kennedy>I appreciate that.

01:08:14.860 --> 01:08:15.620
<v Michael Kennedy>So super cool.

01:08:15.840 --> 01:08:16.020
<v Michael Kennedy>All right.

01:08:16.359 --> 01:08:21.080
<v Michael Kennedy>So if other people are listening, maybe one of your pieces of advice is keep listening.

01:08:21.580 --> 01:08:22.299
<v Michael Kennedy>You'll get there.

01:08:22.480 --> 01:08:22.859
<v David Flood>Yeah.

01:08:22.960 --> 01:08:30.060
<v David Flood>And if anybody is in the humanities and somehow found their way onto this episode with no technical experience,

01:08:30.819 --> 01:08:37.060
<v David Flood>I just would give the caution of, like, you know, the anecdote that if AI coding had been

01:08:37.259 --> 01:08:42.940
<v David Flood>around the way it is now when I was learning, I wouldn't be doing digital humanities at

01:08:43.060 --> 01:08:43.200
<v David Flood>Harvard.

01:08:43.540 --> 01:08:45.600
<v David Flood>I wouldn't have been able to get into this field.

01:08:46.420 --> 01:08:47.420
<v David Flood>I wouldn't have known about it.

01:08:47.799 --> 01:08:52.380
<v David Flood>So I guess just think about that when you're learning and applying new tools.

01:08:52.720 --> 01:08:54.980
<v Michael Kennedy>I don't really know what the right fix for that is.

01:08:55.060 --> 01:08:56.299
<v Michael Kennedy>That's a very challenging problem.

01:08:56.500 --> 01:08:59.560
<v Michael Kennedy>I mean, you can say I'm just literally not going to fire it up.

01:08:59.720 --> 01:09:03.279
<v Michael Kennedy>But I mean, we used to hunt through Stack Overflow and the web and over and over.

01:09:03.460 --> 01:09:06.859
<v Michael Kennedy>And if you're really stuck or you really don't understand, like they're good at explaining

01:09:06.960 --> 01:09:07.319
<v Michael Kennedy>stuff too.

01:09:07.359 --> 01:09:12.200
<v Michael Kennedy>You just got to really stay in a learner's mindset, not just press the easy button and

01:09:12.319 --> 01:09:13.259
<v Michael Kennedy>make this thing and move on.

01:09:13.700 --> 01:09:14.380
<v Michael Kennedy>Easier said than done.

01:09:14.680 --> 01:09:15.359
<v Michael Kennedy>Easier said than done.

01:09:15.620 --> 01:09:22.000
<v Michael Kennedy>So yeah, I want to leave this with kind of a thought about how much things like Python

01:09:22.220 --> 01:09:27.260
<v Michael Kennedy>and these tools and technology can really empower stuff that you wouldn't think is even

01:09:27.279 --> 01:09:34.620
<v Michael Kennedy>related, like understanding old manuscripts and how painting is connected or changed over time and

01:09:34.799 --> 01:09:39.720
<v Michael Kennedy>stuff, right? Those sound very much disjointed from tech and software, but they really are

01:09:40.080 --> 01:09:45.319
<v Michael Kennedy>superpowers that you can bring to your work, whatever your industry is, or field of

01:09:45.460 --> 01:09:49.600
<v Michael Kennedy>study. I know there's some sociologists out in the audience and I'm sure others as well.

01:09:50.279 --> 01:09:54.700
<v Michael Kennedy>All right. Final thoughts, David, close it out. You said it great. I mean, you know,

01:09:55.340 --> 01:10:01.840
<v David Flood>Just applying these technical tools to old questions, that is the core of digital humanities.

01:10:02.220 --> 01:10:04.900
<v Michael Kennedy>When I first started hearing about this, I thought, I really don't know how this ties

01:10:05.060 --> 01:10:05.160
<v Michael Kennedy>together.

01:10:05.400 --> 01:10:08.780
<v Michael Kennedy>And after seeing it a few times, I definitely see the power of it.

01:10:08.780 --> 01:10:11.000
<v Michael Kennedy>And I thank you for your time coming on.

01:10:11.260 --> 01:10:16.760
<v Michael Kennedy>Thank you for sharing your look and the look inside of your team and inside of a small piece

01:10:16.940 --> 01:10:17.260
<v Michael Kennedy>of Harvard.

01:10:17.780 --> 01:10:22.960
<v Michael Kennedy>I really like these kinds of episodes because it's hard to see this from the outside, right?

01:10:23.060 --> 01:10:24.880
<v Michael Kennedy>like you just see the results,

01:10:24.950 --> 01:10:27.180
<v Michael Kennedy>but you don't see like the inner workings of the team

01:10:27.320 --> 01:10:28.140
<v Michael Kennedy>and the motivation and stuff.

01:10:28.360 --> 01:10:30.640
<v Michael Kennedy>So thank you so much for being here.

01:10:31.150 --> 01:10:32.480
<v Michael Kennedy>And yeah, bye everyone.

01:10:33.980 --> 01:10:36.100
<v Michael Kennedy>This has been another episode of Talk Python To Me.

01:10:36.370 --> 01:10:37.200
<v Michael Kennedy>Thank you to our sponsors.

01:10:37.390 --> 01:10:38.700
<v Michael Kennedy>Be sure to check out what they're offering.

01:10:38.940 --> 01:10:40.260
<v Michael Kennedy>It really helps support the show.

01:10:40.720 --> 01:10:42.100
<v Michael Kennedy>Take some stress out of your life.

01:10:42.480 --> 01:10:44.280
<v Michael Kennedy>Get notified immediately about errors

01:10:44.640 --> 01:10:46.440
<v Michael Kennedy>and performance issues in your web

01:10:46.450 --> 01:10:47.920
<v Michael Kennedy>or mobile applications with Sentry.

01:10:48.440 --> 01:10:51.300
<v Michael Kennedy>Just visit talkpython.fm/sentry

01:10:51.800 --> 01:10:52.860
<v Michael Kennedy>and get started for free.

01:10:53.280 --> 01:10:55.800
<v Michael Kennedy>Be sure to use our code, talkpython26.

01:10:56.760 --> 01:11:00.140
<v Michael Kennedy>That's Talk Python, the numbers two, six, all one word.

01:11:00.820 --> 01:11:02.920
<v Michael Kennedy>This episode is brought to you by CommandBook,

01:11:03.240 --> 01:11:05.320
<v Michael Kennedy>a native macOS app that I built

01:11:05.480 --> 01:11:08.040
<v Michael Kennedy>that gives long-running terminal commands a permanent home.

01:11:08.440 --> 01:11:10.440
<v Michael Kennedy>No more juggling six terminal tabs every morning.

01:11:10.880 --> 01:11:12.280
<v Michael Kennedy>Carefully craft a command once,

01:11:12.440 --> 01:11:14.020
<v Michael Kennedy>run it forever with auto-restart,

01:11:14.160 --> 01:11:15.700
<v Michael Kennedy>URL detection, and a full CLI.

01:11:16.060 --> 01:11:19.180
<v Michael Kennedy>Download it for free at talkpython.fm/commandbook app.

01:11:19.920 --> 01:11:21.800
<v Michael Kennedy>If you or your team needs to learn Python,

01:11:22.040 --> 01:11:32.080
<v Michael Kennedy>we have over 270 hours of beginner and advanced courses on topics ranging from complete beginners to async code, Flask, Django, HTML, and even LLMs.

01:11:32.400 --> 01:11:34.580
<v Michael Kennedy>Best of all, there's no subscription in sight.

01:11:35.240 --> 01:11:36.900
<v Michael Kennedy>Browse the catalog at talkpython.fm.

01:11:37.600 --> 01:11:42.260
<v Michael Kennedy>And if you're not already subscribed to the show on your favorite podcast player, what are you waiting for?

01:11:42.900 --> 01:11:44.700
<v Michael Kennedy>Just search for Python in your podcast player.

01:11:44.790 --> 01:11:45.680
<v Michael Kennedy>We should be right at the top.

01:11:46.100 --> 01:11:48.940
<v Michael Kennedy>If you enjoy that geeky rap song, you can download the full track.

01:11:49.070 --> 01:11:50.980
<v Michael Kennedy>The link is actually in your podcast player show notes.

01:11:51.760 --> 01:11:53.140
<v Michael Kennedy>This is your host, Michael Kennedy.

01:11:53.560 --> 01:11:54.600
<v Michael Kennedy>Thank you so much for listening.

01:11:54.830 --> 01:11:55.620
<v Michael Kennedy>I really appreciate it.

01:11:56.040 --> 01:11:56.760
<v Michael Kennedy>I'll see you next time.

01:12:08.400 --> 01:12:11.200
I'm out.