#461: Python in Neuroscience and Academic Labs Transcript
00:00 Do you use Python in an academic setting?
00:01 Maybe you run a research lab or teach courses using Python.
00:05 Maybe you're even a student using Python.
00:08 Whichever it is, you'll find a ton of great advice in this episode.
00:11 I talked with Keelan Cooper about how he's using Python in his neuroscience lab at the
00:16 University of California, Irvine.
00:18 And Keelan wanted me to let you know that if any developers who are not themselves scientists
00:22 are interested in learning more about scientific research and ways you might be able to contribute,
00:27 please don't hesitate to reach out to him.
00:30 This is Talk Python to Me, episode 461, recorded March 14th, 2024.
00:35 Are you ready for your host?
00:37 There he is.
00:38 You're listening to Michael Kennedy on Talk Python to Me.
00:42 Live from Portland, Oregon, and this segment was made with Python.
00:45 Welcome to Talk Python to Me, a weekly podcast on Python.
00:53 This is your host, Michael Kennedy.
00:55 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using
00:59 @talkpython, both on fosstodon.org.
01:02 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.
01:08 We've started streaming most of our episodes live on YouTube.
01:11 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming
01:17 shows and be part of that episode.
01:19 This episode is sponsored by Neo4j.
01:22 It's time to stop asking relational databases to do more than they were made for and simplify
01:27 complex data models with graphs.
01:30 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do for you.
01:37 Find out more at talkpython.fm/Neo4j.
01:41 And it's brought to you by Posit Connect from the makers of Shiny.
01:45 Publish, share, and deploy all of your data projects that you're creating using Python.
01:50 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.
01:56 Posit Connect supports all of them.
01:59 Try Posit Connect for free by going to talkpython.fm/Posit, P-O-S-I-T.
02:04 Hello, how are you?
02:06 I'm doing well.
02:07 So awesome to have you here on Talk Python and talking academics.
02:12 I didn't tell you before we hit record, but I spent a long time at universities and I just love them.
02:18 They're such cool places and it's going to be really fun to get a look inside how Python's
02:23 being used there.
02:23 Yeah, well, thank you so much for having me.
02:25 And yes, I too love universities.
02:27 It's kind of like all the coolest parts of humanity just kind of intermixing in one place.
02:33 So yeah, I'd love to kind of peel back the curtain on how things are going.
02:37 Yeah, yeah.
02:37 Well, we're talking about how you and your colleagues use Python and data science inside of your
02:43 neurology research lab.
02:45 But before we dive into that, let's just get a bit of background on yourself.
02:49 Who are you?
02:49 How do you get into Python?
02:51 All those things.
02:52 So I'm Keelan Cooper.
02:52 I'm a neuroscientist at the University of California, Irvine.
02:56 So Southern California, 15 minutes from the beach and an hour from the mountains.
03:00 But I'm originally from the middle of nowhere, Indiana.
03:04 And I started playing with computers and code when I was young.
03:08 So like middle school ish, just ripping apart computers and seeing what was in them and then
03:14 trying to put them back together and feeling bad when they didn't work right after.
03:18 And then the typical, you know, tweaking the software when you don't like what it does
03:22 until you make it work.
03:23 And then probably my senior year of high school is when I started teaching myself Python.
03:28 And it was because we had to do something for some government class, actually.
03:32 Oh, wow.
03:32 Okay.
03:33 And we had to learn.
03:34 We were learning about the stock market.
03:35 And every day you'd have to spend like 15 minutes going to like some stock website and like filling
03:40 out your fake stocks.
03:42 And so I wrote a really small Python script that would just pull the data from the website
03:46 and populate like an Excel spreadsheet.
03:49 And so every day the kids in the class were just like going through and like spending 15,
03:53 20 minutes by handwriting it down.
03:55 And I would just sit there.
03:56 That's awesome.
03:56 And so that was kind of the first time I was like, wow, this whole automation thing is pretty
04:00 sweet from there.
04:02 I thought I just kind of caught the bug pretty early.
04:04 Python was definitely the way to go.
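A minimal sketch of that kind of script today might look like the following. The tickers, prices, and `write_quotes` helper are all hypothetical stand-ins; a real version would first fetch and parse the stock site (e.g. with urllib), and the rows are written as CSV so a spreadsheet app can open them:

```python
import csv
import io

# Hypothetical quotes standing in for data scraped from the class's
# stock website (a real script would fetch and parse the page first).
quotes = {"AAPL": 172.50, "MSFT": 415.10, "GOOG": 139.20}

def write_quotes(quotes, fileobj):
    """Write one header row plus a ticker/price row per stock."""
    writer = csv.writer(fileobj)
    writer.writerow(["ticker", "price"])
    for ticker, price in sorted(quotes.items()):
        writer.writerow([ticker, f"{price:.2f}"])

# Write to an in-memory buffer here; in practice, open("stocks.csv", "w").
buf = io.StringIO()
write_quotes(quotes, buf)
print(buf.getvalue().strip())
```

Swap the buffer for a real file handle and point it at live data, and you have the fifteen-minutes-saved-per-day script from the story.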
04:06 Was Python your first programming language?
04:09 First programming language was the Windows registry and trying to undo all of the mistakes
04:14 of the operating system.
04:15 It's been a while since I've been in the Windows registry, but good old reg edit.
04:20 I switched to Linux pretty quick.
04:21 Linux and Unix.
04:23 Are you still on Linux?
04:23 Mostly.
04:24 My desktops are all Linux.
04:25 My servers are obviously all Linux.
04:27 I like Mac for a laptop just because, you know, Linux has this thing where you tinker
04:32 with it.
04:32 And so then any small task you want to do, you end up like rewriting some deep script in
04:37 the operating system.
04:38 And like two hours later, you're like, what was that small thing I was trying to do again?
04:41 Yeah, exactly.
04:42 I got distracted.
04:43 I was rewriting something in there and off we go.
04:45 Yeah.
04:46 Yeah.
04:46 So Macs are nice because you still have all the same Unix like properties that are great,
04:51 but you pay a price for reliability.
04:53 You just sell a bit of your soul out, but boy, is that UI nice and those little menu
04:58 bar apps are handy.
05:00 It's a little.
05:00 That's right.
05:02 I've been playing with running Ubuntu on my Mac M2 Pro, which it runs great, but it's
05:10 an ARM version of Mac, of Linux rather.
05:13 Well, both really.
05:14 But boy, is there a limited supply of applications for an ARM Linux distribution.
05:22 Let me tell you.
05:25 I imagine that'll change pretty quick though.
05:27 Yeah.
05:27 They're like, just download the Debian package.
05:28 You just install it.
05:29 Like, wrong platform.
05:33 But yeah, I think it will change as I think ARM's going to start to take over Windows a little
05:37 bit as well.
05:38 And obviously the Mac world is basically transitioning.
05:41 So anyone who has a Mac.
05:43 Yeah.
05:43 I think it's Qualcomm that just kind of started hinting that they were going to try and really
05:48 heavily compete with the M line of processors and have some pretty good specs.
05:52 So it'll be good.
05:53 I have an M3 and it's pretty nice.
05:56 It is really nice.
05:57 Like I said, I like to run more stuff on it, but it's still kind of Intel or x86 stuff
06:02 for Linux and Windows.
06:04 So it's a little hard to work with that, but still super fun.
06:07 That's a long way to say it's been a very long time since I've been in regedit.
06:11 Personally, and it sounds like you as well, not being in Windows too much.
06:16 I'm so bad at Windows now, like when I'm helping people with Python or something else and they
06:20 show me their computer, there's always that like 10 minute learning curve of like, okay,
06:25 how do I, how do I do anything basic on this machine?
06:29 Or even like the keyboard shortcuts you get so accustomed to when people don't have any of
06:33 those little things that you're just like, how do I, how do I select everything?
06:37 And that is it.
06:38 Like I did professional software development on Windows for a long, long time.
06:41 I even wrote a bunch of Windows apps.
06:43 It was, it was great.
06:44 But going back and forth too quickly, that or Linux, like just the hotkeys, I just get
06:49 broken.
06:49 Not just on Windows.
06:51 Also, when I come back to Mac, like I'm just completely out of sorts.
06:54 So yeah, it's fun.
06:56 All right.
06:57 Well, let's talk academics and Python from a probably mostly OS-agnostic perspective.
07:05 But yeah, just, you know, give us a sense of the kind of research you do.
07:08 You know, what is your field?
07:09 What do you study?
07:10 Those kinds of things.
07:11 So people get a sense of like, why am I talking?
07:14 Where are you coming from as you talk about doing all these things?
07:16 The core of my work is pure neuroscience.
07:19 So, so basic science.
07:21 What we do mainly in the lab is we take really tiny wires.
07:25 So they're like a fifth of the size of the human hair.
07:28 And now we're using something called silicon probes, which are, they're manufactured the
07:31 same way that computer chips are manufactured on silicon wafers using photolithography.
07:37 Do you get a higher density that way?
07:38 Do you get like a bunch of little sensors on one point or something?
07:42 Okay.
07:42 So we used to build these little drives.
07:44 I used to have one here, but I got rid of it.
07:46 Little drives by hand.
07:47 So you would just feed the wires in with, with forceps.
07:51 And so you'd get maybe 64 or 128 at most, depending on how much time you want to sit there and
07:57 feed the wires in.
07:58 Yeah.
07:58 But now you can just get them manufactured.
07:59 You pay a lot more, but you get twice, three times the sites.
08:03 And the whole point is the more sites you have, the more neurons you can actually record
08:08 from in the brain.
08:09 Yeah.
08:10 You're not just saying this part of the brain lit up, but you can have a much better picture,
08:14 right?
08:14 The constituent part of your brain is the neuron.
08:17 And so of the millions and billions of neurons, depending on the species you're recording from,
08:22 we can record maybe a few hundred of them.
08:25 But that's usually sufficient to actually, in the specific region you study, and I can talk
08:30 about that more, to discern some sort of information from it.
08:33 And so really the data type we really care about is this tiny little electrical voltages
08:37 that tell you what different neurons in the brain are talking about.
08:41 And so you put the wires in, you record the conversations of a bunch of neurons.
08:46 And then particularly we're interested in two brain regions that are critical for memory,
08:51 learning, and decision-making.
08:53 And this is the hippocampus, which in humans is about the size of your pinky and a few
08:58 inches in from your ear.
08:59 And the prefrontal cortex, which most people know about right behind your forehead, important
09:03 for learning, decision-making, and all those sorts of things.
09:06 Yeah.
09:06 So that's the core of my work is I'm in the lab doing the actual data collection and building
09:11 equipment to actually do that.
09:13 But once you have all of that data and the data keeps growing, like most other fields, you
09:17 got to do a lot of pre-processing, which takes Python.
09:20 You got to do a lot of post-processing, which takes a lot of Python.
09:23 And also we do something called neural decoding.
09:26 So not only do we just like say descriptively, what are these neurons doing?
09:31 But we can go one step further and say, what actual information are these cells representing?
09:37 So in the brain, we can kind of say, this is kind of the fundamental kind of information
09:44 transfer and how information is manipulated in the brain and how it ships information from
09:49 the environment into memory and how it uses that to make a decision.
09:53 All of those kinds of things we can use through fancy modeling and statistics and more recently,
09:59 deep learning and those sorts of things.
10:00 We'll have to come back to deep learning later.
10:02 That'll be fun.
10:03 Given your background.
10:05 So for this hardware, do you write the software that actually talks directly to the hardware
10:11 or is there something that just records it and you grab some sort of custom file format and
10:16 run with it?
10:17 Yeah.
10:17 More recently, it kind of depends on the lab.
10:21 So as time goes on, there's more and more companies that you can just buy off the shelf and
10:26 recording platforms, mostly for the electrical engineering people.
10:30 It's kind of like an audio amplifier because you're recording at millivolts in the brain.
10:33 So you have to amplify it, write it to, if you're plugged in with a wire, write it to the
10:38 computer.
10:38 So all that takes software in various forms.
10:41 And then we do a lot of animal research.
10:44 So the tasks that the animals do are pretty much all automated.
10:49 But recently in the lab, we've kind of had this resurgence of developing kind of novel hardware
10:55 and a lot of automation of behavior.
10:56 So I've kind of rewritten most of our entire behavioral stacks, which is a lot of just some
11:03 microcontroller programming, which not a lot of that's in Python.
11:06 A lot of that's just kind of like C++ and those sorts of things.
11:09 But we have cameras all over.
11:11 So I wrote this kind of like camera server that streams all of the camera footage from
11:16 a bunch of automated boxes to some like central server that just collects all of that data.
11:20 So yeah, a lot of the behavioral stuff nowadays, we're just building in-house to collect all
11:26 of the behavior data.
11:27 The ephys stuff is now, especially because we're doing something called wireless recording.
11:32 So instead of just having a wire plugged into the head, it just writes it to like an SD card
11:36 or Bluetooth.
11:37 That's just kind of all on chip.
11:39 So it's just whatever the microcontroller language of the chip needs.
11:44 This portion of Talk Python to Me is brought to you by Neo4j.
11:49 Do you know Neo4j?
11:51 Neo4j is a native graph database.
11:54 And if the slowest part of your data access patterns involves computing relationships,
11:58 why not use a database that stores those relationships directly in the database, unlike your typical
12:05 relational one?
12:06 A graph database lets you model the data the way it looks in the real world, instead of forcing
12:10 it into rows and columns.
12:12 It's time to stop asking a relational database to do more than they were made for and simplify
12:18 complex data models with graphs.
12:20 If you haven't used a graph database before, you might be wondering.
12:24 About common use cases.
12:25 What's it for?
12:26 Here are just a few.
12:27 Detecting fraud.
12:28 Enhancing AI.
12:30 Managing supply chains.
12:32 Gaining a 360 degree view of your data.
12:35 And anywhere else you have highly connected data.
12:38 To use Neo4j from Python, it's a simple pip install Neo4j.
12:44 And to help you get started, their docs include a sample web app demonstrating how to use it
12:48 both from Flask and FastAPI.
12:51 Find it in their docs or search GitHub for Neo4j Movies Application Quick Start.
12:56 Developers are solving some of the world's biggest problems with graphs.
13:00 Now it's your turn.
13:01 Visit talkpython.fm/Neo4j to get started.
13:06 That's talkpython.fm slash N-E-O, the number 4,
13:09 and the letter J.
13:10 Thank you to Neo4j for supporting Talk Python to me.
13:15 I think it's surprising how much software and even hardware, but definitely software is involved
13:22 for something that doesn't sound like a software discipline.
13:25 Yeah.
13:25 You wouldn't think of what you guys are doing as inherently almost like a software team,
13:30 but there's a lot of software there.
13:31 Absolutely.
13:32 And it's growing.
13:32 So it used to be 10, 20 years ago, more biology, I'd say, like more wet lab stuff.
13:38 But 90% of what I do as kind of a neurobiologist is really just engineering style things.
13:45 Like I'm more recently designing PCBs and I'm in the shop a lot, just like with saws and
13:51 hammers and drills and like actually physically building things.
13:54 And obviously a lot of code.
13:55 And the coding part is becoming bigger and bigger to the point where in the field, I always say
14:00 that the neuroscience is like about three decades behind astrophysics.
14:05 Because all the problems that neuroscientists are facing now as a field,
14:09 astrophysics faced like three decades prior. Where in neuroscience, we're
14:14 like, what do we do with all this data?
14:15 I mean, I'm collecting a hundred gigabytes an hour, if not more, like what do we do?
14:21 That is a lot.
14:21 Yeah.
14:22 But relative to like some of those big telescopes that are collecting like almost petabyte scale.
14:28 I would say both ends of physics, like the very, very extreme ends of physics.
14:32 So astrophysics, the very large and then particle physics, right?
14:36 At CERN as well, they've got insane amounts of data.
14:39 Yeah.
14:39 And that's what we're starting to see.
14:40 I think in neuroscience too, is that kind of division of like, because the scale of data
14:45 collection is so big, you're starting to need not just a single lab, but teams.
14:49 So we have a few institutes now that are just pumping out terabytes of data.
14:53 And so you start to see that division between the neuroscientists who are really in the lab,
14:59 hands-on with actual neural tissue or the recording device, and the neuroscientists who
15:04 are just take the data and analyze it and develop new models and statistical models.
15:09 And also theory, there's always a dearth of theory in neuroscience, but the computational
15:14 modeling is certainly getting a lot bigger within the last few decades as well, where people's
15:20 entire job is just how do we model some of these things in code?
15:24 You probably run into different research groups, different teams that have different levels
15:29 of sophistication from a software side.
15:32 And do you see like a productivity or a quality difference jump out from like the kind of work
15:37 or the velocity of work that people are doing there?
15:39 Absolutely.
15:40 It's a huge range.
15:44 There are like very sophisticated labs.
15:46 And usually those are the labs that have kind of just a pure software person on the team or
15:51 people who are very inclined towards software all the way to, and it makes me so sad when people
15:56 are spending like weeks in a spreadsheet, just manually doing things by hand.
16:00 Yeah, I know.
16:00 You could do this in five minutes.
16:03 And not only could you do it faster, you could do it without any errors.
16:07 Yeah, more reliable.
16:08 None of those like, oh, I misread it and I shifted off by a cell or I typed in, I missed
16:15 something, right?
16:16 Because it just reads what's there.
16:17 Yeah.
16:18 A lot of like graduate programs are starting to wake up to this fact that it's going to
16:23 be almost impossible to do any science without some degree of proficiency in coding.
16:28 And I think a lot of, a lot of say grad students and postdocs and so on, when they actually sit
16:33 down and try and analyze their data, whether they're just in Excel or they need to write
16:38 a little Python script, that's kind of their first introduction is, oh, I have this data.
16:42 I need to do something with it.
16:43 I'm going to Google exactly how do I read in this data or how do I do a t-test in Python
16:49 or how do I plot something in matplotlib?
16:51 And that's kind of the level that they start getting into out of necessity.
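For someone at exactly that stage, a first t-test really can be a few lines. This is a sketch of Welch's two-sample t statistic using only the standard library, with made-up example data for two hypothetical conditions; in practice most people would reach for `scipy.stats.ttest_ind` instead, which also gives the p-value:

```python
import math
import statistics

# Made-up measurements for two hypothetical conditions.
group_a = [2.1, 2.5, 2.3, 2.7, 2.4]
group_b = [1.8, 1.9, 2.0, 1.7, 2.1]

def welch_t(a, b):
    """Welch's t statistic: the mean difference scaled by the
    standard error built from each group's own sample variance."""
    var_a = statistics.variance(a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

t = welch_t(group_a, group_b)
print(f"t = {t:.3f}")  # → t = 4.082
```

The point is less the statistic itself and more that this is the scale of script a grad student first Googles their way to.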
16:54 But the sophistication and the speed, because that's, they're usually just teaching themselves.
16:59 That's most of academia.
17:01 It's just, you have a problem, spend a few days Googling and reading books until you find
17:06 it.
17:06 And once it works, you can kind of just leave it.
17:08 You don't have to clean it up or anything, right?
17:10 Yeah.
17:11 Okay.
17:11 Which results in a lot of, I mean, the progress of science doesn't go away, but the code is,
17:17 you know, not robust.
17:18 And so that's why you see things, especially in other fields of like psychology and such
17:22 like replication crises.
17:23 And people have done meta analysis of running the same software stack on like 12 different
17:29 data sets and you get different results.
17:31 And so you start to kind of see that shaky foundation is starting to bleed into the, like
17:37 you said, the reliability results.
17:39 And starting to have consequences, not just it's more work or something.
17:42 Exactly.
17:43 Maybe we could start a bit by just talking about maybe the history, you know, diving into this
17:48 a little bit more, just the history of programming in neuroscience.
17:51 I wasn't in neuroscience in any way, but I worked with a bunch of cognitive scientists studying,
17:58 you know, how people solve problems and thought about things at a lab for one of my first jobs.
18:03 And we studied all through eye tracking.
18:04 Yeah.
18:05 Not the iPhone, but actual eyes.
18:07 It was fascinating.
18:08 It was tons of data.
18:08 It was really cool.
18:09 And there were like, you described a lot of people who would do sort of Excel stuff and
18:13 they would take the data and they process it.
18:14 And over time, we just started to automate these things.
18:17 And their first thought was, you're programming me out of a job.
18:20 I'm like, no, no, no.
18:22 This is the crappy part of your job.
18:24 Like you're supposed to analyze the results and think about it and plan new stuff.
18:27 And now you can just focus on that.
18:29 Right.
18:30 And as the software got better, you know, we just tackled bigger problems.
18:34 So, you know, maybe give us a bit of the history on your side.
18:37 So I love the cognitive science.
18:39 My background is more cognitive science.
18:42 I did my undergrad and grew up in science in a cognitive science department while also
18:47 doing some wet lab neuroscience stuff.
18:49 So it's fun.
18:50 Yeah, absolutely.
18:51 Did you start out with like MATLAB and that kind of stuff?
18:54 Is that where they told you you need to be?
18:56 Neuroscience has certainly had at least our branch of neuroscience just because by the nature
19:02 of recording voltages and you need to write to a computer.
19:05 So there has been kind of a long history of for as long as there's been even like punch
19:11 card computers.
19:11 People have kind of read in the data into the computer and done, you know, their statistics
19:17 on that rather than something else.
19:19 I'm just I'm actually recently writing a kind of a review article on kind of the history
19:24 of data science and neuroscience.
19:25 And I loved this paper.
19:28 It was from 1938.
19:30 And they took an EEG spectrum.
19:33 And so EEG is just the continuous time series of brain voltage.
19:38 So you're not in the brain recording.
19:39 And this is, I think, from humans.
19:41 And they took something called the Fourier transform, which, for anyone not up to speed with that, is where you
19:45 basically just take some oscillating signal and you break it down into its constituent parts.
19:51 And most of you have seen it before.
19:52 If you've ever seen like an audio spectrogram, that's kind of the most notable visualization
19:58 where you can kind of see the high frequencies and the low frequencies.
20:00 Basically, it pulls the frequencies out of the signal, right?
20:04 Yeah.
20:04 But the way they did this, this is 1938.
20:07 There's no computers.
20:07 So they actually had mechanical device where they would just take this EEG trace that was
20:13 on tape and they would feed it into this like mechanical machine.
20:17 And it would basically read kind of this black line on the tape.
20:20 And so as it would crank the tape around this machine, depending on kind of the frequency that
20:27 the line went up and down, that would read out the Fourier transform.
20:30 So it was mechanical, like a lot of those cool devices back in the older days.
20:35 That's impressive.
20:36 Now you can get that same thing in MATLAB.
20:39 You just type FFT, parentheses, parentheses, put your data in the middle and you get the
20:43 same thing in microseconds.
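The Python equivalent is just as short with NumPy. This sketch uses a made-up one-second test signal (an 8 Hz component plus a weaker 20 Hz one, at a sampling rate of my choosing) rather than real EEG, and pulls the dominant frequency back out of the spectrum:

```python
import numpy as np

fs = 128                 # sampling rate in Hz (chosen for the example)
t = np.arange(fs) / fs   # one second of sample times

# Synthetic "EEG-like" signal: strong 8 Hz plus weaker 20 Hz component.
signal = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.abs(np.fft.rfft(signal))        # magnitude per frequency bin
freqs = np.fft.rfftfreq(len(signal), 1 / fs)  # bin centers in Hz

peak = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {peak:.0f} Hz")  # → dominant frequency: 8 Hz
```

What took a purpose-built mechanical machine in 1938 is now two function calls, in MATLAB or Python alike.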
20:45 But neuroscience, at least my field, has kind of always had this kind of serendipitous relationship
20:51 with computing generally, coding generally.
20:53 And a lot of the code, I think earlier on was kind of was Fortran-ish and then it moved
20:58 towards MATLAB and MATLAB's kind of had its stake in the ground for a long time.
21:03 Just because that was the first kind of software that you could really do array manipulations
21:08 on well.
21:08 And it was kind of a higher level than some of the lower level programming.
21:11 So a lot of the older labs have their entire code base and stack and analysis software in
21:18 MATLAB.
21:19 And so it's only been within maybe five to six, seven years, it'll be a bit longer,
21:24 10 years that you've really seen Python start to supplant MATLAB as kind of the de facto
21:29 programming language in labs, just because of the cost of trying to transfer everything over.
21:35 And despite the fact that MATLAB isn't open source and it's extremely expensive, most universities
21:40 have licenses.
21:41 And so that kind of facilitates...
21:43 It's prepaid.
21:44 Yeah.
21:44 In a sense.
21:45 Yeah.
21:45 But it is still pretty expensive.
21:47 Especially if you get those little toolboxes, like wavelet decomposition toolbox, 2,000 bucks
21:52 instead of a pip install, you know?
21:54 And again, we do a lot of signal processing.
21:55 And so that's exactly the place you want to be.
21:58 And like MATLAB usually controls, because it has pretty good control over like external hardware,
22:03 you can run like your behavior, your task kind of in MATLAB codes.
22:07 You can kind of do everything in one language as you would like to do in Python.
22:11 But it's starting to kind of go away.
22:14 And I think a lot of that is just because the allure of Python, which is so many tools,
22:19 and because it's probably a lot easier to learn for most people than MATLAB.
22:23 We're kind of starting to see that switch now that there's kind of more to offer, I'd say,
22:28 a lot of scientists than MATLAB.
22:31 Yeah, you said 10, 12 years ago.
22:32 At least in our film.
22:33 Yeah, the difference in the external packages on PyPI you can get, and especially the ones for data science have just exploded.
22:43 The choices and the stuff that's out there, it's pretty diverse.
22:46 It's pretty crazy.
22:47 The only other one that I think is still in pretty strong competition with Python
22:50 from the perspective of, we collaborate a lot with like mathematicians and statisticians,
22:55 and R is their usual favorite, just because statistics, like all the best statistical packages are still pretty much in R.
23:03 And so that's where a lot of people live.
23:05 ggplot's pretty good.
23:07 Pretty nice plots.
23:08 Yeah, that's interesting.
23:10 It's really focused, and it's really good at what it does.
23:13 And one of the things that I think is worth just considering, if somebody comes, let's say a brand new first year grad student comes into the lab,
23:22 and you're like, all right, what's your programming experience?
23:24 Like, well, I programmed the clock on the VCR.
23:27 Like, okay, we're going to have to start you somewhere or something, right?
23:30 And you could teach them something really specific, like MATLAB, or something along those lines.
23:36 But if they learn something like Python, not Julia, maybe even not really R, but R is closer somewhat,
23:45 is they learn not just a skill for the lab, but it's kind of almost any software job is potentially within reach,
23:53 with a little bit of learning about that area, right?
23:55 Like, if you know Python, you say, I want a job, there's a massive set of options out there.
23:59 If you say, I know MATLAB, or even Julia, it's like, okay, well, here's the few labs and the few research areas and the real,
24:09 I just think it's something that-
24:11 A lot of engineering firms.
24:12 Yeah, yeah.
24:13 I'm just thinking that, like, a lot of academic folks should consider what happens if the student doesn't necessarily become a professor.
24:20 You know what I mean?
24:21 Which actually is a lot of the time, right?
24:23 Or a professional researcher of some sort.
24:25 And that's a really awesome skill to have on top of your degree.
24:29 So I think that's just a big win for it.
24:31 I'm happy to see that.
24:32 And that literal situation just happened where a statistician that we were collaborating pretty closely with graduated,
24:38 brilliant guy, and got a job at Microsoft.
24:42 And so we were in a meeting after he was there, and they were like, what's some advice that you have now that you've been, you know, in industry for a while?
24:50 And he's like, stop using R.
24:51 Learn Python, because everyone here uses Python.
24:54 And it took me a few months to kind of switch from the R worldview of, you know,
25:00 the <- assignment arrow to the equals sign, to actually work and collaborate with everyone, because everyone's just using Python.
25:07 Yeah.
25:07 From his perspective, and I'm sure it's not unique.
25:10 No, I'm sure that it's not either.
25:12 I think, yeah, I just think it's, it's not like a religious war.
25:15 It's not like, oh, I think Python is absolutely better.
25:17 You should just not use other stuff.
25:18 I just think it's preparing people for stuff beyond school.
25:22 It's a pretty interesting angle to take.
25:24 And it's not like you can't learn other things.
25:25 I think it's really good to learn other things, especially ones that are complementary, where R can be complementary, especially now that they have a lot of the, like, subsystem packages.
25:35 Or when I get R code, I usually just write, like, a subprocess.
25:39 I did this recently, just wrote, like, a subprocess line to call the R script, because I was too lazy to rewrite it.
25:45 Yeah, sure.
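That subprocess pattern is only a few lines. In this sketch a tiny Python one-liner stands in for the R script so it runs anywhere; the `Rscript analysis.R data.csv` invocation shown in the comment is the real-world shape, but the script and file names there are hypothetical:

```python
import subprocess
import sys

# In the lab this would be something like:
#   subprocess.run(["Rscript", "analysis.R", "data.csv"], check=True)
# Here a small Python child process stands in so the sketch is runnable.
result = subprocess.run(
    [sys.executable, "-c", "print(sum(range(10)))"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit code
)
print(result.stdout.strip())  # → 45
```

Parsing `result.stdout` (or having the child write a file you read back) is often all the Python/R glue a lab needs.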
25:45 But there's other, like, I don't know, like, Rust is probably a good one to probably try and branch out.
25:50 Or, like, lower level languages, like C++.
25:52 Yeah.
25:52 If you need for what you're doing.
25:54 Yeah, and it sounds like you guys do for talking to hardware and stuff like that.
25:57 Yeah, occasionally.
25:58 Boy, there's not a lot of things Python can't do.
26:01 Break out some MicroPython when you got to get your microcontrollers and stuff.
26:04 This portion of Talk Python to Me is brought to you by Posit, the makers of Shiny, formerly RStudio, and especially Shiny for Python.
26:14 Let me ask you a question.
26:15 Are you building awesome things?
26:17 Of course you are.
26:18 You're a developer or a data scientist.
26:20 That's what we do.
26:21 And you should check out Posit Connect.
26:23 Posit Connect is a way for you to publish, share, and deploy all the data products that you're building using Python.
26:30 People ask me the same question all the time.
26:33 Michael, I have some cool data science project or notebook that I built.
26:37 How do I share it with my users, stakeholders, teammates?
26:39 Do I need to learn FastAPI or Flask or maybe Vue or React.js?
26:45 Hold on now.
26:45 Those are cool technologies, and I'm sure you'd benefit from them, but maybe stay focused on the data project.
26:50 Let Posit Connect handle that side of things.
26:53 With Posit Connect, you can rapidly and securely deploy the things you build in Python.
26:57 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.
27:04 Posit Connect supports all of them.
27:06 And Posit Connect comes with all the bells and whistles to satisfy IT and other enterprise requirements.
27:12 Make deployment the easiest step in your workflow with Posit Connect.
27:16 For a limited time, you can try Posit Connect for free for three months by going to talkpython.fm/posit.
27:22 That's talkpython.fm/P-O-S-I-T.
27:26 The link is in your podcast player show notes.
27:28 Thank you to the team at Posit for supporting Talk Python.
27:31 So another thing that's, I don't know how it's received.
27:36 I know it took a while to kind of really catch on.
27:39 And I think the thing that just broke the final barriers for open source being adopted, at least in business, was the AI stuff and the data science stuff.
27:49 People were like, oh, we can't use this open source stuff.
27:51 We've got to have a SLA and some company we can sue if our code doesn't work right or, you know, whatever, right?
27:56 Something crazy like that.
27:57 And they're like, but you understand all the AI and all the data science.
28:02 We have to use this open source stuff.
28:03 Like, all right, fine.
28:04 What's the open source story for you guys?
28:07 Academia has probably championed open source for a really long time. I mean, even when I first started, if you read a paper and someone had some new fancy analysis...
28:20 This was before it became a bigger push by, like, funding agencies to actually post it to GitHub or some repository.
28:26 I mean, you could just email people and be like, hey, I saw your paper.
28:29 I want that script.
28:30 And they would just send you a MATLAB file.
28:32 And it would be, you know, just whatever they had written.
28:35 But it was in MATLAB and you'd have to kind of tear it apart yourself.
28:38 And there was little to no documentation.
28:39 You'd be lucky if there's comments and it's, you know, spaghetti code.
28:43 But, you know, you figure that out.
28:45 You kind of work backwards and deconstruct it.
28:47 And eventually you kind of have their code.
28:49 So that kind of ethos of just, you know, scientists are really good, by and large, of just sharing information and helping people out.
28:57 If you have a question, just ask.
28:58 It's kind of always been there.
29:01 At least in our field, it's not as competitive as some other ones where you're just kind of like racing to get the next project out.
29:08 It happens, but rarely.
29:09 But now a lot of funding agencies and just in general, people are just excited about when you publish a paper, you put a GitHub link in the bottom of the paper.
29:18 And then that links to the repository.
29:20 And, yeah, maybe it's not been updated in a while, but the code's there and you can just take it and grab it.
29:25 For the reproducibility.
29:26 How about using other things?
29:28 Is there SciPy?
29:29 I know for astronomy, there's AstroPy.
29:31 Is there Neuropy?
29:33 It's really still more analysis-dependent and pre-process-dependent.
29:38 So there's kind of this, it's still the early days where there's probably too many formats just because no one can agree on what's the best one.
29:46 So, like, even a lot of the data format tools are written in Python to take whatever data you have and reformat it into something shareable.
29:53 And there's five or six of them floating around.
29:55 There's probably two that are still duking it out to see which one will be the best.
29:59 And probably five years from now, there's going to be a better one.
30:02 So data formats, certainly there's kind of this, there's a few that are neck and neck.
30:07 Analysis pipelines, a lot of those are still done in-house, but they're starting to be a lot more toolkits and frameworks and packages.
30:13 There's some really good ones that have more documentation written.
30:17 They're on the PyPI repository.
30:19 So you can just pip install them and you have them.
30:21 The computational neuroscience people are great at this.
30:24 So all the like neural simulation software, that is all really well documented, really well written.
30:30 A lot of good like example code and tutorials and so on.
30:34 So yeah, we're starting to see kind of this more robust kind of ecosystem where you can just kind of pull things.
30:41 It still just kind of varies.
30:42 There's probably still not one go-to place other than the standard data science toolkits.
30:48 Right, right.
30:49 The pandas and so on.
30:51 Yeah, NumPy, Matplotlib, pandas, scikit-learn, if you're doing deep learning, PyTorch or TensorFlow, all of those still apply to any data science stack.
31:00 Yeah, of course.
31:00 What's your day-to-day stack look like if you're sitting down to do some analysis?
31:05 I have VS Code and, like, autocomplete, where I just write import and it's NumPy, Matplotlib, pandas.
31:13 Then I usually delete pandas because unless I have a CSV file, I'm not using it.
31:17 So NumPy, Matplotlib, I can probably do 75% of the things I want to do.
31:22 Scikit-learn and SciPy, obviously, if I'm doing any stats with those things, those libraries I might go to.
31:28 And then over the last few years, I kind of just have my own, just because you catch yourself writing the same functions over and over and over again.
31:36 And so I just started building my kind of internal framework of just things I know I need.
31:40 So if I'm working with LFP data, I have all my filters there.
31:44 If I have spike data, I have all my spikes there.
31:46 We do a lot of decoding, so developing deep learning algorithms to decode neural data.
31:52 All of those are kind of listed there.
31:54 Yeah, and then I started realizing, dude, internal tools make the difference between solving a problem in 10 minutes or solving it in an hour, where I can just sit down and have everything automated to come up.
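As an example of the kind of reusable helper he's describing, an LFP bandpass filter can be a one-function wrapper around SciPy. This is a hedged sketch, not his lab's code; the band edges and sampling rate are made up for illustration:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(lfp, low, high, fs, order=4):
    """Zero-phase Butterworth bandpass, e.g. the theta band (4-12 Hz) of an LFP trace."""
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, lfp)   # filtfilt runs forward and backward: no phase shift

# One second of fake LFP at 1 kHz: an 8 Hz theta rhythm plus 60 Hz line noise.
fs = 1000
t = np.arange(fs) / fs
lfp = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)

theta = bandpass(lfp, 4, 12, fs)   # 60 Hz is outside the band, so it is removed
```

Once a handful of helpers like this live in an internal package, "filter the theta band" becomes one line in every analysis instead of a copy-pasted block.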
32:06 So yeah, the standard data science stack I use pretty frequently.
32:09 Hardware stack, I mean, software stack: VS Code. I just recently switched to it from Sublime because everyone was talking about it.
32:17 I used to just edit in a terminal.
32:19 And I was like, oh, I'll try it out.
32:21 Everyone's talking about it.
32:21 It's one of the good things Microsoft has done.
32:24 It's pretty sweet.
32:25 Yeah, it is pretty sweet.
32:26 That's pretty nice.
32:27 And when you do the VS Code stuff, are you just writing straight Python scripts or are you doing VS Code on top of notebooks?
32:34 Yeah.
32:34 Because it has that kind of text view of a notebook type of thing, I think.
32:38 I used to use exclusively Python scripts, so just the .py.
32:42 Started seeing how great Jupyter was.
32:44 And so then you start doing everything in Jupyter, and then you start to have all these convoluted notebooks and notebook V1 through 7.
32:51 So then you realize that you've got to find a balance between, you know, notebooks are great for presentation and for quickly testing.
32:58 But the sooner you can get it into, like, a class structure or a package or something.
33:02 The sort of productized version of it, you wanted to get it down into Python code a lot of times, probably.
33:08 Exactly.
33:08 Like your internal tools you talked about.
33:10 You're like, all right, this is the library you just call stuff, right?
33:12 That just belongs more as a package and not a...
33:15 The sooner you can kind of condense, pull the code out of the notebook and just leave notebooks for presentation, it's probably the best.
33:22 Yeah.
33:23 It's a lot of pipelines.
33:24 So it's a lot of preprocessing pipelines.
33:25 So you don't want, you know, 50 cells of just moving data.
33:29 Preparing, yeah.
33:30 You talked about having quite a bit of data, some of that being image-based.
33:34 Sounds like a lot of work.
33:36 Do you have, like, a big compute cluster?
33:37 Do you just have, like, an ultra, an M3 ultra or whatever?
33:42 That's not even out yet.
33:43 It's M2s.
33:44 But do you have just a big machine or do you guys do cloud stuff?
33:47 What's compute look like?
33:48 Depends on what I'm doing, what I need.
33:50 Most things, let's be honest, I could probably just use my desktop computer, which is really a Supermicro server.
33:56 It just runs Linux.
33:57 But then for doing deep learning, you need GPUs.
34:00 And so we have a nice set of GPUs we can pull, too.
34:04 Yeah.
34:05 Do you do your own training on LLMs and other deep learning things?
34:08 The only LLM training I did was just for fun.
34:10 But it's all the deep learning stuff we have to train.
34:13 And it's on R, particularly.
34:15 So that's all GPUs for that.
34:17 A lot of the statistics we do are, like, permutations.
34:20 So you need to kind of, like, parallelize them out into CPUs.
34:24 So then I'll pull, like, a CPU cluster we have if I need it.
34:27 Does UC Irvine have, like, a big compute resource sort of thing you can grab?
34:31 They have a campus-wide one that you can get on.
34:33 And there's a few independent ones that I have access to.
34:37 So GPU is kind of in a different place than CPUs are.
34:40 So I can just kind of pick and choose.
34:42 And then I have a few servers in the lab that I've just kind of put together.
34:46 All my camera stuff, all my behavior, I kind of wrote it so it's cloud-based.
34:51 So I can kind of just pull up my phone and look at the videos of what animals are doing and stuff.
34:55 All that runs on just a server in the lab.
34:58 So, yeah, it's the compute is there when you need it.
35:01 And I think as I've matured, I've kind of learned when to use what compute, and when it's worth taking the extra time to use it.
35:09 And also, a lot of the time, you don't even need it.
35:12 So I know a lot of people when I see their code and they complain that it takes like an hour to run.
35:17 I mean, just using multiprocessing in Python, that in and of itself is enough to not need to use a cluster.
35:25 They're just using a single thread for their analysis.
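The permutation statistics mentioned earlier are a textbook fit for this: each shuffled batch is independent, so `multiprocessing.Pool` can spread them across cores in a few lines. A sketch with synthetic data (the group sizes and effect size are made up):

```python
import numpy as np
from multiprocessing import Pool

def null_batch(args):
    """One worker's share of a permutation test: shuffle labels, recompute the statistic."""
    a, b, n_perms, seed = args
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    diffs = np.empty(n_perms)
    for i in range(n_perms):
        rng.shuffle(pooled)
        diffs[i] = pooled[:len(a)].mean() - pooled[len(a):].mean()
    return diffs

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200)   # e.g. firing rates in condition A
b = rng.normal(0.5, 1.0, 200)   # e.g. firing rates in condition B
observed = a.mean() - b.mean()

if __name__ == "__main__":
    # Split 10,000 permutations across 4 worker processes, one seed per worker.
    with Pool(4) as pool:
        null = np.concatenate(pool.map(null_batch, [(a, b, 2500, s) for s in range(4)]))
    p_value = (np.abs(null) >= np.abs(observed)).mean()
    print(f"observed diff {observed:.3f}, p = {p_value:.4f}")
```

Giving each worker its own seed matters: otherwise every process would generate the same shuffles and you'd get 4 copies of one null distribution.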
35:28 For sure.
35:28 Or just bad programming patterns,
35:30 design patterns.
35:31 Like you're looping over the thing in the Pandas data frame instead of doing vector operations in the Pandas data frame.
35:38 Like that kind of stuff, right?
35:39 A hundred percent.
35:40 Yeah.
35:41 I mean, that was one of the first things that I usually teach.
35:44 I have this little example script that shows, like, why is it better to pre-allocate an array rather than to just append to the end?
35:50 And it's like these things that we kind of take for granted now, but it's not intuitive unless you actually see it.
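The kind of example script he mentions can be just a few lines: time a Python-level append loop against a single vectorized NumPy operation on the same values (sizes here are arbitrary):

```python
import time
import numpy as np

n = 200_000

# Growing a list element by element, then converting -- the common first instinct.
t0 = time.perf_counter()
vals = []
for i in range(n):
    vals.append(i * 0.5)
slow = np.array(vals)
t_loop = time.perf_counter() - t0

# One vectorized operation: NumPy allocates once and loops in C.
t0 = time.perf_counter()
fast = np.arange(n) * 0.5
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop * 1e3:.1f} ms, vectorized: {t_vec * 1e3:.1f} ms")
```

The same lesson applies to Pandas: a vectorized column operation beats iterating over rows by a similar margin.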
35:55 No, you learn it the hard way, but it sticks in your mind once you learn it.
36:00 Well, yeah, I think that's the issue.
36:01 It's like people just take it for granted.
36:03 Yeah, for sure.
36:03 Like, why didn't you know that?
36:05 How do you onboard new people?
36:08 It just kind of depends on the lab.
36:14 Every lab kind of has their own structure, kind of this hierarchy of expertise. Like, I started as an undergrad and just volunteered my time in a lab at a different university.
36:26 Eventually could get paid.
36:28 I just wanted to spend time in the lab. And that goes all the way up to grad students, who are there to get a PhD and have more kind of autonomy over their projects.
36:35 A postdoc who has a PhD and five to eight years of experience.
36:41 And so it can work pretty well.
36:42 Then there's like staff scientists or even a lot of labs now are hiring just pure engineers or pure software people.
36:48 Okay.
36:48 Because there's such a need for that.
36:50 And so, yeah, it really just depends on the lab specific situation and what their focus is on and what they need.
36:56 Cool.
36:56 I guess if you have a good NSF grant and you got some extra money, it might be money well spent to hire some student who has good programming skills, right?
37:03 Absolutely.
37:04 Yeah.
37:05 You talked about the video stuff that you're doing, the streaming video.
37:08 Do you actually do analysis on that or is it just for you to go back and look at?
37:12 Oh, yeah.
37:13 Yeah.
37:13 We do analysis on that.
37:14 There's actually a pretty cool deep learning package out now.
37:17 We didn't write it.
37:18 Another lab did, where you just give it the video frame and it can automatically segment the animal.
37:24 So, like where their paws are or where their like nose is looking or in some cases people have like you're talking about eye tracking.
37:30 They do eye tracking in like mice now.
37:32 They do eye tracking on mice.
37:34 That was hard on humans in the 90s.
37:36 Yeah.
37:36 There's a lot of VR.
37:37 So, they put kind of mice in this like VR system.
37:40 Oh, wow.
37:41 And they can like see where their little mouse pupil is looking on like the VR screen.
37:45 Yeah.
37:45 And do they show them different scenarios and they can detect that and they react to it?
37:49 Oh, absolutely.
37:49 Incredible.
37:50 So, yeah.
37:50 And a lot of stuff, at least in our field, the Nobel Prize was awarded for: you stick some electrodes in the hippocampus, a part of your brain that's important for learning and memory.
38:00 And then you have the animal kind of run around some environment.
38:03 And then you take some video data of kind of where they were running the environment.
38:07 And if you were only looking at the brain data, you can predict to like 90 some percent accuracy the location of the animal.
38:15 So, you can show this kind of this correspondence that inside the brain is a map of kind of the environment.
38:20 Our stuff, we're taking that a little one step further that says this map is not just for space.
38:25 It's for non-spatial and other things too.
38:28 There's this kind of network of information in the brain that the animal can kind of like navigate through, even if they're just standing still but thinking about some sort of problem.
38:36 But we use video data to validate what the animal's doing or check what kind of tasks they're doing and so on.
38:42 So, yeah, a lot of multimodal heterogeneous data that each needs its own funky pre-processing.
38:48 And depending on the task at hand, you're writing something new to ask that question.
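The roughly-90-percent decoding result mentioned a moment ago can be sketched as a toy: simulate Gaussian-tuned "place cells" on a linear track, draw Poisson spike counts at one true position, and decode by maximum likelihood. Every number here is synthetic, and this is only a cartoon of the real analyses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 50 "place cells" on a 1-D track: each fires most near its preferred position.
n_cells = 50
track = np.linspace(0.0, 1.0, 100)            # candidate positions to decode over
centers = rng.uniform(0.0, 1.0, n_cells)      # each cell's preferred position
sigma, peak_rate = 0.05, 20.0

def expected_rates(pos):
    """Gaussian tuning curve: expected firing rate of every cell at position pos."""
    return peak_rate * np.exp(-(pos - centers) ** 2 / (2 * sigma ** 2))

# Spike counts observed at one true (hidden) position, with Poisson noise.
true_pos = 0.63
counts = rng.poisson(expected_rates(true_pos))

# Maximum-likelihood decoding: which position best explains the observed counts?
rate_map = np.array([expected_rates(p) for p in track])              # (positions, cells)
log_like = (counts * np.log(rate_map + 1e-9) - rate_map).sum(axis=1)
decoded = track[np.argmax(log_like)]
print(f"true position {true_pos:.2f}, decoded {decoded:.2f}")
```

With enough tuned cells, the population activity alone pins down the position, which is the essence of showing that the brain carries a map of the environment.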
38:53 So, is that OpenCV?
38:54 Yeah, my stuff is OpenCV and streamed over sockets and some Django webpage.
39:00 It's fun.
39:01 It's cool to build.
39:02 That sounds really cool.
39:03 Yeah, absolutely.
39:04 So, what kind of questions are you answering with the video?
39:07 Yeah.
39:07 Or is it just to correlate back with the time series of what you're measuring in the brain?
39:11 So, like I said, the Nobel Prize was for this spatial map.
39:14 We're doing this non-spatial stuff.
39:16 And so, we kind of do both in the lab where we have an animal kind of run around and then we have an animal just kind of sit still
39:23 and do some sort of mental task.
39:25 In our case, it's a – they have to memorize a sequence of odors and if the sequence gets shuffled, they make a different choice.
39:31 And so, we're basically showing how does the brain work for the spatial part versus the non-spatial part?
39:37 What's similar about these two things?
39:39 What's different about these things?
39:40 And we show that the – one of our recent papers was that the brain uses a lot of the similar mechanisms to navigate space as it does to navigate this kind of non-spatial odor task.
39:50 But we also showed that there's this mechanism that the brain uses to take kind of discrete memories and link them together into some kind of narrative whole.
39:58 Best case being like this talk we've been talking back and forth for 30-some minutes now.
40:03 Inside, there's kind of chunks of the conversation.
40:05 So, if tomorrow someone was to ask you, what did you and that neuroscience guy talk about on your podcast?
40:10 You would kind of rattle off this story of, oh, we talked about history of Python and this and this and this, right?
40:17 This, this, and this are each kind of discrete memories that in your brain you kind of lock together and store them.
40:24 So, you could use them, make decisions about them and so on and so on.
40:28 Think about it as a whole, not just every little idea, but, like, the big concept of it, right?
40:33 Exactly, yeah.
40:34 And it's a fundamental thing that the brain does.
40:37 Like people say humans are storytellers and your life is kind of the sequence of events of stories.
40:42 And so, you use that every single day.
40:44 And across a bunch of diseases, that's one of the first things to actually be impaired, whether it's addiction or schizophrenia or Alzheimer's.
40:52 That kind of ability to link things in time and link them well and make decisions about them starts to get impaired.
40:59 That's not great when that happens, but that is what happens, right?
41:01 Absolutely, yeah.
41:02 But what about software engineering practices, for lack of a better word, that you would recommend to other grad students or professors who feel like they don't have their software game fully together?
41:15 And maybe what should they ignore, right?
41:17 Like, should they pay attention to, like, source control and GitHub?
41:19 Should they have unit tests or, you know, what should they pay attention to or not?
41:23 First off, no one writes tests.
41:24 That is just only the very few, very well put together.
41:28 And usually people who just came from industry write tests.
41:31 Sure.
41:31 Which is an issue.
41:32 But yeah, first off, just learn Python.
41:35 I've said that hundreds of times and I'm preaching to the choir in this audience.
41:39 You are, for sure.
41:41 Learn Python is honestly, there's not much better you could learn.
41:45 And two, it's, you know, it's quintessential automation stuff.
41:49 So it's really just think about the things that you're doing, the spreadsheets or, you know, the simple things.
41:54 And really just ask yourself, if you find yourself doing any repetitive tasks, that's a software problem.
41:59 Those are the things to kind of look at first.
42:01 So you have your text editor in one window, Google in the other, and stack overflow your way to learning.
42:06 And so the way people, I think, really do kind of teach themselves Python is probably the best way to learn.
42:12 But nevertheless, I think there is a real need for, and again, we're starting to see more of it, just formal education.
42:18 Even if it's just a course, our program is really great that they started to teach a Python course just because the students requested it because they knew how important it was.
42:28 Like a Python for neuroscience?
42:31 Exactly.
42:32 Okay.
42:32 Yeah.
42:33 I mean, you just work your way through if statements for loops, data types.
42:37 You probably also, I'm just guessing you get a chance to work with some of these external libraries that are relevant to studies they're doing, right?
42:44 Rather than, here's an example of stock market data.
42:46 You're like, great, not a trader.
42:47 I don't want to be a programmer.
42:49 Why am I here, you know?
42:50 Yeah, it's really relevant.
42:52 And I think it's just kind of seeing like, oh, yeah, I would do that this way, but this is so much easier if I use Python.
42:58 And I can use it in Python.
42:59 And just seeing that, oh, it's not as bad as it looks. The mountain looks a lot higher when you're at the base than from the summit.
43:05 So it does.
43:06 Seeing it done once is usually enough to kind of tell people it's not as bad as you originally think.
43:11 Yeah.
43:12 I feel like maybe I saw it this way when I was younger.
43:14 And I certainly know a lot of other people see it this way as well.
43:18 It's like, you've got to be crazy smart to do programming.
43:22 It's really challenging.
43:24 It's kind of one of those things that only a few people can do.
43:27 And then you get into it and you're like, oh, it's not a few really huge steps and things you've got to solve.
43:33 It's like a thousand small steps.
43:35 And each one of the little small steps, you're like, that was actually easy.
43:38 That's no big deal.
43:39 What's the next step?
43:40 And you get to the end, you're like, where was the big step?
43:42 Where was it really hard, right?
43:43 Yeah, absolutely.
43:44 Yeah.
43:44 Do you have some experience where people in the department or people that worked with you are like, oh, I'm not a programmer.
43:51 I don't want to do this stuff.
43:51 Then they kind of got into it and really found out that programming was something they really liked.
43:56 Any converts out there?
43:58 Yeah, I would say so.
43:59 I mean, I think there's kind of two kinds of people.
44:01 There's people who program just because, what is it?
44:04 The programming is an art book or whatever.
44:06 They love it just for the sake of loving it.
44:08 And I'm probably closer to those kind of people, right?
44:11 I just think it's the coolest thing, like academic.
44:14 But then there's the people who just kind of see it as like it's a tool like anything else.
44:18 And so you could be an expert in a drill or you could just know to pick up a drill.
44:23 That's kind of the majority of people is that it's just another tool in their toolkit, especially for a scientist, just to answer the question that you're trying to answer.
44:32 And I would even flip the reverse where there's been some times where I've maybe even used Python too much.
44:38 In the sense that like I made a problem more because it's like the automation dilemma, right?
44:43 It's like, do I spend an hour automating this when I could do it in 10 minutes?
44:46 You got to do it a lot of times and all of a sudden the hour is worth it.
44:49 But if it turns out you don't.
44:50 It's like I might need this a year from now.
44:52 So I might as well just write the script.
44:54 Whereas I could just do it in Excel.
44:56 That's pretty standard.
44:57 Standard problems we all run into in programming.
44:59 It's like, write some code to do that.
45:01 Or I could just be done with it for sure.
45:04 All right.
45:05 I think we've got time for a couple more topics.
45:07 One thing that I think might be fun to talk about is publishing papers, right?
45:12 Obviously, if you're in academics, especially in labs, you got to publish papers.
45:16 Do you use notebooks and stuff like that for your papers?
45:19 Or is that just kind of separate?
45:21 What's...
45:22 Yeah, like I said, notebooks are great for presentation.
45:24 So yeah, I use notebooks kind of for development just because you can quickly run code and go
45:31 back up and move things around.
45:32 And so I kind of like that ability to kind of just a stream of consciousness, write code
45:37 until you kind of like see how it's kind of working the prototype and then refactor out
45:41 into an actual .py document or a package or something.
45:44 So that's kind of been my workflow and it works pretty well.
45:47 But then when you actually have the code and it works and it's robust, you actually want
45:51 to put...
45:51 There's a lot of figures.
45:52 That's usually the main thing.
45:53 So you kind of put all of that.
45:55 Here's the data.
45:56 Here's the pre-processing.
45:57 Here's the figure one.
45:58 Here's figure two, figure three in the notebook just so it's reproducible and other people
46:03 can download it and rerun your code and that sort of thing.
46:06 So I think that's slowly becoming kind of the standard approach for those labs that use
46:11 Python and they share their code openly.
46:14 That's kind of how they do it.
46:15 Anything like executable books or any of those things that kind of produce printable?
46:20 Output out of the notebooks?
46:22 Like publishable output out of the notebooks?
46:24 Not a lot.
46:25 But there's one of the journals, it's called eLife.
46:28 It's kind of...
46:29 It's trying to like push the boundaries of what it means to publish a scientific paper.
46:33 And so they kind of have...
46:35 Because most papers are really just on the web nowadays.
46:38 The journals aren't really physical journals as much anymore.
46:41 They kind of have like papers as executable code where you can like plot the figure in
46:47 the browser and kind of run through the notebook and just as an experiment.
46:50 But it's pretty cool to kind of see these like new alternative ways to still convey the same
46:57 findings.
46:57 But you can play with it.
47:00 You can zoom in and out.
47:00 So the methods are kind of implicit in the...
47:02 What's the name of the journal?
47:03 That's eLife.
47:04 eLife.
47:05 Okay, cool.
47:06 Yeah, you probably, I suppose, need some way to capture the data output because you
47:11 might not have access to the compute to recompute it.
47:14 You know, somehow it's got to sort of be a static version.
47:17 But that sounds really cool.
47:18 Yeah.
47:19 And especially for like most recently, some of our like trained models, it's becoming more
47:24 important to just share the weights and share those sorts of things too.
47:27 You can't just share the code to train the thing if people don't have the compute to actually
47:32 train them themselves.
47:33 It's kind of growing to not just sharing your data, not just sharing your code, but
47:37 you need to share like the key derivatives of the preprocessing and those sorts of things.
47:41 Or even just sharing the version numbers, because there have been cases in the psychology or fMRI literature
47:46 where there's, like, a bug in some version that made a lot of the results null.
47:50 And so, you know, one person could use version 3.7 of a package, but that had a bug, but people
47:56 don't know that.
47:56 So they claim it's not reproducible, but it's really just not the same algorithms.
48:00 Yeah, yeah, yeah.
48:01 Or like across languages, like if you rerun the same analysis in MATLAB versus Python versus
48:06 R, especially complex ones, there's a lot of little design decisions under the hood that
48:12 might tweak exactly how that regression fits or exactly how that, if you're statistically
48:17 sampling, how the sampling works under the hood of those sorts of things.
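One lightweight habit that addresses the version problem: snapshot the interpreter and installed package versions next to every result. A minimal sketch using only the standard library:

```python
import importlib.metadata as md
import json
import platform
import sys

# Record the exact environment an analysis ran under; save this next to the results.
env = {
    "python": sys.version.split()[0],
    "platform": platform.platform(),
    "packages": {(d.metadata["Name"] or "unknown"): d.version
                 for d in md.distributions()},
}
snapshot = json.dumps(env, indent=2, sort_keys=True)
print(snapshot[:200])
```

Dropping a file like this into each results directory makes "which version of the package produced this figure?" answerable years later.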
48:20 Awesome.
48:20 Are you familiar with the Journal of Open Source Software?
48:22 Yeah.
48:23 I had the folks on from there.
48:25 Yeah.
48:25 I had them on quite a while ago.
48:27 And I think they're trying to solve the interesting problem of if you take the time to create a
48:32 really nice package for your area, you might have not taken your time writing the paper.
48:37 And so you wouldn't get credit because you don't have as many papers to publish.
48:40 So they let you publish like your open source work there, which I think is pretty cool.
48:44 What do you think about that?
48:45 We kind of had that same problem.
48:46 One of the things I do, I run a nonprofit called ContinualAI.
48:51 It does artificial intelligence and outreach and research, and we have conferences and all
48:56 sorts of events.
48:57 But one of the main things we've done is we built a deep learning library on top of PyTorch
49:02 called Avalanche.
49:03 And so like we had a really great community of mostly volunteers who just saw the need
49:08 in the field and put it together.
49:10 But then again, it's like a lot of us are academics.
49:12 How do you present this?
49:13 And so you write wrapper papers around kind of the framework.
49:18 So that's kind of been the de facto way of like, it's not really a paper, but you still
49:23 need to like, you know, share it and get credit for it and put your name on it.
49:27 It's certainly an issue.
49:28 I'm starting to see it not even just with software, but even with hardware, because hardware
49:32 is becoming more open source in our field.
49:34 And so you just kind of write like a paper about the hardware solution to some problem.
49:39 That's cool.
49:39 It's better than a patent.
49:40 Yeah, it's definitely better than a patent.
49:42 Patents, while they serve a purpose, are pretty evil.
49:45 Let's wrap things up with maybe just, you know, you mentioned Continual AI.
49:50 Tell people a bit about that.
49:51 It's the largest nonprofit for continual learning.
49:54 Continual learning in a nutshell is, say I have a neural network and I train my neural network
49:59 to classify cats.
50:01 And I classify cats to 90% accuracy.
50:03 And we're like, yeah, this is why neural networks are great.
50:05 I take that same trained neural network on cats and I train it on say dogs.
50:09 It does really well.
50:10 90% accuracy on dogs.
50:12 We're really excited why neural networks are so great.
50:14 But the issue is if I take that, again, the same network, I just trained it on dogs and
50:18 previously trained it on cats.
50:19 I try and test it on cats again.
50:22 It's going to forget pretty much everything it learned about cats.
50:26 And this is an old, old problem.
50:27 Back in like, you know, the good old days when neural networks were connectionist models and
50:31 it was computer scientists, it was the cognitive scientists.
50:34 They noticed.
50:35 Overtraining or something like that, right?
50:37 Kind of overfitting.
50:38 It's neurorelated.
50:39 Yeah, it's very similar.
50:40 Catastrophic forgetting.
50:42 They call it the sequential learning problem, which is why I'm really interested in it because
50:45 I'm really interested in, you know, continual learning and sequential memory.
50:48 Or in neuroscience, it's called the stability plasticity problem.
50:51 So when do you learn?
50:52 When do you remember?
50:53 And so since we started the organization in 2018, the field has kind of exploded
51:01 just because there's such a need for overcoming this across a lot of use cases.
51:06 So like a lot of times you can only see the data once.
51:09 So the way you solve the problem generally is you just shuffle in cats and dogs into the
51:14 same data set and retrain your model.
51:15 But now the neural networks are getting bigger and bigger and bigger.
51:18 Retraining is getting costlier and costlier.
51:20 You can't just have, you can't train a network on petabytes every time you want to update it.
51:24 That's even if you have access to the data and the storage to save it and so on and so forth.
51:29 So clever ways to solve the problem.
51:31 And so we're kind of around that.
51:33 We have neuroscientists, we have computer scientists, AI researchers across academia,
51:38 industry, all that are just a bunch of people really interested in this problem to just come
51:42 together and share papers, share ideas.
51:45 We just had a conference.
51:46 We sponsor a lot of competitions for people to kind of put forward an idea to some problem
51:52 that we kind of put out every year.
51:53 So it's been really, really exciting to kind of see the community grow over the years and
51:58 all the tools and fun things that's kind of come out.
52:00 Well, it's definitely a hot topic right now.
52:02 Absolutely.
52:02 The cognitive scientists and neuroscientists studied neural networks and stuff.
52:06 And it was kind of like, well, maybe this model stuff.
52:08 And then now we're in the time of LLMs and the world has gone crazy.
52:12 Absolutely.
52:13 I said we were going to close out with a continual AI, but let me just ask, what are your thoughts
52:18 on LLMs?
52:19 Where the stuff's going?
52:20 I mean, we all have exposure to it in different ways, but you've got this understanding of what
52:26 they're trying to model it on quite a bit.
52:28 So what do you guys in your space think of it?
52:31 So the first lab I joined was, well, the first real lab I joined was a neuroscience lab where
52:36 we were sticking wires in brains and actually doing real neuroscience.
52:39 But I also started kind of simultaneously working with a cognitive scientist where we were working
52:46 on the original Word2Vec.
52:48 So this is, in my mind, like the grandfather of the LLM.
52:51 So this is the model that like overnight took Google Translate from being meh to pretty good.
52:56 And it's just really at the heart of it, just an autoencoder.
53:00 But we were really excited then doing kind of semantic modeling of how much further kind
53:06 of deep learning could take language modeling.
53:08 And then we were actually using it to study catastrophic forgetting.
53:11 So does Word2Vec catastrophically forget?
53:13 And the punchline is it does.
53:15 And that kind of got me jumping in, really excited about continual learning and so on.
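The core idea behind Word2Vec is that words become vectors whose geometry encodes meaning: similar words end up pointing in similar directions, measured by cosine similarity. A toy sketch of that idea (the four-dimensional vectors here are made up for illustration; real Word2Vec embeddings are learned from a corpus and typically have 100 to 300 dimensions):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up toy vectors, NOT real Word2Vec output: chosen so that the two
# royalty words point in nearly the same direction.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.85, 0.9, 0.05, 0.1]),
    "apple": np.array([0.0, 0.1, 0.9, 0.8]),
}

print(cosine(vectors["king"], vectors["queen"]))  # close to 1.0
print(cosine(vectors["king"], vectors["apple"]))  # much smaller
```

A trained model does the same comparison, just over vectors learned from billions of words rather than hand-picked numbers.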
53:19 So I saw kind of that trajectory then, and then kind of stepped out of it for a few years,
53:25 dug deeper into, you know, pure neuroscience and artificial intelligence from other angles.
53:32 But I'd always been just so fascinated by this idea of like, say you take an AI and it could
53:37 read every book or it could read the internet.
53:39 Like, what would you be able to get?
53:41 And that was kind of, you know, in my mind, like, well, seeing Word2Vec try and do the same
53:46 thing and training it on my laptop, like, well, you know, it's going to take a
53:50 minute till we get there.
53:51 And I underestimated that drastically.
53:55 It was like, almost no progress, almost no progress.
53:57 Wow, what just happened?
53:58 Right?
53:59 It's a testament to statistical learning more than anything, which is just how much information
54:05 can you just soak up and put together in a fancy new way and regurgitate back.
54:10 I think the next big leap is going to be adding more cognition to that progress.
54:15 So adding when an agent has a goal, when an agent kind of has to break down a series of
54:21 steps to get to that goal, those kinds of things we don't see as much of.
54:25 And that will kind of be the next big push.
54:28 I think that'll kind of take all the cool things that LLMs can do now and kind of blow everyone
54:33 away of just, if you can take the internet and put it on a seven gigabyte file, what can
54:38 that get you?
54:39 But what if you can take the internet, put it on a seven gigabyte file, but actually have
54:42 some sort of logic and direction and the agent itself can actually navigate through its
54:48 own thoughts?
54:49 That's going to take us, I think, right to the borderline of-
54:53 Terminator?
54:53 Yeah.
54:53 Not Terminator.
54:54 I'm just kidding.
54:55 It won't be Terminator.
54:55 No.
54:56 It won't be Terminator.
54:57 But it'll be really an intelligent system.
54:59 It's going to be super interesting.
55:00 You know, you've got all this prompt engineering and these clever ways to kind of get the current
55:06 LLMs in the right mindset, which is probably a personification.
55:10 But you can tell it things like, here, I want you to tell me how to do this.
55:14 And I'll come up with some answer.
55:15 You can say, I want you to think step by step.
55:18 And then all of a sudden you get a real different type of answer where it pulls out the pieces
55:22 and it thinks about how it does it.
55:23 And it's going to be interesting.
55:24 You know, it'll be kind of like that kind of stuff, but it just already knows how to think.
55:29 You don't have to give it little weird clues.
55:31 Like you're an expert in this and you're really good at it.
55:33 Now I want to ask you a question about, oh, I'm good at it.
55:35 Okay.
55:35 I'll answer better.
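Those prompting tricks are really just extra text prepended to the request. A minimal sketch of how the two styles of prompt might be assembled (the function name, parameters, and wording here are hypothetical illustrations, not any particular LLM provider's API):

```python
def build_prompt(question, chain_of_thought=False, persona=None):
    # Assemble a prompt string; the exact phrasing is illustrative only.
    parts = []
    if persona:
        # The "you're an expert" framing mentioned above.
        parts.append(f"You are {persona}.")
    if chain_of_thought:
        # The "think step by step" instruction that changes the answer style.
        parts.append("Think step by step before giving your final answer.")
    parts.append(question)
    return "\n".join(parts)

plain = build_prompt("How should I structure this experiment?")
cot = build_prompt("How should I structure this experiment?",
                   chain_of_thought=True,
                   persona="an expert research methodologist")
print(cot)
```

The model sees only the final string; the "mindset" comes entirely from that prepended context.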
55:37 My favorite definition of AI is it's whatever computers can't do yet.
55:40 Yeah.
55:40 Because like, you know, 30 years ago, if we had this conversation, it'd be like, so what
55:44 do you think of Deep Blue?
55:45 Do you think Deep Blue, the AI that beat Kasparov at chess, can, you know, think and is
55:51 it going to take all our jobs?
55:52 Or it's going to be, what can Watson do?
55:54 Can it think and take all our jobs?
55:56 And, you know, that was solved with tree search, which, you know, undergraduates in
56:01 their second year CS class are learning.
56:02 It's just standard search.
56:04 People don't even think of search as AI.
56:06 What if we just loaded every possible outcome into the chess engine and we just
56:11 try to, you know, traverse each step and see where it takes us, right?
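That exhaustive game-tree search can be sketched on a toy game; here a Nim variant (players alternate taking 1 to 3 sticks, and whoever takes the last stick wins) stands in for chess, whose full tree is far too large to enumerate. The game choice and function names are mine, just for illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def can_win(sticks):
    # Exhaustive game-tree search: the current player wins if some legal
    # move leaves the opponent in a position from which they cannot win.
    # With zero sticks, the previous player took the last stick and won.
    return any(not can_win(sticks - take)
               for take in (1, 2, 3) if take <= sticks)

def best_move(sticks):
    # Pick any move that puts the opponent in a losing position.
    for take in (1, 2, 3):
        if take <= sticks and not can_win(sticks - take):
            return take
    return 1  # no winning move exists; take the minimum

print(can_win(4))     # False: every move hands the opponent a win
print(best_move(21))  # 1: leaves 20 sticks, a losing position
```

Chess engines like Deep Blue apply the same recursion with pruning and heuristic cutoffs, since the full tree can't be searched to the end.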
56:14 There's always going to be room to grow, but, you know, from a cognitive science perspective,
56:18 I think something far more interesting rather, you know, I think it's cool to see what computers
56:22 can actually do.
56:23 And I think they can do a hell of a lot more.
56:25 But I think it's more interesting to kind of ask the more philosophical question of what
56:29 does that actually mean for us?
56:30 Because at every step of the way, when we develop something new in artificial intelligence, it
56:34 tells us something a lot deeper about our own intelligence too, where back in the 50s and
56:39 60s, they thought that chess meant intelligence.
56:41 And so now we're kind of seeing with LLMs, if someone just reads a bunch of books and can
56:46 memorize a bunch of books, does that mean they're intelligent?
56:49 Because that's effectively what an LLM can do.
56:51 It comes across as intelligent, right?
56:53 It comes across that way to people like, oh, you have all the answers, but it doesn't
56:57 mean you're good at problem solving necessarily.
56:59 We're kind of peeling at this onion and we're kind of segmenting intelligence into its different
57:03 categories to really kind of break it apart just from this vague word of intelligence into
57:10 actually what are the parts that make something intelligent?
57:12 What does it really mean?
57:14 And what are still the things that like we thought were hard and are really easy or the
57:18 things that we thought were easy and really hard?
57:20 Because any LLM you have now can't chew gum and walk to the store and buy you something
57:25 to eat and then play you in chess.
57:28 And so the "general" in general AI isn't just there to make it sound grander.
57:34 It's there because that's actually something that makes humans very, very unique is that I can
57:38 have this conversation with you, write code, prepare an omelet, walk a dog and do all of these
57:44 things all at once.
57:45 And I started as a babe.
57:46 And have feelings and thoughts and introspection about it and all that.
57:50 Hopes, dreams, liking ketchup but not tomatoes, and all sorts of other things.
57:53 Yeah.
57:54 Super interesting.
57:54 So are you pretty positive on where stuff's going?
57:58 I'm less concerned about the AI itself and I'm more concerned about just how we react to it.
58:03 And whether we react intelligently, whether we react well.
58:06 We went through the industrial revolution.
58:08 We went through the steel age.
58:10 We're starting to go through the cognitive revolution, or whatever people are going to call this a hundred
58:14 years from now.
58:15 I think by and large, people are going to be okay.
58:17 I think we just need to, you know, have good policies, make sure that people do it responsibly,
58:22 people do it well.
58:23 And that's going to be the hard part is just how do we manage the transition?
58:26 Well, how do we take a whole labor force whose jobs will be gone,
58:31 where there's no economic incentive to keep those jobs, and retrain them in a really smart and sensible
58:37 way?
58:37 So people's livelihoods don't just go away to maximize the dollar.
58:42 Those are kind of problems that are going to need good policy and clever solutions and a
58:48 lot of research to actually sit down and handle well.
58:51 But AI isn't going to go anywhere.
58:52 Just like computers haven't gone anywhere, but it's not Terminator that we should worry about.
58:56 No, absolutely.
58:57 It's certainly not.
58:58 I was joking.
58:59 We should worry about disinformation and politics and news and all that stuff.
59:05 I think people get so hung up on all the negatives that we kind of forget.
59:08 Like we're developing these tools for a reason.
59:10 Scientists are spending so much time and we're excited about them for a reason.
59:13 One being that I used to work on proteins, trying to find the structure of proteins.
59:17 And that was one protein a year.
59:22 It would take one year to find the structure of a protein.
59:22 And now I can run this.
59:23 I took the same sequence that we found and ran it through AlphaFold, which is DeepMind's
59:27 protein folding software.
59:29 And I could do it in five minutes.
59:30 I mean, those are the things that will really like catalyze science and technology and healthcare
59:35 and actually solve the problems we want to solve.
59:38 So that's what I think we should do.
59:40 I think I heard even some AI stuff coming up with some new battery chemistry, potentially.
59:44 Everything.
59:45 How do you appropriately kind of harness like a fusion ring in a tokamak reactor or something
59:53 like that?
59:53 Everything.
59:54 Even to how do you route a PCB with like millions of traces effectively?
01:00:00 So it's a tool for the mind and it's not going to put our minds out of work.
01:00:05 Like you said a long time ago when you were on your AI track.
01:00:07 Yeah, exactly.
01:00:07 It's like a different view of that thing.
01:00:09 The same reason that none of us want to sit around and churn butter and go till the fields
01:00:14 and walk to work.
01:00:16 I don't dislike my clothes washer.
01:00:19 Not at all.
01:00:20 Yeah.
01:00:21 Awesome.
01:00:21 All right.
01:00:22 Well, thanks for giving us a look inside your lab, inside your research and the field.
01:00:27 And then just, you know, share what you're about and up to.
01:00:29 It's been great.
01:00:29 Yeah, yeah, yeah.
01:00:30 Thanks for having me.
01:00:31 Yeah, it's been really great to, at least for the years that I've been in research, seeing
01:00:36 how pivotal Python has been.
01:00:38 And I think that's one of the big things that probably goes unnoticed by a lot of developers
01:00:43 when they're actually writing their code to solve whatever problem they're solving is
01:00:46 that someone who wrote the NumPy library or the scikit-learn library or, you know, Jupyter
01:00:52 Notebooks has actively played a part in curing diseases and making people's lives better.
01:00:57 And those stories usually don't kind of come to the forefront.
01:01:00 It's not a direct line, right?
01:01:01 But the research was done and facilitated by these open source tools and then discoveries
01:01:06 were made.
01:01:07 Like you said, that's the difference between people getting cured in 10 years or getting
01:01:10 cured in a year.
01:01:11 And that's thousands of lives.
01:01:13 And so you might not think about it when you're sitting behind your desk, you know,
01:01:17 making something usable.
01:01:18 But for every day a PhD student doesn't have to like pull their hair out and debug some software
01:01:23 because it's well-written and it works well,
01:01:24 that's a day that they can find something new.
01:01:27 That's awesome.
01:01:27 Yeah.
01:01:28 Very inspiring.
01:01:28 Let's leave it there, Keelan.
01:01:30 Thank you for being on the show.
01:01:31 Absolutely.
01:01:32 Yeah.
01:01:32 It's been great to talk to you.
01:01:33 Thank you so much.
01:01:34 Yeah.
01:01:34 Thanks for coming.
01:01:34 This has been another episode of Talk Python to Me.
01:01:38 Thank you to our sponsors.
01:01:40 Be sure to check out what they're offering.
01:01:41 It really helps support the show.
01:01:43 It's time to stop asking relational databases to do more than they were made for and simplify
01:01:49 complex data models with graphs.
01:01:51 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do
01:01:58 for you.
01:01:58 Find out more at talkpython.fm/Neo4j.
01:02:03 This episode is sponsored by Posit Connect from the makers of Shiny.
01:02:07 Publish, share, and deploy all of your data projects that you're creating using Python.
01:02:11 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.
01:02:18 Posit Connect supports all of them.
01:02:20 Try Posit Connect for free by going to talkpython.fm/Posit, P-O-S-I-T.
01:02:26 Want to level up your Python?
01:02:28 We have one of the largest catalogs of Python video courses over at Talk Python.
01:02:32 Our content ranges from true beginners to deeply advanced topics like memory and async.
01:02:37 And best of all, there's not a subscription in sight.
01:02:40 Check it out for yourself at training.talkpython.fm.
01:02:42 Be sure to subscribe to the show.
01:02:45 Open your favorite podcast app and search for Python.
01:02:47 We should be right at the top.
01:02:49 You can also find the iTunes feed at /itunes, the Google Play feed at /play,
01:02:54 and the direct RSS feed at /rss on talkpython.fm.
01:02:58 We're live streaming most of our recordings these days.
01:03:01 If you want to be part of the show and have your comments featured on the air,
01:03:04 be sure to subscribe to our YouTube channel at talkpython.fm/youtube.
01:03:10 This is your host, Michael Kennedy.
01:03:11 Thanks so much for listening.
01:03:12 I really appreciate it.
01:03:13 Now get out there and write some Python code.
01:03:15 Bye.
01:03:16 Bye.