#461: Python in Neuroscience and Academic Labs Transcript
00:00 Do you use Python in an academic setting?
00:02 Maybe you run a research lab or teach courses using Python.
00:05 Maybe you're even a student using Python.
00:08 Whichever it is, you'll find a ton of great advice in this episode.
00:11 I talked with Keiland Cooper about how he's using Python in his neuroscience lab at the
00:16 University of California, Irvine.
00:19 And Keiland wanted me to let you know that if any developers who are not themselves scientists
00:23 are interested in learning more about scientific research and ways you might be able to contribute,
00:28 please don't hesitate to reach out to him.
00:30 This is Talk Python to Me, episode 461, recorded March 14th, 2024.
00:35 Are you ready for your host, please?
00:39 You're listening to Michael Kennedy on Talk Python to Me.
00:42 Live from Portland, Oregon, and this segment was made with Python.
00:50 Welcome to Talk Python to Me, a weekly podcast on Python.
00:54 This is your host, Michael Kennedy.
00:55 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,
01:00 both on fosstodon.org.
01:03 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.
01:08 We've started streaming most of our episodes live on YouTube.
01:12 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be
01:18 part of that episode.
01:20 This episode is sponsored by Neo4j.
01:22 It's time to stop asking relational databases to do more than they were made for and simplify
01:28 complex data models with graphs.
01:31 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do
01:36 for you.
01:37 Find out more at talkpython.fm/neo4j.
01:42 And it's brought to you by Posit Connect from the makers of Shiny.
01:46 Publish, share, and deploy all of your data projects that you're creating using Python.
01:50 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.
01:57 Posit Connect supports all of them.
01:59 Try Posit Connect for free by going to talkpython.fm/posit.
02:03 P-O-S-I-T.
02:04 - Hello, how are you?
02:06 - I'm doing well.
02:07 So awesome to have you here on Talk Python and talking academics.
02:13 I didn't tell you before we hit record, but I spent a long time at universities and I
02:17 just love them.
02:18 They're such cool places and it's gonna be really fun to get a look inside how Python's
02:23 being used there.
02:24 - Yeah, well, yeah.
02:25 Thank you so much for having me.
02:26 And yes, I too love universities.
02:28 It's kind of like all the coolest parts of humanity just kind of intermixing in one place.
02:33 So yeah, I'd love to kind of peel back the curtain on how things are going.
02:37 - Yeah, yeah.
00:38 Well, we'll talk about how you and your colleagues use Python and data science inside of your
00:43 neuroscience research lab.
02:46 But before we dive into that, let's just get a bit of background on yourself.
02:49 Who are you?
00:50 How did you get into Python?
02:51 All those things.
02:52 - So I'm Keiland Cooper.
02:53 I'm a neuroscientist at the University of California, Irvine.
02:56 So Southern California, 15 minutes from the beach and an hour from the mountains.
03:01 But I'm originally from the middle of nowhere, Indiana.
03:05 And I started playing with computers and code when I was young.
03:09 So like middle school-ish, just ripping apart computers and seeing what was in them and
03:13 then trying to put them back together and feeling bad when they didn't work right after.
03:18 And then the typical, you know, tweaking the software when you don't like what it does
03:22 until you make it work.
03:23 And then probably my senior year of high school is when I started teaching myself Python.
03:29 And it was because we had to do something for some government class, actually.
03:32 - Oh, wow.
03:33 Okay.
03:34 - And we had to learn, we were learning about the stock market.
03:36 And every day you'd have to spend like 15 minutes going to like some stock website and
03:40 like filling out your fake stocks.
03:42 And so I wrote a really small Python script that would just pull the data from the website
03:47 and populate like an Excel spreadsheet.
03:49 And so every day the kids in the class were just like going through and like spending
03:52 15, 20 minutes by hand writing it down.
03:55 And I would just sit there.
03:56 - That's awesome.
03:57 - And so that was kind of the first time I was like, wow, this whole automation thing
04:00 is pretty sweet.
04:01 From there, Python just kind of, I caught the bug pretty early.
04:05 Python was definitely the way to go.
04:07 - Yeah.
04:08 Was Python your first programming language?
04:09 - My first programming language was the Windows registry and trying to undo all of the mistakes
04:14 of the operating system.
04:15 - It's been a while since I've been in the Windows registry, but good old regedit.
04:20 - I switched to Linux pretty quick.
04:22 In Linux and Unix.
04:23 - Are you still on the Linux?
04:24 - Mostly.
04:25 My desktops are all Linux.
04:26 My servers are obviously all Linux.
04:27 I like Mac for a laptop just because, you know, Linux has this thing where you tinker
04:32 with it.
04:33 And so then any small task you want to do, you end up like rewriting some deep script
04:37 in the operating system and like two hours later, you're like, what was that small thing
04:40 I was trying to do again?
04:41 - Yeah, exactly.
04:42 I got distracted.
04:43 I was rewriting something in there and off we go.
04:46 Yeah.
04:47 - Yeah.
04:48 So Macs are nice because you still have all the same Unix like properties that are great,
04:51 but you pay a price for reliability.
04:53 - You just sell a bit of your soul out, but boy, is that UI nice.
04:56 And those little menu bar apps are handy.
04:59 You know, that's right.
05:02 I've been playing with running Ubuntu on my Mac M2 Pro, and it runs great, but it's
05:10 an ARM version of Mac, or of Linux rather.
05:14 Well, both really, but boy, is there a limited supply of applications for an ARM Linux distribution.
05:21 Let me tell you, they're like, just download the Debian package.
05:25 - I imagine that'll change pretty quick though.
05:27 - Yeah, they're like, just download the Debian package.
05:28 You just install it.
05:29 Like wrong platform.
05:30 I'm like, again, over half.
05:33 But yeah, I think it will change as I think ARM's going to start to take over Windows
05:37 a little bit as well.
05:38 And obviously the Mac world is basically transitioning.
05:41 So anyone who has a Mac, yeah.
05:43 - I think it's Qualcomm that just kind of started hinting that they were going to try
05:48 and really heavily compete with the M line of processors and have some pretty good specs.
05:52 So it'll be good.
05:53 I have an M3 and it's pretty nice.
05:56 - It is really nice.
05:58 Like I said, I'd like to run more stuff on it, but it's still kind of Intel or x86 stuff
06:02 for Linux and Windows.
06:04 So it's a little hard to work with that, but still super fun.
06:08 That's a long way to say it's been a very long time since I've been in regedit,
06:12 personally. It sounds like you as well, not being in Windows too much.
06:16 - I'm so bad at Windows now.
06:18 Like when I'm helping people with Python or something else and they show me their computer,
06:21 there's always that like 10 minute learning curve of like, okay, how do I do anything
06:27 basic on this machine?
06:29 Or even like the keyboard shortcuts you get so accustomed to when people don't have any
06:33 of those little things that you're just like, how do I select everything?
06:37 - That is it.
06:38 Like I did professional software development on Windows for a long, long time.
06:41 I even wrote a bunch of Windows apps.
06:43 It was great.
06:44 But going back and forth too quickly between that and Linux, just the hotkeys, I just get
06:49 broken.
06:50 So if I'm on Windows and then come back to Mac, I'm just completely out of sorts.
06:54 So yeah, it's fun.
06:55 All right, well, let's talk academics and Python from a probably mostly OS-agnostic
07:04 perspective.
07:05 But yeah, just give us a sense of the kind of research you do.
07:09 You know, what is your field?
07:10 What do you study?
07:11 Those kinds of things.
07:12 So people get a sense of like, why am I talking about this?
07:14 Where are you coming from as you talk about doing all these things?
07:16 - The core of my work is pure neuroscience.
07:20 So basic science, what we do mainly in the lab is we take really tiny wires.
07:20 So they're like a fifth of the size of a human hair.
07:28 And now we're using something called silicon probes, which are, they're manufactured the
07:31 same way that computer chips are manufactured on silicon wafers using photolithography.
07:37 - Do you get a higher density that way?
07:38 Do you get like a bunch of little sensors on one point or something?
07:42 Okay.
07:43 - So we used to build these little drives.
07:44 I used to have one here, but I got rid of it.
07:46 Little drives by hand.
07:47 So you would just feed the wires in with forceps.
07:51 And so you'd get maybe 64 or 128 at most, depending on how much time you want to sit
07:56 there and feed the wires in.
07:58 But now you can just get the manufactured, you pay a lot more, but you get twice, three
08:02 times the sites.
08:04 And the whole point is the more sites you have, the more neurons you can actually record
08:08 from in the brain.
08:09 - Yeah.
08:10 You're not just saying this part of the brain lit up, but you can have a much better picture,
08:14 right?
08:15 - Yeah.
08:16 So a big part of your brain is the neuron.
08:17 And so of the millions and billions of neurons, depending on the species you're recording
08:22 from, we can record maybe a few hundred of them, but that's usually sufficient to actually,
08:27 in the specific region you study, and I can talk about that more, to discern some sort
08:32 of information from it.
08:33 So really the data type we really care about is these tiny little electrical voltages that
08:38 tell you what different neurons in the brain are talking about.
08:41 And so you put the wires in, you record the conversations of a bunch of neurons.
08:46 And particularly we're interested in two brain regions that are critical for memory, learning,
08:52 and decision-making.
08:53 And this is the hippocampus, which in humans is about the size of your pinky and a few
08:58 inches in from your ear, and the prefrontal cortex, which most people know about right
09:02 behind your forehead, important for learning, decision-making, and all those sorts of things.
09:07 So that's the core of my work is I'm in the lab doing the actual data collection and building
09:11 equipment to actually do that.
09:13 But once you have all of that data and the data keeps growing, like most other fields,
09:17 you got to do a lot of pre-processing, which takes Python.
09:21 You got to do a lot of post-processing, which takes a lot of Python.
09:24 And also we do something called neural decoding.
09:27 So not only do we just like say descriptively, what are these neurons doing, but we can go
09:32 one step further and say, what actual information are these cells representing?
09:38 So in the brain, we can kind of say, this is kind of the fundamental kind of information
09:44 transfer and how information is manipulated in the brain and how it ships information
09:49 from the environment into memory and how it uses that to make a decision.
09:53 All of those kinds of things we can use through fancy modeling and statistics and more recently,
09:59 deep learning and those sorts of things.
10:00 - We'll have to come back to deep learning later.
10:02 That'll be fun.
10:03 Well, given your background.
10:06 So for this hardware, do you write the software that actually talks directly to the hardware
10:11 or is there something that just records it and you grab some sort of custom file format
10:16 and run with it?
10:17 - Yeah, more recently, it kind of depends on the lab.
10:21 So as time goes on, there are more and more companies where you can just buy off-the-shelf
10:25 recording platforms, mostly for the electrical engineering people.
10:30 It's kind of like an audio like amplifier, 'cause you're recording at millivolts in the
10:33 brain so you have to amplify it, write it to, if you're plugged in with a wire, write
10:37 it to the computer.
10:39 So all that takes software in various forms.
10:42 And then we do a lot of animal research.
10:44 So the tasks that the animals do are pretty much all automated.
10:49 But recently in the lab, we've kind of had this resurgence of like developing kind of
10:54 novel hardware and a lot of automation of behavior.
10:57 So I've kind of rewritten most of our entire behavioral stacks, which is a lot of just,
11:03 so microcontroller programming, which not a lot of that's in Python, a lot of that's
11:06 just kind of like C++ and those sorts of things.
11:09 But we have cameras all over, so I wrote this kind of like camera server that streams all
11:14 of the camera footage from a bunch of automated boxes to some like central server that just
11:19 collects all of that data.
11:21 So yeah, a lot of the behavioral stuff nowadays, we're just building in house to collect all
11:26 of the behavior data.
11:27 The ephys stuff is now, especially because we're doing something called wireless recording.
11:32 So instead of just having a wire plugged into the head, it just writes it to like an SD
11:36 card or Bluetooth.
11:38 That's just kind of all on chip.
11:40 So it's just whatever the microcontroller language of the chip needs.
11:46 - This portion of Talk Python to Me is brought to you by Neo4j.
11:50 Do you know Neo4j?
11:51 Neo4j is a native graph database.
11:54 And if the slowest part of your data access patterns involves computing relationships,
11:59 why not use a database that stores those relationships directly in the database, unlike your typical
12:05 relational one?
12:06 A graph database lets you model the data the way it looks in the real world, instead of
12:10 forcing it into rows and columns.
12:13 It's time to stop asking relational databases to do more than they were made for and simplify
12:18 complex data models with graphs.
12:21 If you haven't used a graph database before, you might be wondering about common use cases.
12:25 What's it for?
12:27 Here are just a few, detecting fraud, enhancing AI, managing supply chains, gaining a 360
12:34 degree view of your data, and anywhere else you have highly connected data.
12:38 To use Neo4j from Python, it's a simple pip install neo4j.
12:44 And to help you get started, their docs include a sample web app demonstrating how to use
12:48 it both from Flask and FastAPI.
12:51 Find it in their docs or search GitHub for Neo4j movies application quick start.
12:56 Developers are solving some of the world's biggest problems with graphs.
13:00 Now it's your turn.
13:01 Visit talkpython.fm/neo4j to get started.
13:06 That's talkpython.fm/neo, the number four and the letter J.
13:11 Thank you to Neo4j for supporting Talk Python to Me.
13:15 - I think it's surprising how much software and even hardware, but definitely software
13:21 is involved for something that doesn't sound like a software discipline.
13:25 You wouldn't think of what you guys are doing as inherently almost like a software team,
13:30 but there's a lot of software there.
13:31 - Absolutely, and it's growing.
13:33 So it used to be 10, 20 years ago, more biology, I'd say, like more wet lab stuff.
13:38 But 90% of what I do as kind of a neurobiologist is really just engineering style things.
13:45 Like I'm more recently designing PCBs and I'm in the shop a lot, just like with saws
13:50 and hammers and drills and like actually physically building things.
13:54 And obviously a lot of code.
13:56 And the coding part is becoming bigger and bigger, to the point where, in the field, I
14:00 always say that neuroscience is about three decades behind astrophysics.
14:05 'Cause all the problems that neuroscientists say we're facing now as a field, they
14:09 had three decades prior in astrophysics. In neuroscience, we're
14:14 like, what do we do with all this data?
14:16 This is, I mean, I'm collecting a hundred gigabytes an hour, if not more, like what
14:20 do we do with all this data?
14:21 - That is a lot.
14:22 - Yeah, but relative to like some of those big telescopes that are collecting like almost
14:27 petabyte scale.
14:28 - I would say both ends of physics, like the very, very extreme ends of physics.
14:32 So astrophysics, the very large and then particle physics, right?
14:36 At CERN as well, they've got insane amounts of data.
14:39 Yeah.
14:40 - And that's what we're starting to see, I think, in neuroscience too, is that kind of
14:42 division of like, because the scale of data collection is so big, you're starting to need
14:47 not just a single lab, but teams.
14:49 So we have a few institutes now that are just pumping out terabytes of data.
14:54 And so you start to see that division between the neuroscientists who are really in the
14:58 lab hands-on with actual neural tissue or the recording device, and the neuroscientists
15:04 who just take the data and analyze it and develop new models and statistical models.
15:09 And also theory, there's always a dearth of theory in neuroscience, but the computational
15:14 modeling is certainly getting a lot bigger within the last few decades as well, where
15:19 people's entire job is just how do we model some of these things in code?
15:24 - You probably run into different research groups, different teams that have different
15:29 levels of sophistication from a software side.
15:32 And do you see like a productivity or a quality difference jump out from like the kind of
15:37 work or the velocity of work that people are doing there?
15:40 - Absolutely.
15:41 It's a huge range.
15:44 There are like very sophisticated labs.
15:46 And usually those are the labs that have kind of just a pure software person on the team
15:51 or people who are very inclined towards software all the way to, and it makes me so sad when
15:55 people are spending like weeks in a spreadsheet, just manually doing things by hand.
16:00 - Yeah, I know.
16:01 - You could do this in five minutes in Python.
16:03 - And not only could you do it faster, you could do it without any errors.
16:07 - Yeah, more reliable.
16:08 - None of those like, oh, I misread it and I shifted off by a cell or I typed in.
16:14 - I missed something, right?
16:16 'Cause it just reads what's there, yeah.
16:18 - A lot of like graduate programs are starting to wake up to this fact that it's gonna be
16:23 almost impossible to do any science without some degree of proficiency in coding.
16:28 And I think a lot of, say grad students and postdocs and so on, when they actually sit
16:33 down and try and analyze their data, whether that's just in Excel or they need to write
16:38 a little Python script, that's kind of their first introduction is, oh, I have this data,
16:42 I need to do something with it.
16:44 I'm gonna Google exactly how do I read in this data or how do I do a T-test in Python
16:49 or how do I plot something in Matplotlib?
16:51 And that's kind of the level that they start getting into out of necessity.
16:55 But the sophistication and the speed, because they're usually just teaching themselves,
17:00 that's most of academia.
17:01 It's just, you have a problem, spend a few days Googling and reading books until you
17:05 find it.
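The "how do I do a T-test in Python" search mentioned above usually ends at `scipy.stats.ttest_ind`. As a rough, dependency-free sketch of what that kind of test computes, here is Welch's two-sample t statistic using only the standard library. The function name `welch_t` and the firing-rate numbers are invented for illustration; they are not from the episode.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # statistics.variance is the sample variance (it divides by n - 1).
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical firing rates (spikes per second) for two made-up conditions.
control = [4.1, 5.0, 4.6, 5.3, 4.8]
treated = [6.2, 5.9, 6.8, 6.1, 6.5]

t_stat, dof = welch_t(control, treated)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

This sketch stops at the statistic and the degrees of freedom. Turning those into a p-value needs the t distribution, which is one reason a library such as SciPy (with `equal_var=False` for the Welch variant) is likely the better choice for real analyses.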
17:06 - And once it works, you can kind of just leave it.
17:08 You don't have to clean it up or anything, right?
17:10 Yeah, okay.
17:11 - And it results in a lot of, I mean, the progress of science doesn't go away, but the
17:16 code is not robust.
17:19 And so that's why you see things, especially in other fields of psychology and such, like
17:22 replication crises, and people have done meta-analyses of running the same software stack on 12 different
17:29 data sets and you get different results.
17:31 And so you start to kind of see the shaky foundation starting to bleed into the, like
17:37 you said, the reliability of the results.
17:40 - And you'd have consequences, not just it's more work or something.
17:43 - Exactly.
17:44 - Maybe we could start a bit by just talking about maybe the history, diving into this
17:48 a little bit more, just the history of programming in neuroscience.
17:52 I wasn't in neuroscience in any way, but I worked with a bunch of cognitive scientists
17:57 studying how people solve problems and thought about things at a lab for one of my first
18:03 jobs and we studied all through eye tracking.
18:05 Not the iPhone, but actual eyes.
18:07 It was fascinating.
18:08 It was tons of data.
18:09 And there were, like you described, a lot of people who would do sort of Excel stuff
18:12 and they would take the data and they process it.
18:15 Over time, we just started to automate these things and their first thought was, "You're
18:19 programming me out of a job." I'm like, "No, no, no.
18:22 This is the crappy part of your job.
18:24 Like you're supposed to analyze the results and think about it and plan new stuff and
18:28 now you can just focus on that." And as the software got better, we just tackled bigger problems.
18:34 So maybe give us a bit of the history on your side.
18:37 - So I love the cognitive science.
18:39 My background is more cognitive science.
18:42 I was an undergrad and grew up in science in a cognitive science department while also
18:47 doing some wet lab neuroscience stuff.
18:49 So it's fun.
18:50 - Yeah, absolutely.
18:51 Did you start out with like MATLAB and that kind of stuff?
18:54 Is that where they told you you need to be?
18:56 - Neuroscience has certainly had, at least our branch of neuroscience, just because by the
19:01 nature of recording voltages and you need to write to a computer.
19:05 So there has been kind of a long history of, for as long as there's been even like punchcard
19:11 computers, people have kind of read in the data into the computer and done their statistics
19:17 on that rather than something else.
19:20 I'm actually recently writing a kind of a review article on the history of data science
19:24 and neuroscience.
19:26 And I loved this paper.
19:27 It was from 1938.
19:31 And they took an EEG spectrum, and so EEG is just the continuous time series of brain
19:37 voltage.
19:38 So you're not in the brain recording.
19:39 And this is, I think, from humans.
19:41 And they took something called the Fourier transform, which, to get everyone up to speed,
19:45 is where you basically just take some oscillating signal and you break it down into its constituent
19:50 parts.
19:51 And most of you have seen it before if you've ever seen like an audio spectrogram, that's
19:54 kind of the most notable visualization where you can kind of see the high frequencies and
19:59 the low frequencies.
20:00 So basically it pulls the frequencies out of the signal, right?
20:04 Yeah.
20:05 But the way they did this, and this is 1938, there's no computers.
20:08 So they actually had a mechanical device where they would just take this EEG trace that was
20:13 on tape and they would feed it into this like mechanical machine.
20:17 And it would basically read kind of this black line on the tape.
20:21 And so as it would crank the tape around this machine, depending on kind of the frequency
20:27 that the line went up and down, that would read out the Fourier transform.
20:30 So it was mechanical.
20:31 Wow.
20:32 Like a lot of those cool devices back in the older days.
20:35 That's impressive.
20:36 Now you can get that same thing in MATLAB.
20:39 You just type fft, open and close parentheses, put your data in the middle, and you get the
20:43 same thing in microseconds.
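To make the "break a signal into its constituent frequencies" idea concrete, here is a small sketch of a discrete Fourier transform written directly from its definition, using only the standard library. The synthetic signal, the 128 Hz sample rate, and the helper names `dft` and `dominant_frequency` are assumptions for illustration. In practice a fast implementation such as `numpy.fft.fft` would be the usual tool; this slow version only shows what is being computed.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform, O(n^2), straight from the definition."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def dominant_frequency(signal, sample_rate_hz):
    """Return the frequency (in Hz) of the strongest non-DC component."""
    spectrum = dft(signal)
    n = len(signal)
    # Skip bin 0 (the constant offset) and stop at the Nyquist bin.
    peak_bin = max(range(1, n // 2 + 1), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate_hz / n


# One second of a made-up signal: a strong 8 Hz rhythm plus a weaker 20 Hz one.
sample_rate_hz = 128
samples = [
    math.sin(2 * math.pi * 8 * t / sample_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 20 * t / sample_rate_hz)
    for t in range(sample_rate_hz)
]

peak_hz = dominant_frequency(samples, sample_rate_hz)
print(f"Strongest component: {peak_hz} Hz")
```

Each entry of the spectrum measures how strongly one frequency is present in the trace, which is the same readout the 1938 mechanical device produced from the line on the tape.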
20:45 But neuroscience, at least my field has kind of always had this kind of serendipitous relationship
20:51 with computing generally, coding generally.
20:53 And a lot of the code I think earlier on was kind of Fortran-ish and then it moved towards
20:59 MATLAB.
21:00 And MATLAB's kind of had its stake in the ground for a long time, just because that
21:03 was the first kind of software that you could really do array manipulations on well.
21:08 And it was kind of a higher level than some of the lower level programming.
21:12 So a lot of the older labs have their entire code base and stack and analysis software
21:18 in MATLAB.
21:19 And so it's only been within maybe five to six, seven years, maybe a bit longer, 10 years
21:25 that you've really seen Python start to supplant MATLAB as kind of the de facto programming
21:30 language in labs, just because of the cost of trying to transfer everything over.
21:35 And despite the fact that MATLAB isn't open source and it's extremely expensive, most
21:40 universities have licenses.
21:41 And so that kind of facilitates.
21:43 It's prepaid in a sense.
21:44 Yeah.
21:45 But it is still pretty expensive.
21:46 Especially if you get those little toolboxes like Wavelet Decomposition Toolbox, 2000
21:52 bucks instead of a pip install, you know.
21:54 And again, we do a lot of signal processing.
21:56 And so that's exactly the place you want to be.
21:59 And MATLAB has pretty good control over external
22:03 hardware.
22:04 You can run like your behavior, your task kind of in MATLAB mode.
22:07 So you can kind of do everything in one language as you would like to do in Python.
22:12 But it's starting to kind of go away.
22:14 And I think a lot of that is just because of the allure of Python, which has so many tools,
22:19 and because it's probably a lot easier to learn for most people than MATLAB.
22:24 We're kind of starting to see that switch now that Python has more to offer,
22:28 I'd say, to a lot of scientists than MATLAB.
22:30 Yeah.
22:31 You said 10, 12 years ago.
22:32 At least in our field.
22:33 Yeah.
22:34 The difference in the external packages on PyPI you can get, and especially the ones
22:39 for data science have just exploded.
22:43 The choices and the stuff that's out there is, it's pretty diverse.
22:46 It's pretty crazy.
22:47 The only other one that I think is still in pretty strong competition with Python from
22:51 the perspective of, we collaborate a lot with like mathematicians and statisticians and
22:55 R is their usual favorite, just because statistics, like all the best statistical packages are
23:02 still pretty much in R. And so that's where a lot of people live.
23:05 ggplot is pretty good.
23:06 It makes pretty good plots.
23:07 Yeah, that's interesting.
23:08 It's really focused and it's really good at what it does.
23:13 And one of the things that I think is worth just considering, if somebody comes, let's
23:18 say a brand new first year grad student comes into the lab and you're like, all right, what's
23:23 your programming experience?
23:24 Like, well, programmed the clock on the VCR.
23:27 Like, okay, we're going to have to start you somewhere or something.
23:30 And you could teach them something really specific like MATLAB or something along those
23:35 lines.
23:36 But if they learn something like Python, not Julia, maybe
23:41 not even really R, though R is somewhat closer, they learn not just a skill for the lab;
23:48 almost any software job is potentially within reach with a little bit
23:53 of learning about that area.
23:55 Right.
23:56 Like if you know Python, you say, I want a job.
23:58 There's a massive set of options out there.
24:00 If you say, I know MATLAB or even Julia, it's like, okay, well, here's the few labs and
24:07 the few research areas and the real, I just think it's something that.
24:11 A lot of engineering firms.
24:12 Yeah.
24:14 A lot of academic folks should consider what happens if the student doesn't necessarily
24:19 become a professor.
24:20 You know what I mean?
24:21 Which actually is a lot of the time, right?
24:24 Or a professional researcher of some sort.
24:26 And that's a really awesome skill to have on top of your degree.
24:29 So I think that's just a big win for it.
24:31 I'm happy to see that.
24:32 And that literal situation just happened where a statistician that we were collaborating
24:36 pretty closely with graduated, brilliant guy, and got a job at Microsoft.
24:42 And so we were in a meeting after he was there and they were like, what's some advice that
24:47 you have now that you've been in industry for a while?
24:50 And he's like, stop using R, learn Python because everyone here uses Python.
24:55 And it took me a few months to kind of switch from the R worldview of, you know, the arrow (<-)
25:01 to the equals sign, to actually work and collaborate with everyone, because everyone's just using
25:07 Python.
25:08 Yeah.
25:09 From his perspective.
25:10 And I'm sure it's not unique.
25:11 No, I'm sure that it's not either.
25:12 I think, yeah, I just think it's not like a religious war.
25:15 It's not like, oh, I think Python is absolutely better.
25:17 You should just not use other stuff.
25:18 I just think it's preparing people for stuff beyond school.
25:22 It's a pretty interesting angle to take.
25:24 And it's not like you can't learn other things.
25:25 I think it's really good to learn other things, especially ones that are complementary, where
25:30 R can be complementary, especially now that they have a lot of the subsystem packages.
25:34 When I get R code, I usually just write like a subprocess.
25:39 I did this recently.
25:40 I just wrote like a subprocess line to call the R script because I was too lazy to rewrite
25:45 it.
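A rough sketch of the "subprocess line to call the R script" pattern described above, assuming R's `Rscript` command-line runner is installed and on the PATH. The script name `analysis.R`, the argument `spikes.csv`, and both helper functions are hypothetical; the guest's actual code is not shown in the episode.

```python
import shutil
import subprocess


def build_r_command(script_path, *args, rscript="Rscript"):
    """Build the argument list for running an R script non-interactively."""
    # --vanilla skips saved workspaces and user startup files for a clean run.
    return [rscript, "--vanilla", str(script_path), *map(str, args)]


def run_r_script(script_path, *args, rscript="Rscript"):
    """Run an R script and return whatever it printed to standard output."""
    if shutil.which(rscript) is None:
        raise FileNotFoundError(f"{rscript!r} was not found on PATH")
    result = subprocess.run(
        build_r_command(script_path, *args, rscript=rscript),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError if R exits with an error
    )
    return result.stdout


# Hypothetical file names, shown only to illustrate the command that would run.
command = build_r_command("analysis.R", "spikes.csv")
print(" ".join(command))
```

Calling `run_r_script("analysis.R", "spikes.csv")` would then hand the R script's printed output back to Python as a string. Passing the command as a list rather than a single shell string avoids quoting problems with file names that contain spaces.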
25:46 Yeah, sure.
25:47 But there's other like, I don't know, like Rust is probably a good one to try
25:49 and challenge your brain.
25:50 Or like lower level languages like C++.
25:53 If you need it for what you're doing.
25:54 Yeah.
25:55 It sounds like you guys do for talking to hardware and stuff like that.
25:58 Yeah.
25:59 Occasionally.
26:00 Boy, there's not a lot of things Python can't do.
26:01 Break out some MicroPython when you got to get your microcontrollers and stuff.
26:06 This portion of Talk Python to Me is brought to you by Posit, the makers of Shiny.
26:10 Formerly RStudio.
26:11 And especially Shiny for Python.
26:15 Let me ask you a question.
26:16 Are you building awesome things?
26:18 Of course you are.
26:19 You're a developer or data scientist.
26:20 That's what we do.
26:21 And you should check out Posit Connect.
26:24 Posit Connect is a way for you to publish, share and deploy all the data products that
26:28 you're building using Python.
26:31 People ask me the same question all the time.
26:33 Michael, I have some cool data science project or notebook that I built.
26:37 How do I share it with my users, stakeholders, teammates?
26:39 Do I need to learn FastAPI or Flask or maybe Vue or ReactJS?
26:45 Hold on now.
26:46 Those are cool technologies and I'm sure you'd benefit from them, but maybe stay focused
26:49 on the data project.
26:51 Let Posit Connect handle that side of things.
26:53 With Posit Connect, you can rapidly and securely deploy the things you build in Python.
26:58 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.
27:04 Posit Connect supports all of them.
27:07 And Posit Connect comes with all the bells and whistles to satisfy IT and other enterprise
27:11 requirements.
27:12 Make deployment the easiest step in your workflow with Posit Connect.
27:17 For a limited time, you can try Posit Connect for free for three months by going to talkpython.fm/posit.
27:23 That's talkpython.fm/POSIT.
27:26 The link is in your podcast player show notes.
27:29 Thank you to the team at Posit for supporting Talk Python.
27:33 So another thing is, I don't know how it's received.
27:36 I know it took a while to kind of really catch on.
27:39 And I think the thing that just broke the final barriers for open source being adopted
27:45 in at least in business was the AI stuff and the data science stuff.
27:49 People are like, "Oh, we can't use this open source stuff.
27:51 We got to have a SLA and some company we can sue if our code doesn't work right."
27:55 Or, you know, whatever, right?
27:56 Something crazy like that.
27:58 And they're like, "But you understand all the AI and all the data science.
28:02 We have to use this open source stuff." Like, "All right, fine." What's the open source story for you guys?
28:07 Academia has probably championed open source for a really long time just because, I mean,
28:13 open source back, I mean, even when I first started, was just if you read a paper and
28:17 someone has some new fancy analysis, before it became a bigger push by like funding agencies
28:23 to like actually post it to GitHub or some repository.
28:26 I mean, you could just email people and be like, "Hey, I saw your paper.
28:29 I want that script." And they would just send you a MATLAB file.
28:32 And it would be just whatever they had written, but it was in MATLAB and you'd have to kind
28:37 of tear it apart yourself.
28:38 And there was little to no documentation.
28:40 You'd be lucky if there's comments and it's spaghetti code.
28:43 But you figured that out and you kind of work backwards and deconstruct it.
28:47 And eventually you kind of have their code.
28:49 So that kind of ethos of just scientists are really good by and large of just sharing information
28:56 and helping people out.
28:57 And if you have a question, just ask.
28:59 It's kind of always been there.
29:01 At least in our field, it's not as competitive as some other ones where you're just kind
29:05 of like racing to get the next project out.
29:08 It happens, but rarely.
29:10 But now a lot of funding agencies and just in general, people are just excited about
29:15 when you publish a paper, you put a GitHub link in the bottom of the paper and then that
29:19 links to the repository.
29:20 And yeah, maybe it's not been updated in a while, but the code's there and you can just
29:24 take it, grab it.
29:25 For the reproducibility.
29:27 How about using other things?
29:28 Is there SciPy?
29:29 I know for astronomy there's Astropy.
29:32 Is there a NeuroPy?
29:33 It's really still more analysis dependent and pre-process dependent.
29:38 So there's kind of this, it's still the early days where there's probably too many formats
29:43 just because no one can agree on what's the best one.
29:46 So like even a lot of the data formats are written kind of in Python to take whatever
29:50 data you have and reformat it to something shareable.
29:53 There's five or six of them floating around.
29:55 There's probably two that are still duking it out to see which one will be the best.
29:59 And probably five years from now, there's going to be a better one.
30:02 So data formats, certainly there's kind of this, there's a few that are neck and neck.
30:07 Analysis pipelines, a lot of those are still done in house, but there's starting to be
30:10 a lot more toolkits and frameworks and packages.
30:13 There's some really good ones that have more documentation written.
30:17 They're on the PyPI repositories.
30:18 So you can just pip install them and you have them.
30:22 The computational neuroscience people are great at this.
30:24 So all the neural simulation software, that is all really well documented, really well
30:29 written.
30:30 A lot of good example code and tutorials and so on.
30:34 So yeah, we're starting to see kind of this more robust kind of ecosystem where you can
30:40 just kind of pull things.
30:41 It still just kind of varies.
30:43 There's probably still not one go-to place other than the standard data science toolkits.
30:48 Right, right.
30:49 The pandas and so on.
30:51 Yeah.
30:52 So NumPy, Matplotlib, pandas, scikit-learn, if you're doing deep learning, PyTorch or
30:55 TensorFlow, all of those still apply to any data science stack.
31:00 Yeah, of course.
31:01 What's your day-to-day stack look like if you're sitting down to do some analysis?
31:05 I have a VS Code and a like autocomplete where I just write import in and it just NumPy,
31:11 Matplotlib, pandas.
31:12 Then I usually delete pandas because unless I have a CSV file, I'm not using it.
31:17 So NumPy, Matplotlib, I can probably do 75% of the things I want to do.
31:22 Scikit-learn and SciPy, obviously, if I'm doing any stats with those things, those libraries
31:27 I might go to.
31:28 And then over the last few years, I kind of just have my own, just because you catch yourself
31:33 writing the same functions over and over and over again.
31:36 So I just started building my kind of internal framework of just things I know I need.
31:40 So if I'm working with LFP data, I have all my filters there.
31:44 If I have spike data, I have all my spikes there.
31:47 We do a lot of decoding, so developing deep learning algorithms to decode neural data.
31:52 All of those are kind of listed there.
31:54 And then I started realizing, dude, internal tools make the difference between solving
32:00 a problem in 10 minutes or solving it in an hour where I can just sit down and have everything
32:04 automated to come up.
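As an illustration of the kind of helper that might live in such an internal toolkit, here is a hedged sketch of a zero-phase band-pass filter for LFP-style signals built on SciPy. The function name `bandpass`, the 4 to 12 Hz band, and the sampling rate are assumptions chosen for the example, not the guest's actual code.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, low_hz, high_hz, fs, order=4):
    """Zero-phase Butterworth band-pass filter.

    Second-order sections (sos) stay numerically stable for narrow,
    low-frequency bands; sosfiltfilt runs the filter forward and backward
    so the output is not shifted in time.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


# Synthetic stand-in for a recording: an 8 Hz rhythm plus 60 Hz line noise.
fs = 1000.0                        # assumed sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)
rhythm = np.sin(2 * np.pi * 8 * t)
recording = rhythm + np.sin(2 * np.pi * 60 * t)

filtered = bandpass(recording, low_hz=4, high_hz=12, fs=fs)
```

Writing this once and importing it everywhere is the point: the filter settings are then the same in every analysis.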
32:06 So yeah, the standard data science stack I use pretty frequently.
32:10 Hardware stack, I mean, so VS Code, I just recently switched to just because everyone
32:15 was talking about it from like Sublime or I usually just edit it in a terminal.
32:20 And I was like, yeah, I'll try it out.
32:21 Everyone's talking about it.
32:22 And it's one of the good things Microsoft has done.
32:24 It's pretty sweet.
32:25 Yeah, that's pretty sweet.
32:26 That's pretty nice.
32:27 And when you do the VS Code stuff, are you just writing straight Python scripts or are
32:31 you doing like VS Code on top of notebooks?
32:34 Yeah.
32:35 You know, it has like that kind of like text view of a notebook type of thing, I think.
32:38 I used to use exclusively Python scripts, so just the .py, started seeing how great
32:43 Jupyter was and so then you start doing everything in Jupyter and then you start to have all
32:47 these like convoluted notebooks and like notebook V1 through 7.
32:51 So then you realize that you've got to like find a balance between like, you know, notebooks
32:55 are great for presentation and for like quickly testing.
32:58 But the sooner you can get it into like a class structure or a package or something.
33:02 The sort of productized version of it, you wanted to get it down into Python code a lot
33:07 of times probably.
33:08 Exactly.
33:09 Like your internal tools you talked about, you're like, all right, this is the library
33:11 you just call stuff, right?
33:13 That just belongs more as a package and not a...
33:15 The sooner you can kind of condense, pull the code out of the notebook and just leave
33:19 notebooks for presentation.
33:21 It's probably the best.
33:22 Yeah.
33:23 It's a lot of pipelines.
33:24 So it's a lot of pre-processing pipelines.
33:26 So you don't want, you know, 50 cells of just moving data.
33:29 Preparing, yeah.
33:30 You talked about having quite a bit of data, some of that being image based.
33:35 Sounds like a lot of work.
33:36 Do you have like a big compute cluster?
33:38 Do you just have like an ultra, an M3 ultra or whatever?
33:42 That's not even out yet, it's M2s.
33:44 But do you just have a big machine or do you guys do cloud stuff?
33:47 What's compute look like?
33:48 It depends on what I'm doing, what I need.
33:50 Most things, let's be honest, I could probably just use my desktop computer, which is really a Supermicro
33:55 server that just runs Linux.
33:58 But then for doing deep learning, you need GPUs.
34:01 And so we have a nice set of GPUs we can pull to.
34:04 Do you do your own training on LLMs and other deep learning things?
34:08 The only LLM training I did was just for fun, but yeah, it's all the deep learning stuff
34:13 we have to train and else on, on our particular.
34:15 So that's all GPUs for that.
34:18 A lot of the statistics we do are like permutations.
34:21 So you need to kind of like parallelize them out into CPUs.
34:24 And then I'll pull like a CPU cluster we have if I need it.
34:27 Does UC Irvine have like a big compute resource sort of thing you can grab?
34:31 They have a campus wide one that you can get on and there's a few independent ones that
34:36 I have access to.
34:37 So GPU is kind of in a different place than CPUs are.
34:41 So I can just kind of pick and choose.
34:42 And I have a few servers in the lab that I've just kind of put together that all my camera
34:47 stuff, all my behavior, I kind of wrote it.
34:49 So it's cloud based.
34:51 I can kind of just pull up my phone and look at the videos of what animals are doing and
34:55 stuff.
34:56 All that runs just a server in the lab.
34:58 So yeah, the compute is there when you need it.
35:01 And I think as I've matured, I've kind of learned when to use what compute when and
35:05 when it's worth taking the extra time to use it when you don't need to.
35:10 And also when a lot of the times you don't even need it.
35:12 So I know a lot of people when I see their code and they complain that it takes like
35:16 an hour to run.
35:17 I mean, just using multiprocessing in Python, that in and of itself is enough to not need
35:24 to use a cluster.
35:25 They're just using a single thread for their analysis.
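To make the multiprocessing point concrete, here is a small sketch of a permutation test whose shuffles can be spread over CPU cores with the standard library. The function names and the two-group difference-of-means statistic are assumptions for illustration; the lab's real statistics are not described in the episode.

```python
import random
from multiprocessing import Pool
from statistics import mean


def _shuffled_diff(job):
    """One permutation: shuffle the pooled values and recompute the group difference."""
    pooled, n_a, seed = job
    rng = random.Random(seed)      # one seed per job keeps results reproducible
    shuffled = pooled[:]
    rng.shuffle(shuffled)
    return mean(shuffled[:n_a]) - mean(shuffled[n_a:])


def permutation_p_value(group_a, group_b, n_perm=1000, workers=1):
    """Two-sided permutation p-value for a difference in means."""
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    jobs = [(pooled, len(group_a), seed) for seed in range(n_perm)]
    if workers > 1:
        # Each permutation is independent, so the jobs split cleanly across processes.
        with Pool(processes=workers) as pool:
            null = pool.map(_shuffled_diff, jobs, chunksize=64)
    else:
        null = [_shuffled_diff(job) for job in jobs]
    extreme = sum(abs(d) >= abs(observed) for d in null)
    return (extreme + 1) / (n_perm + 1)
```

In a script, call it with something like `permutation_p_value(a, b, n_perm=10000, workers=4)` inside an `if __name__ == "__main__":` block, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter.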
35:28 For sure.
35:29 Or just a bad programming patterns, design patterns, like you're looping over the thing
35:33 in the pandas data frame instead of doing vector operations in the pandas data frame,
35:38 like that kind of stuff.
35:39 Right?
35:40 A hundred percent.
35:41 Yeah.
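A hedged illustration of the looping-versus-vector-operations pattern just described. The column names `distance_cm` and `duration_s` are made up for the example.

```python
import pandas as pd


def speed_loop(df):
    """Row-by-row version: easy to write, slow on large frames."""
    speeds = []
    for _, row in df.iterrows():
        speeds.append(row["distance_cm"] / row["duration_s"])
    return pd.Series(speeds, index=df.index)


def speed_vectorized(df):
    """Column-wise version: one operation over whole columns."""
    return df["distance_cm"] / df["duration_s"]


trials = pd.DataFrame(
    {"distance_cm": [10.0, 20.0, 30.0], "duration_s": [2.0, 4.0, 5.0]}
)
print(speed_vectorized(trials).tolist())
```

Both functions return the same values; the vectorized one avoids building a Python object for every row, which is usually where the time goes on large data frames.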
35:42 I mean, that was one of the first things that I usually teach.
35:44 I have this little example script that shows that like, why is it better to preallocate
35:48 array rather than to just append to the bottom?
35:50 And it's like these things that we kind of take for granted now, but it's not intuitive
35:54 unless you actually see it.
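The guest's teaching script is not shown in the episode. The sketch below is one way such a preallocation demo might look with NumPy, with hypothetical function names.

```python
import timeit

import numpy as np


def fill_by_appending(n):
    """Grow the array one element at a time.

    np.append returns a brand-new array on every call, so the whole
    array is copied n times.
    """
    out = np.array([], dtype=float)
    for i in range(n):
        out = np.append(out, i ** 2)
    return out


def fill_preallocated(n):
    """Allocate once, then write into existing slots."""
    out = np.empty(n, dtype=float)
    for i in range(n):
        out[i] = i ** 2
    return out


if __name__ == "__main__":
    n = 20_000
    print("append     :", timeit.timeit(lambda: fill_by_appending(n), number=1))
    print("preallocate:", timeit.timeit(lambda: fill_preallocated(n), number=1))
```

Exact timings depend on the machine, so none are quoted here. Running it shows the gap growing with `n`, since the append version copies the whole array on every iteration.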
35:55 No, you learn it the hard way, but it sticks in your mind once you learn it.
36:00 I think that's the issue.
36:01 It's like people just take it for granted.
36:03 Yeah, for sure.
36:04 Like, why didn't you know that?
36:05 How do you, how do you onboard new people?
36:07 If you, you know, get new grad students or people contribute, other contributors.
36:11 It just kind of depends on the lab.
36:14 Every lab kind of has their own structure of just kind of this hierarchy of expertise
36:19 where like I started as an undergrad and I just volunteered in a lab at a different university
36:25 and just volunteered my time, eventually could get paid.
36:28 Just wanted to spend time in the lab and you could all the way up to grad students who
36:32 were there to get a PhD and have more kind of autonomy over their projects.
36:35 A postdoc who has a PhD and five to eight years of experience and so can pretty work
36:42 well.
36:43 Then there's like staff scientists or even a lot of labs now are hiring just pure engineers
36:47 or pure software people because there's such a need for that.
36:50 And so yeah, it really just depends on the lab specific situation and what their focus
36:55 is on and what they need.
36:56 Cool.
36:57 I guess if you have a good NSF grant and you got some extra money, it might be money well
37:01 spent to hire some student who has good programming skills, right?
37:04 Absolutely.
37:05 Yeah.
37:06 You talked about the video stuff that you're doing in the streaming video.
37:09 Do you actually do analysis on that or is it just for you to go back and look at?
37:12 Oh yeah.
37:13 Yeah.
37:14 We do analysis on that.
37:15 There's actually a pretty cool deep learning package out now.
37:17 We didn't write it.
37:18 It's something that another lab did where you just give it the video frame and it can
37:21 automatically segment kind of it's an animal.
37:24 So like where their paws are or where their like nose is looking or some cases people
37:29 have like you were talking about eye tracking.
37:31 They do eye tracking in like mice now.
37:32 They do eye tracking on mice?
37:34 That was hard on humans in the nineties.
37:36 Yeah.
37:37 There's a lot of VR.
37:38 So they put kind of mice in this like VR system and they can like see where their little mouse
37:43 pupil is looking on like the VR screen.
37:45 Yeah.
37:46 So you can do some different scenarios and they can detect that and they react to it?
37:49 Oh, absolutely.
37:50 Incredible.
37:51 So yeah, I know a lot of stuff, at least in our field, the Nobel prize was awarded for
37:55 you stick some electrodes in the hippocampus part of your brain.
37:58 That's important for learning and memory.
38:00 And then you have the animal kind of run around some environment and then you take some video
38:05 data of kind of where they were running the environment.
38:07 And if you were only looking at the brain data, you can predict to like 90 some percent
38:13 accuracy the location of the animal.
38:15 So you can show this kind of this correspondence that inside the brain is a map of kind of
38:19 the environment.
38:20 Our stuff, we're taking that a little one step further that says this map is not just
38:25 for space, it's for non-spatial and other things too.
38:28 There's this kind of network of information in the brain that the animal can kind of like
38:32 navigate through, even if they're just standing still, but thinking about some sort of problem.
38:37 But we use video data to validate what the animal's doing or check what kind of tasks
38:41 they're doing.
38:42 So yeah, a lot of multimodal heterogeneous data that each needs its own funky pre-processing
38:48 and depending on the task at hand, you're writing something new to ask that question.
38:53 So is that OpenCV?
38:54 Yeah, my stuff is OpenCV.
38:57 Streamed over sockets and some Django webpage.
39:00 It's fun.
39:01 It's cool to build.
39:03 That sounds really cool.
39:04 Yeah, absolutely.
39:05 So what kind of questions are you answering with the video?
39:06 Yeah.
39:07 Or is it just to correlate back with the time series of what you're measuring in the brain?
39:11 So like I said, the Nobel Prize was for the spatial map.
39:15 We're doing this non-spatial stuff.
39:17 And so we kind of do both in the lab where we have an animal kind of run around and then
39:21 we have an animal just kind of sit still and do some sort of mental task.
39:25 In our case, they have to memorize a sequence of odors.
39:28 And if the sequence gets shuffled, they make a different choice.
39:31 And so we're basically showing how does the brain work for the spatial part versus the
39:36 non-spatial part?
39:37 What's similar about these two things?
39:39 What's different about these things?
39:40 And we show that one of our recent papers was that the brain uses a lot of the similar
39:45 mechanisms to navigate space as it does to navigate this kind of non-spatial odor task.
39:50 But we also showed that there's this mechanism that the brain uses to take kind of discrete
39:55 memories and link them together into some kind of narrative.
39:58 Best case being like this talk, we've been talking back for 30 some minutes now.
40:03 Inside there's kind of chunks of the conversation.
40:05 So if tomorrow someone was to ask you, what did you and that neuroscience guy talk about
40:10 on your podcast?
40:11 You would kind of rattle off this story of, oh, we talked about history of Python and
40:16 this and this and this, right?
40:18 This, this and this are each kind of discrete memories that in your brain you kind of lock
40:22 together and store them so that you could use them, make decisions about them and so
40:28 on.
40:29 Think about it as a whole, not just the little ideas, every little idea, but like just the
40:32 big concept of it, right?
40:34 Exactly.
40:35 Yeah.
40:36 And it's a fundamental thing that the brain does.
40:37 Like people say humans are storytellers and your life is kind of the sequence of events
40:42 of stories.
40:43 And so you use that every single day and across a bunch of diseases, that's one of the first
40:47 things to actually be impaired, whether it's addiction or schizophrenia or Alzheimer's,
40:53 that kind of ability to link things in time and link them well and make decisions about
40:57 them starts to get impaired.
40:59 That's not great when that happens, but that is what happens, right?
41:01 Absolutely.
41:02 And I guess with the software engineering practices, for lack of a better word, that
41:07 you would recommend that maybe other grad students, professors who are feeling like
41:12 they're not, they don't have their software game fully together, pay attention to it.
41:15 And maybe what should they ignore, right?
41:17 Like should they pay attention to like source control and GitHub?
41:20 Should they have unit tests or, you know, what should they pay attention to or not?
41:23 First off, no one writes tests.
41:24 That is just only the very few, very well put together.
41:28 And usually people who just came from industry write tests.
41:31 Sure.
41:32 That's not an issue.
41:33 But yeah, first off, just learn Python.
41:35 I've said that hundreds of times and I'm preaching to the choir in this audience.
41:39 You are, for sure.
41:41 Learn Python.
41:42 It's honestly, there's not much better you could learn.
41:45 And two, it's, you know, it's quintessential automation stuff.
41:49 So it's really just think about the things that you're doing, the spreadsheets or, you
41:53 know, the simple things and really just ask yourself, if you find yourself doing any repetitive
41:57 tasks, that's a software problem.
41:59 Those are the things to kind of look at first.
42:01 So you have your text editor in one window, Google in the other and Stack Overflow your
42:06 way to learning.
42:07 And so the way people, I think, really do kind of teach themselves Python is probably
42:10 the best way to learn.
42:12 But nevertheless, I think there is a real need for, and again, we're starting to see
42:17 more of it, just formal education, even if it's just a course.
42:20 Our program is really great that they started to teach a Python course just because the
42:25 students requested it because they knew how important it was.
42:28 We have Python for neuroscience.
42:30 Exactly.
42:31 Okay.
42:32 Yeah.
42:33 I mean, you just work your way through if statements for loops, data types.
42:37 You probably also, I'm just guessing you get a chance to work with some of these external
42:40 libraries that are relevant to studies they're doing, right?
42:44 Rather than here's an example of stock market data.
42:46 You're like, great, not a trader.
42:49 I don't want to be a programmer.
42:50 Why am I here?
42:51 You know?
42:52 Yeah, it's really relevant.
42:53 And I think it's just kind of seeing like, oh yeah, I would do that this way, but this
42:56 is so much easier if I use Python.
42:58 And I can use it in Python and just seeing that, oh, it's not as bad as the mountain
43:03 looks a lot higher when you're at the base than the summit.
43:05 So it does.
43:06 Seeing it done once is usually enough to kind of tell people it's not as bad as you originally
43:11 think.
43:12 Yeah.
43:13 I feel like maybe I saw it this way.
43:14 I certainly know a lot of other people see it this way.
43:16 I mean, when I was younger, but a lot of people see it this way as well, is that like, you
43:19 got to be crazy smart to do programming.
43:22 It's really challenging.
43:24 It's kind of one of those things that only few people can do.
43:27 And then you get into it and you're like, oh, it's, it's not a few really huge steps
43:32 and things you've got to solve.
43:33 It's like a thousand small steps and each one of the little small steps, you're like,
43:37 that was actually easy.
43:38 That's no big deal.
43:39 What's the next step?
43:40 And you get to the end, you're like, where was the big step?
43:42 Where was it really hard?
43:43 Right?
43:44 Yeah, absolutely.
43:45 Yeah.
43:46 Do you have some experience where people in the department or people that worked with
43:49 you are like, ah, I'm not a programmer.
43:51 I don't want to do this stuff.
43:52 Then they kind of got into it and really found out that programming was something they really
43:56 liked.
43:57 Are there any converts out there?
43:58 Yeah, I would say so.
43:59 I mean, I think there's kind of two kinds of people.
44:01 There's people who program just because what is it?
44:04 The programming is an art book or whatever.
44:06 Like, like they, they love it just for the sake of like loving.
44:09 And I'm probably closer to those kinds of people, right?
44:11 Like, I just think it's the coolest thing, like academic, but then there's the people
44:15 who just kind of see it as like, it's a tool like anything else.
44:18 And so like you could be an expert in a drill or you could just know to pick up a drill.
44:23 That's kind of the majority of people is that it's just another tool in their toolkit to,
44:28 especially for a scientist, just to answer the question that you're trying to answer.
44:32 And I would even flip the reverse where there's been some times where I've maybe even used
44:37 Python too much in the sense that like I made a problem more because it's like the automation
44:43 dilemma, right?
44:44 It's like, do I spend an hour automating this when I could do it in 10 minutes?
44:47 You got to do it a lot of times and all of a sudden the hour is worth it.
44:49 But if it turns out you don't.
44:50 It's like, I might need this a year from now, so I might as well just write the script.
44:54 Whereas I could just do it in Excel.
44:56 That's pretty standard, standard problems we all run into in programming.
44:59 It's like, I'll write some code to do that or I could just be done with it for sure.
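The automation trade-off above comes down to a small piece of arithmetic. A sketch, using the hour and ten minutes from the conversation as inputs (the function name is hypothetical):

```python
import math


def break_even_runs(build_minutes, manual_minutes, scripted_minutes=0.0):
    """Number of repetitions after which writing the script has paid for itself.

    Returns None when the script saves no time per run.
    """
    saved_per_run = manual_minutes - scripted_minutes
    if saved_per_run <= 0:
        return None
    return math.ceil(build_minutes / saved_per_run)


# An hour of automating versus ten minutes by hand:
print(break_even_runs(build_minutes=60, manual_minutes=10))  # prints 6
```

So under those numbers the script only wins if the task comes up at least six times.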
45:04 All right.
45:05 I think we've got time for a couple more topics.
45:07 One thing that I think might be fun to talk about is publishing papers, right?
45:13 Obviously if you're in academics, especially in labs, you got to publish papers.
45:16 Do you use notebooks and stuff like that for your papers or is that just kind of separate?
45:22 Yeah, like I said, notebooks are great for presentation.
45:25 So yeah, I use notebooks kind of for development just because you can quickly run code and
45:31 go back up and move things around.
45:32 And so I kind of like that ability to kind of just a stream of consciousness, write code
45:37 until you kind of like see how it's kind of working, the prototype, and then refactor
45:40 out into an actual .py document or a package or something.
45:45 So that's kind of been my workflow and it works pretty well.
45:47 But then when you actually have the code and it works and it's robust, you actually want
45:51 to put, there's a lot of figures, that's usually the main thing.
45:53 So you kind of put all of that.
45:55 Here's the data, here's the pre-processing, here's the figure one, here's figure two,
45:59 figure three in the notebook just so it's reproducible and other people can download
46:04 it and rerun your code and that sort of thing.
46:06 So I think that's slowly becoming kind of the standard approach for those labs that
46:11 use Python and they share their code openly.
46:14 That's kind of how they do it.
46:16 Anything like executable books or any of those things that kind of produce printable output
46:21 out of the notebooks, like publishable output out of the notebooks?
46:24 Not a lot, but there's one of the journals, it's called eLife.
46:29 It's kind of, it's trying to like push the boundaries of what it means to publish a scientific
46:33 paper.
46:34 And so they kind of have, because most papers are really just on the web nowadays, the journals
46:39 aren't really physical journals as much anymore.
46:42 They kind of have like papers as executable code where you can like plot the figure in
46:47 the browser and kind of run through the notebook and just as an experiment.
46:51 But it's pretty cool to kind of see these like new alternative ways to still convey
46:56 the same findings, but you can play with it, you can kind of see how some of the methods
47:01 are kind of implicit in the...
47:02 What's the name of the journal?
47:03 That's eLife.
47:04 eLife.
47:05 Okay, cool.
47:06 Yeah, you probably, I suppose, need some way to capture the data output because you might
47:11 not have access to the compute to recompute it.
47:15 Somehow it's got to sort of be a static version, but that sounds really cool.
47:18 Yeah.
47:19 And especially for like most recently, some of our like trained models, it's becoming
47:23 more important to just share the weights and share those sorts of things too.
47:28 You can't just share the code to train the thing if people don't have the compute to
47:31 actually train them themselves.
47:34 It's kind of growing to not just sharing your data, not just sharing your code, but you
47:37 need to share like the key derivatives of the pre-processing and those sorts of things.
47:42 Or even just sharing the version numbers, because there's been psychology or fMRI literature,
47:47 there's like a bug in some version that made a lot of the results null.
47:51 And so, one person could use version 3.7 of a package, but that had a bug, but people
47:56 don't know that.
47:57 So they claim it's not reproducible, but it's really just not the same algorithms.
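One lightweight way to share version numbers alongside results is to write them to a file next to the figures. This sketch uses only the standard library; the function name and the package list are assumptions for illustration.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect the Python version and installed versions of the named packages."""
    record = {"python": platform.python_version()}
    for name in packages:
        try:
            record[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            record[name] = "not installed"
    return record


if __name__ == "__main__":
    # Hypothetical package list; use whatever the analysis actually imports.
    print(json.dumps(environment_record(["numpy", "scipy", "pandas"]), indent=2))
```

Saving that JSON with the paper's code lets a reader see at a glance whether they are running the same versions.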
48:00 Yeah, yeah, yeah.
48:02 Or like across languages, like if you rerun the same analysis in MATLAB versus Python
48:06 versus R, especially complex ones, there's a lot of little design decisions under the
48:11 hood that might tweak exactly how that regression fits or exactly how that, if you're statistically
48:17 sampling how the sampling works under the hood or those sorts of things.
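A small, well-known instance of such an under-the-hood design decision is the default divisor for standard deviation. NumPy's `np.std` divides by n unless told otherwise, while MATLAB's `std`, R's `sd`, and pandas' `Series.std` default to n - 1. The data values below are made up.

```python
import numpy as np

firing_rates = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_sd = np.std(firing_rates)        # NumPy default, ddof=0: divide by n
sample_sd = np.std(firing_rates, ddof=1)    # divide by n - 1, as R and MATLAB do by default

print(population_sd, sample_sd)
```

The two numbers differ even though both are "the standard deviation", which is enough to shift a threshold or a z-score when an analysis is ported between languages.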
48:20 Awesome.
48:21 Are you familiar with the Journal of Open Source Software?
48:23 Yeah.
48:24 I had the folks on from there.
48:25 Yeah, I had them on quite a while ago and I think they're trying to solve the interesting
48:29 problem of if you take the time to create a really nice package for your area, you might
48:35 have not taken your time writing the paper.
48:37 And so you wouldn't get credit because you don't have as many papers to publish.
48:40 So they let you publish like your open source work there, which I think is pretty cool.
48:44 What do you think about that?
48:45 We kind of have that same problem.
48:47 One of the, I run a nonprofit called ContinualAI.
48:51 It does artificial intelligence and outreach and research, and we have conferences and
48:56 all sorts of events.
48:57 But one of the main things we've done is we built a deep learning library on top of PyTorch
49:02 called Avalanche.
49:03 And so we had a really great community of mostly volunteers who just saw the need in
49:08 the field and put it together.
49:10 But then again, it's like a lot of us are academics.
49:12 How do you present this?
49:13 And so you write wrapper papers around kind of the framework.
49:18 So that's kind of been the de facto way of like, it's not really a paper, but you still
49:23 need to share it and get credit for it and put your name on it.
49:28 It's certainly an issue.
49:29 I'm starting to see it not even just with software, but even with hardware, because
49:32 hardware is becoming more open source in our field.
49:34 And so you just kind of write like a paper about the hardware solution to some problem.
49:39 That's cool.
49:40 It's better than a patent.
49:41 Yeah, it's definitely better than a patent.
49:42 Patents, while they serve a purpose, are pretty evil.
49:45 Let's wrap things up with maybe just, you mentioned ContinualAI.
49:50 Tell people a bit about that.
49:51 It's the largest nonprofit for continual learning.
49:54 Continual learning in a nutshell is, say I have a neural network and I train my neural
49:59 network to classify cats.
50:01 I classify cats to 90% accuracy and we're like, yeah, this is why neural networks are
50:05 great.
50:06 I take that same trained neural network on cats and I trained it on say dogs.
50:10 It does really well, 90% accuracy on dogs.
50:12 We're really excited why neural networks are so great.
50:14 But the issue is if I take that, again, the same network, I just trained it on dogs and
50:18 previously trained it on cats.
50:20 I try and test it on cats again, it's going to forget pretty much everything it learned
50:25 about cats.
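The forgetting effect described here can be reproduced in a toy setting. The sketch below is a deliberately simplified stand-in: a two-weight logistic regression in NumPy rather than a neural network, with two synthetic tasks in place of cats and dogs. It does not use Avalanche, and every name in it is hypothetical.

```python
import numpy as np


def make_task(rng, n, feature):
    """Toy binary task: the label is 1 when the chosen input feature is positive."""
    x = rng.normal(size=(n, 2))
    y = (x[:, feature] > 0).astype(float)
    return x, y


def train(w, x, y, lr=0.5, epochs=200):
    """Full-batch gradient descent on the logistic loss, continuing from w."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * (x.T @ (p - y)) / len(y)
    return w


def accuracy(w, x, y):
    return float(np.mean(((x @ w) > 0) == (y > 0.5)))


def forgetting_demo(seed=0):
    rng = np.random.default_rng(seed)
    xa, ya = make_task(rng, 2000, feature=0)            # task A, the "cats"
    xb, yb = make_task(rng, 2000, feature=1)            # task B, the "dogs"
    xa_test, ya_test = make_task(rng, 2000, feature=0)  # held-out task A data

    w = train(np.zeros(2), xa, ya)                      # learn task A
    acc_a_before = accuracy(w, xa_test, ya_test)

    w = train(w, xb, yb)                                # then train only on task B
    acc_a_after = accuracy(w, xa_test, ya_test)
    acc_b = accuracy(w, xb, yb)
    return acc_a_before, acc_a_after, acc_b


if __name__ == "__main__":
    before, after, acc_b = forgetting_demo()
    print(f"task A before: {before:.2f}, task A after training on B: {after:.2f}, task B: {acc_b:.2f}")
```

Because both tasks share the same two weights, training on the second task overwrites what the first one needed, and accuracy on the first task falls back toward chance. Real networks have far more weights, but the shared-parameter interference is the same idea.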
50:26 And this is an old, old problem back in like, you know, the good old days when neural networks
50:30 were connectionist models and it was computer scientists, it was the cognitive scientists.
50:35 They noticed-
50:36 Overtraining or something like that, right?
50:37 Kind of overfitting.
50:38 It's neural related.
50:39 Yeah, it's very similar.
50:40 Catastrophic forgetting, they call it the sequential learning problem, which is why
50:44 I'm really interested in it because I'm really interested in continual learning and sequential
50:48 memory or in neuroscience, it's called the stability plasticity problem.
50:51 So when do you learn?
50:52 When do you remember?
50:53 And so over the last, since we started the organization in 2018, the field has kind of
51:00 exploded just because there's such a need for overcoming this across a lot of use cases.
51:06 So like a lot of times you could only see the data once.
51:09 So the way you solve the problem generally is you just shuffle in cats and dogs into
51:13 the same data set and retrain your model.
51:15 But now the neural networks are getting bigger and bigger and bigger.
51:19 Retraining is getting costlier and costlier.
51:20 You can't just have, can't train a network on petabytes every time you want to update
51:24 it.
51:25 That's even if you have access to the data and the storage to save it and so on and so
51:29 forth.
51:30 So clever ways to solve the problem is, so we're kind of around that.
51:33 We have neuroscientists, we have computer scientists, AI researchers across academia,
51:38 industry, all that are just a bunch of people really interested in this problem to just
51:42 come together and share papers, share ideas.
51:45 We just had a conference, we sponsor a lot of competitions for people to kind of put
51:50 forward an idea to some problem that we kind of put out every year.
51:54 So it's been really, really exciting to kind of see the community grow over the years and
51:58 all the tools and fun things that's kind of come out.
52:00 Well, it's definitely a hot topic right now.
52:02 Absolutely.
52:03 The cognitive scientists and neuroscientists studied neural networks and stuff and it was
52:07 kind of like, well, maybe this model stuff.
52:09 And then now we're in the time of LLMs and the world has gone crazy.
52:13 Absolutely.
52:14 I said we were going to close out with ContinualAI, but let me just ask, what are your thoughts
52:18 on LLMs, where this stuff's going?
52:20 I mean, we all have exposure to it in different ways, but you've got this understanding of
52:26 what they're trying to model it on quite a bit.
52:28 So what do you guys in your space think of it?
52:31 So the first lab I joined was, well, the first real lab I joined was a neuroscience lab where
52:36 we were sticking wires in brains and actually doing real neuroscience.
52:40 But I also started kind of simultaneously working with a cognitive scientist where we
52:45 were working on the original word2vec.
52:48 So this is in my mind, like the grandfather of the LLM.
52:52 So this is the model that like overnight took Google Translate from being meh to pretty
52:56 good.
52:57 And it's really, at the heart of it, just an autoencoder.
53:00 But we were really excited then doing kind of semantic modeling of how much further kind
53:06 of deep learning could take language modeling.
53:09 And then we were actually using it to study catastrophic forgetting.
53:11 So does word2vec catastrophically forget?
53:14 The punchline is it does.
53:15 And that got me really excited about continual learning and so on.
53:20 So I saw kind of that trajectory then, and then kind of stepped out of it for a few years
53:25 and dug deeper into pure neuroscience and artificial intelligence from other angles.
53:32 But I'd always been just so fascinated by this idea of like, say you take an AI and
53:37 it could read every book or it could read the Internet.
53:39 Like what would you be able to get?
53:42 And that was kind of, you know, in my mind, like, well, seeing word2vec try to do
53:45 the same thing while training it on my laptop, I thought, well, you know,
53:49 it's going to take a minute till we get there.
53:51 And I underestimated that drastically.
53:55 It was like almost no progress, almost no progress.
53:57 Wow.
53:58 What just happened?
53:59 Right.
54:00 It's a testament to statistical learning more than anything, which is just how much information
54:05 can you just soak up and put together in a fancy new way and regurgitate back.
54:10 I think the next big leap is going to be adding more cognition to that progress.
54:15 So adding, when an agent has a goal, when an agent kind of has to break down a series
54:20 of steps to get to that goal, those kinds of things we don't see as much of.
54:26 And that will kind of be the next big push.
54:28 I think that'll kind of take all the cool things that LLMs can do now and kind of blow
54:33 everyone away of just, if you can take the internet and put it on a seven gigabyte file,
54:38 what can that get you?
54:39 But what if you can take the internet, put it on a seven gigabyte file, but actually
54:42 have some sort of logic and direction and the agent itself can actually navigate through
54:47 its own thoughts.
54:49 That's going to take us, I think, right to the borderline of.
54:52 Terminator?
54:53 Yeah.
54:54 Not Terminator.
54:55 It won't be Terminator.
54:56 No, it won't be Terminator, but it'll be a real intelligence.
54:59 It's going to be super interesting.
55:00 You know, you've got all this prompt engineering and these clever ways to kind of get the current
55:06 LLMs in the right mindset, which is probably a personification, but you can tell it things
55:12 like here, I want you to tell me how to do this.
55:14 And it'll come up with some answer.
55:15 You can say, I want you to think step by step.
55:18 And then all of a sudden you get a real different type of answer where it pulls out the pieces
55:22 and it thinks about how it does it.
55:23 And it's going to be interesting.
55:24 You know, it's kind of like that kind of stuff, but it already knows how to think.
55:29 You don't have to give it little weird clues.
55:31 Like, "You're an expert in this and you're really good at it.
55:33 Now I want to ask you a question about..." "Oh, I'm good at it.
55:35 Okay.
55:36 I'll answer better."
55:37 My favorite definition of AI is it's whatever computers can't do yet.
55:40 Yeah.
55:41 Because like, you know, 30 years ago, if we had this conversation, it'd be like, so what
55:44 do you think of Deep Blue?
55:45 Do you think Deep Blue, the AI that plays chess, can, you know, think, and is
55:51 it going to take all our jobs? Or it's going to be, what can Watson do?
55:54 Can it think and take all our jobs?
55:56 And you know, that was solved with tree search, which, you know, undergraduates in their second
56:01 year CS class are learning.
56:03 It's just standard search.
56:05 People don't even think of search as AI anymore.
56:06 What if we just loaded every possible outcome and all the steps into the chess thing,
56:11 and we just try to traverse each step and see where it takes us?
56:14 Right.
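The "traverse every step" idea can be sketched on a game small enough to search completely. This is a hedged illustration only: it is not how Deep Blue worked, since chess is far too large to enumerate and needs pruning and position evaluation. The game is an assumed toy version of Nim (players alternate taking 1 to 3 stones, and whoever takes the last stone wins), and `can_win` and `best_move` are hypothetical helper names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_win(stones):
    """Return True if the player about to move can force a win.

    Explores every legal move recursively. With 0 stones left the player to
    move has already lost, because the opponent took the last stone.
    """
    return any(
        not can_win(stones - take)  # a move wins if it leaves the opponent losing
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones):
    """Return a winning number of stones to take, or None if every move loses."""
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return None


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, can_win(n), best_move(n))
```

The cache means each pile size is evaluated once, so the search stays cheap here. In a game like chess the same recursion applies in principle, but the number of positions makes exhaustive traversal impractical.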
56:15 There's always going to be room to grow, but you know, from a cognitive science perspective,
56:18 I think there's something far more interesting. You know, I think it's cool to see what computers
56:22 can actually do.
56:23 And I think they can do a hell of a lot more.
56:25 But I think it's more interesting to kind of ask the more philosophical question of
56:28 what does that actually mean for us?
56:30 Because every step of the way, when we develop something new in artificial intelligence,
56:34 it tells us something a lot deeper about our own intelligence too, where back in the fifties
56:39 and sixties, they thought that chess meant intelligence.
56:42 And so now we're kind of seeing with LLMs, if someone just reads a bunch of books and
56:46 can memorize a bunch of books, does that mean they're intelligent?
56:49 Because that's effectively what an LLM can do.
56:51 It comes across as intelligent, right?
56:54 It comes across that way to people like, oh, you have all the answers, but it doesn't mean
56:57 you were good at problem solving necessarily.
56:59 We're kind of peeling at this onion and we're kind of segmenting intelligence into its different
57:04 categories to really kind of break it apart,
57:07 from this vague word "intelligence" into what the parts actually are that make
57:12 something intelligent.
57:13 What does it really mean?
57:14 And what are still the things that we thought were hard and are really easy, or
57:18 the things that we thought were easy and are really hard?
57:20 Because any LLM you have now can't chew gum and walk to the store and buy you something
57:26 to eat and then play you in chess.
57:28 And so the "general" in "general AI" isn't just there to make it sound grander.
57:34 It's there because that's actually something that makes humans very, very unique is that
57:38 I can have this conversation with you, write code, prepare an omelet, walk a dog and do
57:43 all of these things all at once.
57:45 And I started as a baby.
57:47 And have feelings and thoughts and introspection about it and all that.
57:50 - Hopes, dreams, liking ketchup and not tomatoes, and all sorts of other things.
57:54 - Yeah, super interesting.
57:55 So are you, are you pretty positive on where stuff's going?
57:58 - I'm less concerned about the AI itself and I'm more concerned about just how we react
58:02 to it.
58:03 And whether we react intelligently, whether we react well.
58:06 We went through the industrial revolution.
58:08 We went through the steel age.
58:10 We're starting to go through the cognitive revolution or whatever people are going to
58:14 call this a hundred years from now.
58:15 I think by and large, people are going to be okay.
58:18 I think we just need to have good policies, make sure that people do it responsibly, people
58:22 do it well.
58:23 And that's going to be the hard part: how do we manage the transition well?
58:26 How do we take a whole labor force whose jobs will be gone,
58:31 like there's no economic incentive to keep their jobs, and retrain them in a really smart
58:36 and sensible way,
58:38 so people's livelihoods don't just go away to maximize the dollar?
58:42 Those are kind of problems that are going to need good policy and clever solutions and
58:47 a lot of research to actually sit down and handle well.
58:50 But AI isn't going to go anywhere, just like computers haven't gone anywhere, but it's
58:55 not Terminator that we should worry about.
58:57 - No, absolutely.
58:58 It's certainly not.
58:59 I was joking.
59:00 We should worry about disinformation and politics and news and all that stuff.
59:05 - I think people get so hung up on all the negatives that we kind of forget, like we're
59:08 developing these tools for a reason.
59:11 Scientists are spending so much time and we're excited about them for a reason.
59:13 One being that I used to work on proteins, trying to find the structure of proteins.
59:18 And that was one protein a year.
59:20 It would take one year to find the structure of a protein.
59:22 And now I can run this.
59:23 I took the same sequence that we found and ran it through AlphaFold, which is DeepMind's
59:27 protein folding software.
59:29 It could do it in five minutes.
59:31 I mean, those are the things that really catalyze science and technology and healthcare
59:36 and actually solve the problems we want to solve.
59:39 So that's what I think we should do.
59:40 - I think I heard even some AI stuff coming up with some new battery chemistry, potentially.
59:44 - Everything.
59:45 How do you appropriately harness, like, a fusion ring in a tokamak reactor or
59:52 something like that?
59:54 Everything, even down to how do you route a PCB with millions of traces effectively.
01:00:01 So it's a tool for the mind and it's not going to put our minds out of work.
01:00:05 Like you said a long time ago when you were in your eye track.
01:00:07 - Yeah, exactly.
01:00:08 It's like a different view of that thing.
01:00:09 It's like the same reason that none of us want to sit around and churn butter and go
01:00:13 till the fields and walk to work.
01:00:17 - I don't dislike my clothes washer.
01:00:20 Not at all.
01:00:21 Yeah.
01:00:22 Awesome.
01:00:23 All right.
01:00:24 Well, thanks for giving us a look inside your lab, inside your research and the field, and
01:00:27 for sharing what you've gotten up to.
01:00:29 It's been great.
01:00:30 - Yeah, yeah, yeah.
01:00:31 Thanks for having me.
01:00:32 Yeah.
01:00:33 It's been really great, at least for the years that I've been in research, to see how
01:00:37 pivotal Python has been.
01:00:38 And I think that's one of the big things that probably goes unnoticed by a lot of developers
01:00:43 when they're actually writing their code to solve whatever problem they're solving is
01:00:46 that someone who wrote the NumPy library or the scikit-learn library, or, you know,
01:00:52 Jupyter notebooks has actively played a part in curing diseases and making people's lives
01:00:56 better.
01:00:57 And those stories usually don't kind of come to the forefront.
01:01:00 - It's not a direct line, right?
01:01:02 But the research was done and facilitated by these open source tools and then discoveries
01:01:06 were made.
01:01:07 And that's the difference between people getting cured in 10 years or getting cured in a year.
01:01:11 And that's thousands of lives.
01:01:13 And so you might not think about it when you're sitting behind your desk, you know, making
01:01:17 something usable.
01:01:18 But every day a PhD student doesn't have to pull their hair out and debug
01:01:22 some software, because it's well-written and it works well,
01:01:25 that's a day that they can find something new.
01:01:27 - That's awesome.
01:01:28 Yeah.
01:01:29 Very inspiring.
01:01:30 Let's leave it there, Keiland.
01:01:31 Thank you for being on the show.
01:01:32 - Absolutely.
01:01:33 - Yeah.
01:01:34 It's been great to talk to you.
01:01:35 - Thank you so much.
01:01:36 - Yeah.
01:01:37 - And thanks for joining Talk Python To Me.
01:01:39 Thank you to our sponsors.
01:01:40 Be sure to check out what they're offering.
01:01:41 It really helps support the show.
01:01:44 It's time to stop asking relational databases to do more than they were made for and simplify
01:01:49 complex data models with graphs.
01:01:52 Check out the sample FastAPI project and see what Neo4j, a native graph database can
01:01:57 do for you.
01:01:58 Find out more at talkpython.fm/neo4j.
01:02:03 This episode is sponsored by Posit Connect from the makers of Shiny.
01:02:07 Publish, share, and deploy all of your data projects that you're creating using Python.
01:02:12 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.
01:02:19 Posit Connect supports all of them.
01:02:20 Try Posit Connect for free by going to talkpython.fm/posit.
01:02:23 P-O-S-I-T.
01:02:24 Want to level up your Python?
01:02:28 We have one of the largest catalogs of Python video courses over at Talk Python.
01:02:32 Our content ranges from true beginners to deeply advanced topics like memory and async.
01:02:37 And best of all, there's not a subscription in sight.
01:02:40 Check it out for yourself at training.talkpython.fm.
01:02:43 Be sure to subscribe to the show, open your favorite podcast app, and search for Python.
01:02:48 We should be right at the top.
01:02:49 You can also find the iTunes feed at /iTunes, the Google Play feed at /play, and the Direct
01:02:55 RSS feed at /rss on talkpython.fm.
01:02:59 We're live streaming most of our recordings these days.
01:03:01 If you want to be part of the show and have your comments featured on the air, be sure
01:03:05 to subscribe to our YouTube channel at talkpython.fm/youtube.
01:03:10 This is your host, Michael Kennedy.
01:03:11 Thanks so much for listening.
01:03:12 I really appreciate it.
01:03:14 Now get out there and write some Python code.
01:03:16 [MUSIC PLAYING]