
#461: Python in Neuroscience and Academic Labs Transcript

Recorded on Thursday, Mar 14, 2024.

00:00 Do you use Python in an academic setting?

00:02 Maybe you run a research lab or teach courses using Python.

00:05 Maybe you're even a student using Python.

00:08 Whichever it is, you'll find a ton of great advice in this episode.

00:11 I talked with Keiland Cooper about how he's using Python in his neuroscience lab at the

00:16 University of California, Irvine.

00:19 And Keiland wanted me to let you know that if any developers who are not themselves scientists

00:23 are interested in learning more about scientific research and ways you might be able to contribute,

00:28 please don't hesitate to reach out to him.

00:30 This is Talk Python to Me, episode 461, recorded March 14th, 2024.

00:35 Are you ready for your host, please?

00:39 You're listening to Michael Kennedy on Talk Python to Me.

00:42 Live from Portland, Oregon, and this segment was made with Python.

00:50 Welcome to Talk Python to Me, a weekly podcast on Python.

00:54 This is your host, Michael Kennedy.

00:55 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython,

01:00 both on fosstodon.org.

01:03 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:08 We've started streaming most of our episodes live on YouTube.

01:12 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be

01:18 part of that episode.

01:20 This episode is sponsored by Neo4j.

01:22 It's time to stop asking relational databases to do more than they were made for and simplify

01:28 complex data models with graphs.

01:31 Check out the sample FastAPI project and see what Neo4j, a native graph database, can do

01:36 for you.

01:37 Find out more at talkpython.fm/neo4j.

01:42 And it's brought to you by Posit Connect from the makers of Shiny.

01:46 Publish, share, and deploy all of your data projects that you're creating using Python.

01:50 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:57 Posit Connect supports all of them.

01:59 Try Posit Connect for free by going to talkpython.fm/posit.

02:03 P-O-S-I-T.

02:04 - Hello, how are you?

02:06 - I'm doing well.

02:07 So awesome to have you here on Talk Python and talking academics.

02:13 I didn't tell you before we hit record, but I spent a long time at universities and I

02:17 just love them.

02:18 They're such cool places and it's gonna be really fun to get a look inside how Python's

02:23 being used there.

02:24 - Yeah, well, yeah.

02:25 Thank you so much for having me.

02:26 And yes, I too love universities.

02:28 It's kind of like all the coolest parts of humanity just kind of intermixing in one place.

02:33 So yeah, I'd love to kind of peel back the curtain on how things are going.

02:37 - Yeah, yeah.

02:38 Well, we talked about how you and your colleagues use Python and data science inside of your

02:43 neurology research lab.

02:46 But before we dive into that, let's just get a bit of background on yourself.

02:49 Who are you?

02:50 How do you get into Python?

02:51 All those things.

02:52 - So I'm Keiland Cooper.

02:53 I'm a neuroscientist at the University of California, Irvine.

02:56 So Southern California, 15 minutes from the beach and an hour from the mountains.

03:01 But I'm originally from the middle of nowhere, Indiana.

03:05 And I started playing with computers and code when I was young.

03:09 So like middle school-ish, just ripping apart computers and seeing what was in them and

03:13 then trying to put them back together and feeling bad when they didn't work right after.

03:18 And then the typical, you know, tweaking the software when you don't like what it does

03:22 until you make it work.

03:23 And then probably my senior year of high school is when I started teaching myself Python.

03:29 And it was because we had to do some for some government class actually.

03:32 - Oh, wow.

03:33 Okay.

03:34 - And we had to learn, we were learning about the stock market.

03:36 And every day you'd have to spend like 15 minutes going to like some stock website and

03:40 like filling out your fake stocks.

03:42 And so I wrote a really small Python script that would just pull the data from the website

03:47 and populate like an Excel spreadsheet.

03:49 And so every day the kids in the class were just like going through and like spending

03:52 15, 20 minutes by hand writing it down.

03:55 And I would just sit there.
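
A minimal sketch of that kind of script might look something like this, assuming a hypothetical quote API and a CSV file standing in for the Excel sheet:

```python
# Rough sketch of the automation described above: pull quotes for a few
# tickers and append them to a spreadsheet-friendly CSV file.
# The quote URL is a placeholder (hypothetical), not a real service.
import csv
import datetime

import requests

TICKERS = ["AAPL", "MSFT", "GOOG"]
QUOTE_URL = "https://example.com/api/quote/{symbol}"  # hypothetical endpoint


def fetch_price(symbol: str) -> float:
    """Fetch the latest price for a ticker from the (hypothetical) API."""
    response = requests.get(QUOTE_URL.format(symbol=symbol), timeout=10)
    response.raise_for_status()
    return float(response.json()["price"])


def append_to_sheet(path: str = "fake_portfolio.csv") -> None:
    """Append today's prices, one row per ticker; Excel opens CSV fine."""
    today = datetime.date.today().isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for symbol in TICKERS:
            writer.writerow([today, symbol, fetch_price(symbol)])


if __name__ == "__main__":
    append_to_sheet()
```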

03:56 - That's awesome.

03:57 - And so that was kind of the first time I was like, wow, this whole automation thing

04:00 is pretty sweet.

04:01 From there, Python just kind of, I caught the bug pretty early.

04:05 Python was definitely the way to go.

04:07 - Yeah.

04:08 Was Python your first programming language?

04:09 - My first programming language was the Windows registry and trying to undo all of the mistakes

04:14 of the operating system.

04:15 - It's been a while since I've been in the Windows registry, but good old regedit.

04:20 - I switched to Linux pretty quick.

04:22 In Linux and Unix.

04:23 - Are you still on the Linux?

04:24 - Mostly.

04:25 My desktops are all Linux.

04:26 My servers are obviously all Linux.

04:27 I like Mac for a laptop just because, you know, Linux has this thing where you tinker

04:32 with it.

04:33 And so then any small task you want to do, you end up like rewriting some deep script

04:37 in the operating system and like two hours later, you're like, what was that small thing

04:40 I was trying to do again?

04:41 - Yeah, exactly.

04:42 I got distracted.

04:43 I was rewriting something in there and off we go.

04:46 Yeah.

04:47 - Yeah.

04:48 So Macs are nice because you still have all the same Unix like properties that are great,

04:51 but you pay a price for reliability.

04:53 - You just sell a bit of your soul out, but boy, is that UI nice.

04:56 And those little menu bar apps are handy.

04:59 You know, I was, that's right.

05:02 I've been playing with running Ubuntu on my Mac M2 Pro, which runs great, but it's

05:10 an ARM version of Mac, or of Linux rather.

05:14 Well, both really, but boy, is there a limited supply of applications for an ARM Linux distribution?

05:21 Let me tell you, they're like, just download the Debian package.

05:25 - I imagine that'll change pretty quick though.

05:27 - Yeah, they're like, just download the Debian package.

05:28 You just install it.

05:29 Like wrong platform.

05:30 I'm like, again, over half.

05:33 But yeah, I think it will change as I think ARM's going to start to take over Windows

05:37 a little bit as well.

05:38 And obviously the Mac world is basically transitioning.

05:41 So anyone who has a Mac, yeah.

05:43 - I think it's Qualcomm that just kind of started hinting that they were going to try

05:48 and really heavily compete with the M line of processors and have some pretty good specs.

05:52 So it'll be good.

05:53 I have an M3 and it's pretty nice.

05:56 - It is really nice.

05:58 Like I said, I'd like to run more stuff on it, but it's still kind of Intel or x86 stuff

06:02 for Linux and Windows.

06:04 So it's a little hard to work with that, but still super fun.

06:08 That's a long way to say it's been a very long time since I've been in regedit.

06:12 And it sounds like that's true for you as well, given you're not naming Windows much.

06:16 - I'm so bad at Windows now.

06:18 Like when I'm helping people with Python or something else and they show me their computer,

06:21 there's always that like 10 minute learning curve of like, okay, how do I do anything

06:27 basic on this machine?

06:29 Or even like the keyboard shortcuts you get so accustomed to when people don't have any

06:33 of those little things that you're just like, how do I select everything?

06:37 - That is it.

06:38 Like I did professional software development on Windows for a long, long time.

06:41 I even wrote a bunch of Windows apps.

06:43 It was great.

06:44 But going back and forth too quickly between that or Linux, like just the hotkeys, I just get

06:49 broken.

06:50 So when I'm on Windows for a while and then come back to Mac, I'm just completely out of sorts.

06:54 So yeah, it's fun.

06:55 All right, well, let's talk academics and Python from a mostly OS-agnostic

07:04 perspective.

07:05 But yeah, just give us a sense of the kind of research you do.

07:09 You know, what is your field?

07:10 What do you study?

07:11 Those kinds of things.

07:12 So people get a sense of like, why am I talking about this?

07:14 Where are you coming from as you talk about doing all these things?

07:16 - The core of my work is pure neuroscience.

07:20 So basic science, what we do mainly in the lab is we take really tiny wires.

07:26 So they're like a fifth of the size of the human hair.

07:28 And now we're using something called silicon probes, which are, they're manufactured the

07:31 same way that computer chips are manufactured on silicon wafers using photolithography.

07:37 - Do you get a higher density that way?

07:38 Do you get like a bunch of little sensors on one point or something?

07:42 Okay.

07:43 - So we used to build these little drives.

07:44 I used to have one here, but I got rid of it.

07:46 Little drives by hand.

07:47 So you would just feed the wires in with forceps.

07:51 And so you'd get maybe 64 or 128 at most, depending on how much time you want to sit

07:56 there and feed the wires in.

07:58 But now you can just get the manufactured, you pay a lot more, but you get twice, three

08:02 times the sites.

08:04 And the whole point is the more sites you have, the more neurons you can actually record

08:08 from in the brain.

08:09 - Yeah.

08:10 You're not just saying this part of the brain lit up, but you can have a much better picture,

08:14 right?

08:15 - Yeah.

08:16 So a big part of your brain is the neuron.

08:17 And so of the millions and billions of neurons, depending on the species you're recording

08:22 from, we can record maybe a few hundred of them, but that's usually sufficient to actually,

08:27 in the specific region you study, and I can talk about that more, to discern some sort

08:32 of information from it.

08:33 So really the data type we really care about is this tiny little electrical voltages that

08:38 tell you what different neurons in the brain are talking about.

08:41 And so you put the wires in, you record the conversations of a bunch of neurons.

08:46 And particularly we're interested in two brain regions that are critical for memory, learning,

08:52 and decision-making.

08:53 And this is the hippocampus, which in humans is about the size of your pinky and a few

08:58 inches in from your ear, and the prefrontal cortex, which most people know about right

09:02 behind your forehead, important for learning, decision-making, and all those sorts of things.

09:07 So that's the core of my work is I'm in the lab doing the actual data collection and building

09:11 equipment to actually do that.

09:13 But once you have all of that data and the data keeps growing, like most other fields,

09:17 you got to do a lot of pre-processing, which takes Python.

09:21 You got to do a lot of post-processing, which takes a lot of Python.

09:24 And also we do something called neural decoding.

09:27 So not only do we just like say descriptively, what are these neurons doing, but we can go

09:32 one step further and say, what actual information are these cells representing?

09:38 So in the brain, we can kind of say, this is kind of the fundamental kind of information

09:44 transfer and how information is manipulated in the brain and how it ships information

09:49 from the environment into memory and how it uses that to make a decision.

09:53 All of those kinds of things we can use through fancy modeling and statistics and more recently,

09:59 deep learning and those sorts of things.

10:00 - We'll have to come back to deep learning later.

10:02 That'll be fun.

10:03 Well, given your background.

10:06 So for this hardware, do you write the software that actually talks directly to the hardware

10:11 or is there something that just records it and you grab some sort of custom file format

10:16 and run with it?

10:17 - Yeah, more recently, it kind of depends on the lab.

10:21 So as time goes on, there's more and more companies that you can just buy off the shelf

10:25 and recording platforms, mostly for the electrical engineering people.

10:30 It's kind of like an audio like amplifier, 'cause you're recording at millivolts in the

10:33 brain so you have to amplify it, write it to, if you're plugged in with a wire, write

10:37 it to the computer.

10:39 So all that takes software in various forms.

10:42 And then we do a lot of animal research.

10:44 So the tasks that the animals do are pretty much all automated.

10:49 But recently in the lab, we've kind of had this resurgence of like developing kind of

10:54 novel hardware and a lot of automation of behavior.

10:57 So I've kind of rewritten most of our entire behavioral stacks, which is a lot of just,

11:03 so microcontroller programming, which not a lot of that's in Python, a lot of that's

11:06 just kind of like C++ and those sorts of things.

11:09 But we have cameras all over, so I wrote this kind of like camera server that streams all

11:14 of the camera footage from a bunch of automated boxes to some like central server that just

11:19 collects all of that data.

11:21 So yeah, a lot of the behavioral stuff nowadays, we're just building in house to collect all

11:26 of the behavior data.

11:27 The ephys stuff is now, especially because we're doing something called wireless recording.

11:32 So instead of just having a wire plugged into the head, it just writes it to like an SD

11:36 card or Bluetooth.

11:38 That's just kind of all on chip.

11:40 So it's just whatever the microcontroller language of the chip needs.

11:46 - This portion of Talk Python to Me is brought to you by Neo4j.

11:50 Do you know Neo4j?

11:51 Neo4j is a native graph database.

11:54 And if the slowest part of your data access patterns involves computing relationships,

11:59 why not use a database that stores those relationships directly in the database, unlike your typical

12:05 relational one?

12:06 A graph database lets you model the data the way it looks in the real world, instead of

12:10 forcing it into rows and columns.

12:13 It's time to stop asking a relational database to do more than they were made for and simplify

12:18 complex data models with graphs.

12:21 If you haven't used a graph database before, you might be wondering about common use cases.

12:25 What's it for?

12:27 Here are just a few, detecting fraud, enhancing AI, managing supply chains, gaining a 360

12:34 degree view of your data, and anywhere else you have highly connected data.

12:38 To use Neo4j from Python, it's a simple pip install Neo4j.

12:44 And to help you get started, their docs include a sample web app demonstrating how to use

12:48 it both from Flask and FastAPI.

12:51 Find it in their docs or search GitHub for Neo4j movies application quick start.
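
A minimal sketch of querying Neo4j from Python with that driver might look like this; the connection details and the movie schema here are assumptions based on the movies quick start, not any specific app:

```python
# Minimal sketch of the official Neo4j Python driver (pip install neo4j).
from neo4j import GraphDatabase

URI = "neo4j://localhost:7687"      # assumed local instance
AUTH = ("neo4j", "your-password")   # assumed credentials

driver = GraphDatabase.driver(URI, auth=AUTH)

with driver.session() as session:
    # Find movies a given person acted in (schema from the movies example).
    result = session.run(
        "MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie) "
        "RETURN m.title AS title",
        name="Keanu Reeves",
    )
    for record in result:
        print(record["title"])

driver.close()
```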

12:56 Developers are solving some of the world's biggest problems with graphs.

13:00 Now it's your turn.

13:01 Visit talkpython.fm/neo4j to get started.

13:06 That's talkpython.fm/neo, the number four and the letter J.

13:11 Thank you to Neo4j for supporting Talk Python to me.

13:15 - I think it's surprising how much software and even hardware, but definitely software

13:21 is involved for something that doesn't sound like a software discipline.

13:25 You wouldn't think of what you guys are doing as inherently almost like a software team,

13:30 but there's a lot of software there.

13:31 - Absolutely, and it's growing.

13:33 So it used to be 10, 20 years ago, more biology, I'd say, like more wet lab stuff.

13:38 But 90% of what I do as kind of a neurobiologist is really just engineering style things.

13:45 Like I'm more recently designing PCBs and I'm in the shop a lot, just like with saws

13:50 and hammers and drills and like actually physically building things.

13:54 And obviously a lot of code.

13:56 And the coding part is becoming bigger and bigger to the point where in the field, I

14:00 always say that the neuroscience is like about three decades behind astrophysics.

14:05 'Cause all the problems that like neuroscientists like say we're facing now as a field, they

14:09 had like three decades prior in astrophysics. In neuroscience, we're

14:14 like, what do we do with all this data?

14:16 This is, I mean, I'm collecting a hundred gigabytes an hour, if not more, like what

14:20 do we do with all this data?

14:21 - That is a lot.

14:22 - Yeah, but relative to like some of those big telescopes that are collecting like almost

14:27 petabyte scale.

14:28 - I would say both ends of physics, like the very, very extreme ends of physics.

14:32 So astrophysics, the very large and then particle physics, right?

14:36 At CERN as well, they've got insane amounts of data.

14:39 Yeah.

14:40 - And that's what we're starting to see, I think, in neuroscience too, is that kind of

14:42 division of like, because the scale of data collection is so big, you're starting to need

14:47 not just a single lab, but teams.

14:49 So we have a few institutes now that are just pumping out terabytes of data.

14:54 And so you start to see that division between the neuroscientists who are really in the

14:58 lab hands-on with actual neural tissue or the recording device, and the neuroscientists

15:04 who are just take the data and analyze it and develop new models and statistical models.

15:09 And also theory, there's always a dearth of theory in neuroscience, but the computational

15:14 modeling is certainly getting a lot bigger within the last few decades as well, where

15:19 people's entire job is just how do we model some of these things in code?

15:24 - You probably run into different research groups, different teams that have different

15:29 levels of sophistication from a software side.

15:32 And do you see like a productivity or a quality difference jump out from like the kind of

15:37 work or the velocity of work that people are doing there?

15:40 - Absolutely.

15:41 It makes me almost like a, it's a huge range.

15:44 There are like very sophisticated labs.

15:46 And usually those are the labs that have kind of just a pure software person on the team

15:51 or people who are very inclined towards software all the way to, and it makes me so sad when

15:55 people are spending like weeks in a spreadsheet, just manually doing things by hand.

16:00 - Yeah, I know.

16:01 - You could do this in five minutes in Python.

16:03 - And not only could you do it faster, you could do it without any errors.

16:07 - Yeah, more reliable.

16:08 - None of those like, oh, I misread it and I shifted off by a cell or I typed in.

16:14 - I missed something, right?

16:16 'Cause it just reads what's there, yeah.

16:18 - A lot of like graduate programs are starting to wake up to this fact that it's gonna be

16:23 almost impossible to do any science without some degree of proficiency in coding.

16:28 And I think a lot of, say grad students and postdocs and so on, when they actually sit

16:33 down and try and analyze their data, whether there's just an Excel or they need to write

16:38 a little Python script, that's kind of their first introduction is, oh, I have this data,

16:42 I need to do something with it.

16:44 I'm gonna Google exactly how do I read in this data or how do I do a T-test in Python

16:49 or how do I plot something in Matplotlib?

16:51 And that's kind of the level that they start getting into out of necessity.
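
For instance, a first script of that kind might look roughly like this, with a made-up CSV file and column names:

```python
# A tiny example of the sort of first script described here: read a CSV,
# run a t-test between two groups, and plot the result.
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

df = pd.read_csv("reaction_times.csv")          # hypothetical data file
control = df[df["group"] == "control"]["rt"]
treatment = df[df["group"] == "treatment"]["rt"]

t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

plt.boxplot([control, treatment], labels=["control", "treatment"])
plt.ylabel("reaction time (s)")
plt.show()
```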

16:55 But the sophistication and the speed, because they're usually just teaching themselves,

17:00 that's most of academia.

17:01 It's just, you have a problem, spend a few days Googling and reading books until you

17:05 find it.

17:06 - And once it works, you can kind of just leave it.

17:08 You don't have to clean it up or anything, right?

17:10 Yeah, okay.

17:11 - And it results in a lot of, I mean, the progress of science doesn't go away, but the

17:16 code is not robust.

17:19 And so that's why you see things, especially in other fields of psychology and such, like

17:22 replication crises and people have done meta-analysis of running the same software stack on 12 different

17:29 data sets and you get different results.

17:31 And so you start to kind of see the shaky foundation starting to bleed into the, like

17:37 you said, the reliability results.

17:40 - And you'd have consequences, not just it's more work or something.

17:43 - Exactly.

17:44 - Maybe we could start a bit by just talking about maybe the history, diving into this

17:48 a little bit more, just the history of programming in neuroscience.

17:52 I wasn't in neuroscience in any way, but I worked with a bunch of cognitive scientists

17:57 studying how people solve problems and thought about things at a lab for one of my first

18:03 jobs and we studied all through eye tracking.

18:05 Not the iPhone, but actual eyes.

18:07 It was fascinating.

18:08 It was tons of data.

18:09 And there were, like you described, a lot of people who would do sort of Excel stuff

18:12 and they would take the data and they process it.

18:15 Over time, we just started to automate these things and their first thought was, "You're

18:19 programming me out of a job." I'm like, "No, no, no.

18:22 This is the crappy part of your job.

18:24 Like you're supposed to analyze the results and think about it and plan new stuff and

18:28 now you can just focus on that." And as the software got better, we just tackled bigger problems.

18:34 So maybe give us a bit of a history of on your side.

18:37 So I love the cognitive science.

18:39 That's my more background is cognitive science.

18:42 I was an undergrad and grew up in science in a cognitive science department while also

18:47 doing some wet lab neuroscience stuff.

18:49 So it's fun.

18:50 Yeah, absolutely.

18:51 Did you start out with like MATLAB and that kind of stuff?

18:54 Is that where they told you you need to be?

18:56 Neuroscience has certainly had, at least our branch of neuroscience, just because by the

19:01 nature of recording voltages and you need to write to a computer.

19:05 So there has been kind of a long history of, for as long as there's been even like punchcard

19:11 computers, people have kind of read in the data into the computer and done their statistics

19:17 on that rather than something else.

19:20 I'm actually recently writing a kind of a review article on the history of data science

19:24 and neuroscience.

19:26 And I loved this paper.

19:27 It was from 1938.

19:31 And they took an EEG spectrum, and so EEG is just the continuous time series of brain

19:37 voltage.

19:38 So you're not in the brain recording.

19:39 And this is, I think, from humans.

19:41 And they took something called the Fourier transform, which, to bring everyone up to speed,

19:45 is where you basically take some oscillating signal and break it down into its constituent

19:50 parts.

19:51 And most of you have seen it before if you've ever seen like an audio spectrogram, that's

19:54 kind of the most notable visualization where you can kind of see the high frequencies and

19:59 the low frequencies.

20:00 So basically it pulls the frequencies out of the signal, right?

20:04 Yeah.

20:05 But the way they did this, and this is 1938, there's no computers.

20:08 So they actually had mechanical device where they would just take this EEG trace that was

20:13 on tape and they would feed it into this like mechanical machine.

20:17 And it would basically read kind of this black line on the tape.

20:21 And so as it would crank the tape around this machine, depending on kind of the frequency

20:27 that the line went up and down, that would read out the Fourier transform.

20:30 So it was mechanical.

20:31 Wow.

20:32 Like a lot of those cool devices back in the older days.

20:35 That's impressive.

20:36 Now you can get that same thing in MATLAB.

20:39 You just type FFT, parenthesis, parenthesis, put your data in the middle and you get the

20:43 same thing in microseconds.
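
The Python equivalent is just as short. A rough sketch, using a synthetic signal standing in for real EEG:

```python
# A modern take on that mechanical Fourier analyzer: decompose a synthetic
# "EEG-like" trace with NumPy's FFT. The 10 Hz component stands in for an
# alpha rhythm; the signal itself is made up for illustration.
import numpy as np
import matplotlib.pyplot as plt

fs = 250.0                              # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)            # 10 seconds of data
signal = (np.sin(2 * np.pi * 10 * t)    # 10 Hz "alpha" component
          + 0.5 * np.sin(2 * np.pi * 40 * t)
          + 0.3 * np.random.randn(t.size))

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

plt.plot(freqs, spectrum)
plt.xlabel("frequency (Hz)")
plt.ylabel("amplitude")
plt.xlim(0, 60)
plt.show()
```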

20:45 But neuroscience, at least my field has kind of always had this kind of serendipitous relationship

20:51 with computing generally, coding generally.

20:53 And a lot of the code I think earlier on was kind of Fortran-ish and then it moved towards

20:59 MATLAB.

21:00 And MATLAB's kind of had its stake in the ground for a long time, just because that

21:03 was the first kind of software that you could really do array manipulations on well.

21:08 And it was kind of a higher level than some of the lower level programming.

21:12 So a lot of the older labs have their entire code base and stack and analysis software

21:18 in MATLAB.

21:19 And so it's only been within maybe five to six, seven years, maybe a bit longer, 10 years

21:25 that you've really seen Python start to supplant MATLAB as kind of the de facto programming

21:30 language in labs, just because of the cost of trying to transfer everything over.

21:35 And despite the fact that MATLAB isn't open source and it's extremely expensive, most

21:40 universities have licenses.

21:41 And so that kind of facilitates.

21:43 It's prepaid in a sense.

21:44 Yeah.

21:45 But it is still pretty expensive.

21:46 Especially if you get those little toolboxes like Wavelet Decomposition Toolbox, 2000

21:52 bucks instead of a pip install, you know.

21:54 And again, we do a lot of signal processing.

21:56 And so that's exactly the place you want to be.

21:59 And like MATLAB usually controls because it has pretty good control over like external

22:03 hardware.

22:04 You can run like your behavior, your task kind of in MATLAB mode.

22:07 So you can kind of do everything in one language as you would like to do in Python.

22:12 But it's starting to kind of go away.

22:14 And I think a lot of that is just because the allure of Python, which is so many tools

22:19 and because it's probably a lot easier to learn for most people than MATLAB.

22:24 We're kind of starting to see that switch now that there's kind of more to offer, I'd

22:28 say a lot of scientists than MATLAB.

22:30 Yeah.

22:31 You said 10, 12 years ago.

22:32 At least in our field.

22:33 Yeah.

22:34 The difference in the external packages on PyPI you can get, and especially the ones

22:39 for data science have just exploded.

22:43 The choices and the stuff that's out there is, it's pretty diverse.

22:46 It's pretty crazy.

22:47 The only other one that I think is still in pretty strong competition with Python from

22:51 the perspective of, we collaborate a lot with like mathematicians and statisticians and

22:55 R is their usual favorite, just because statistics, like all the best statistical packages are

23:02 still pretty much in R. And so that's where a lot of people live.

23:05 ggplot is pretty good.

23:06 Makes pretty good plots.

23:07 Yeah, that's interesting.

23:08 It's really focused and it's really good at what it does.

23:13 And one of the things that I think is worth just considering, if somebody comes, let's

23:18 say a brand new first year grad student comes into the lab and you're like, all right, what's

23:23 your programming experience?

23:24 Like, well, programmed the clock on the VCR.

23:27 Like, okay, we're going to have to start you somewhere or something.

23:30 And you could teach them something really specific like MATLAB or something along those

23:35 lines.

23:36 But if they learn something like Julia, not like Julia, like Python, not Julia, maybe

23:41 even not really R, but R is closer somewhat, is they learn not just a skill for the lab,

23:48 but it's kind of almost any software job is potentially within reach with a little bit

23:53 of learning about that area.

23:55 Right.

23:56 Like if you know Python, you say, I want a job.

23:58 There's a massive set of options out there.

24:00 If you say, I know MATLAB or even Julia, it's like, okay, well, here's the few labs and

24:07 the few research areas. I just think it's something that...

24:11 A lot of engineering firms.

24:12 Yeah.

24:14 A lot of academic folks should consider what happens if the student doesn't necessarily

24:19 become a professor.

24:20 You know what I mean?

24:21 Which actually is a lot of the time, right?

24:24 Or a professional researcher of some sort.

24:26 And that's a really awesome skill to have on top of your degree.

24:29 So I think that's just a big win for it.

24:31 I'm happy to see that.

24:32 And that literal situation just happened where a statistician that we were collaborating

24:36 pretty closely with graduated, brilliant guy, and got a job at Microsoft.

24:42 And so we were in a meeting after he was there and they were like, what's some advice that

24:47 you have now that you've been in industry for a while?

24:50 And he's like, stop using R, learn Python because everyone here uses Python.

24:55 And it took me a few months to kind of switch from the R worldview of, you know, the arrow assignment (<-)

25:01 to the equal sign, to actually work and collaborate with everyone, because everyone's just using

25:07 Python.

25:08 Yeah.

25:09 From his perspective.

25:10 And I'm sure it's not unique.

25:11 No, I'm sure that it's not either.

25:12 I think, yeah, I just think it's not like a religious war.

25:15 It's not like, oh, I think Python is absolutely better.

25:17 You should just not use other stuff.

25:18 I just think it's preparing people for stuff beyond school.

25:22 It's a pretty interesting angle to take.

25:24 And it's not like you can't learn other things.

25:25 I think it's really good to learn other things, especially ones that are complementary, where

25:30 R can be complementary, especially now that they have a lot of the subsystem packages.

25:34 When I get R code, I usually just write like a sub process.

25:39 I did this recently.

25:40 I just wrote like a sub process line to call the R script because I was too lazy to rewrite

25:45 it.
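
That pattern is just a few lines. A minimal sketch, with a hypothetical R script name and arguments:

```python
# Call an existing R script from Python instead of rewriting it.
# The script name and arguments are hypothetical; Rscript must be on PATH.
import subprocess

result = subprocess.run(
    ["Rscript", "fit_mixed_model.R", "data.csv", "results.csv"],
    capture_output=True,
    text=True,
    check=True,        # raise if the R script exits with an error
)
print(result.stdout)
```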

25:46 Yeah, sure.

25:47 But there's others, like, I don't know, like Rust is probably a good one to try

25:49 and challenge your brain with.

25:50 Or like lower level languages like C++.

25:53 If you need for what you're doing.

25:54 Yeah.

25:55 It sounds like you guys do for talking to hardware and stuff like that.

25:58 Yeah.

25:59 Occasionally.

26:00 Boy, there's not a lot of things Python can't do.

26:01 Break out some MicroPython when you got to get your microcontrollers and stuff.

26:06 This portion of Talk Python to Me is brought to you by Posit, the makers of Shiny.

26:10 Formerly RStudio.

26:11 And especially Shiny for Python.

26:15 Let me ask you a question.

26:16 Are you building awesome things?

26:18 Of course you are.

26:19 You're a developer or data scientist.

26:20 That's what we do.

26:21 And you should check out Posit Connect.

26:24 Posit Connect is a way for you to publish, share and deploy all the data products that

26:28 you're building using Python.

26:31 People ask me the same question all the time.

26:33 Michael, I have some cool data science project or notebook that I built.

26:37 How do I share it with my users, stakeholders, teammates?

26:39 Do I need to learn FastAPI or Flask or maybe Vue or ReactJS?

26:45 Hold on now.

26:46 Those are cool technologies and I'm sure you'd benefit from them, but maybe stay focused

26:49 on the data project.

26:51 Let Posit Connect handle that side of things.

26:53 With Posit Connect, you can rapidly and securely deploy the things you build in Python.

26:58 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards and APIs.

27:04 Posit Connect supports all of them.

27:07 And Posit Connect comes with all the bells and whistles to satisfy IT and other enterprise

27:11 requirements.

27:12 Make deployment the easiest step in your workflow with Posit Connect.

27:17 For a limited time, you can try Posit Connect for free for three months by going to talkpython.fm/posit.

27:23 That's talkpython.fm/POSIT.

27:26 The link is in your podcast player show notes.

27:29 Thank you to the team at Posit for supporting Talk Python.

27:33 So another thing is, I don't know how it's received.

27:36 I know it took a while to kind of really catch on.

27:39 And I think the thing that just broke the final barriers for open source being adopted,

27:45 at least in business, was the AI stuff and the data science stuff.

27:49 People are like, "Oh, we can't use this open source stuff.

27:51 We got to have a SLA and some company we can sue if our code doesn't work right."

27:55 Or, you know, whatever, right?

27:56 Something crazy like that.

27:58 And they're like, "But you understand all the AI and all the data science.

28:02 We have to use this open source stuff." Like, "All right, fine." What's the open source story for you guys?

28:07 Academia has probably championed open source for a really long time just because, I mean,

28:13 open source back, I mean, even when I first started, was just if you read a paper and

28:17 someone has some new fancy analysis, before it became a bigger push by like funding agencies

28:23 to like actually post it to GitHub or some repository.

28:26 I mean, you could just email people and be like, "Hey, I saw your paper.

28:29 I want that script." And they would just send you a MATLAB file.

28:32 And it would be just whatever they had written, but it was in MATLAB and you'd have to kind

28:37 of tear it apart yourself.

28:38 And there was little to no documentation.

28:40 You'd be lucky if there's comments and it's spaghetti code.

28:43 But you figured that out and you kind of work backwards and deconstruct it.

28:47 And eventually you kind of have their code.

28:49 So that kind of ethos of just scientists are really good by and large of just sharing information

28:56 and helping people out.

28:57 And if you have a question, just ask.

28:59 It's kind of always been there.

29:01 At least in our field, it's not as competitive as some other ones where you're just kind

29:05 of like racing to get the next project out.

29:08 It happens, but rarely.

29:10 But now a lot of funding agencies and just in general, people are just excited about

29:15 when you publish a paper, you put a GitHub link in the bottom of the paper and then that

29:19 links to the repository.

29:20 And yeah, maybe it's not been updated in a while, but the code's there and you can just

29:24 take it, grab it.

29:25 For the reproducibility.

29:27 How about using other things?

29:28 Is there SciPy?

29:29 I know for astronomy there's Astropy.

29:32 Is there a NeuroPy?

29:33 It's really still more analysis dependent and pre-process dependent.

29:38 So there's kind of this, it's still the early days where there's probably too many formats

29:43 just because no one can agree on what's the best one.

29:46 So like even a lot of the data formats are written kind of in Python to take whatever

29:50 data you have and reformat it to something shareable.

29:53 There's five or six of them floating around.

29:55 There's probably two that are still duking it out to see which one will be the best.

29:59 And probably five years from now, there's going to be a better one.

30:02 So data formats, certainly there's kind of this, there's a few that are neck and neck.

30:07 Analysis pipelines, a lot of those are still done in house, but there's starting to be

30:10 a lot more toolkits and frameworks and packages.

30:13 There's some really good ones that have more documentation written.

30:17 They're on the PyPI repositories.

30:18 So you can just pip install them and you have them.

30:22 The computational neuroscience people are great at this.

30:24 So all the neural simulation software, that is all really well documented, really well

30:29 written.

30:30 A lot of good example code and tutorials and so on.

30:34 So yeah, we're starting to see kind of this more robust kind of ecosystem where you can

30:40 just kind of pull things.

30:41 It still just kind of varies.

30:43 There's probably still not one go-to place other than the standard data science toolkits.

30:48 Right, right.

30:49 The pandas and so on.

30:51 Yeah.

30:52 So NumPy, Matplotlib, pandas, scikit-learn, if you're doing deep learning, PyTorch or

30:55 TensorFlow, all of those still apply to any data science stack.

31:00 Yeah, of course.

31:01 What's your day-to-day stack look like if you're sitting down to do some analysis?

31:05 I have VS Code and an autocomplete where I just write my imports and it fills in NumPy,

31:11 Matplotlib, pandas.

31:12 Then I usually delete pandas because unless I have a CSV file, I'm not using it.

31:17 So NumPy, Matplotlib, I can probably do 75% of the things I want to do.

31:22 Scikit-learn and SciPy, obviously, if I'm doing any stats with those things, those libraries

31:27 I might go to.

31:28 And then over the last few years, I kind of just have my own, just because you catch yourself

31:33 writing the same functions over and over and over again.

31:36 So I just started building my kind of internal framework of just things I know I need.

31:40 So if I'm working with LFP data, I have all my filters there.

31:44 If I have spike data, I have all my spikes there.

31:47 We do a lot of decoding, so developing deep learning algorithms to decode neural data.

31:52 All of those are kind of listed there.
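
As one example, an LFP band-pass helper of that sort might look roughly like this; the band edges and sampling rate are illustrative values, not the lab's actual settings:

```python
# A reusable zero-phase band-pass filter built on SciPy, the kind of
# internal helper described here for LFP data.
import numpy as np
from scipy.signal import butter, filtfilt


def bandpass(lfp: np.ndarray, low: float, high: float, fs: float,
             order: int = 4) -> np.ndarray:
    """Band-pass filter an LFP trace between low and high Hz."""
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, lfp)          # filtfilt avoids phase distortion


# Example: pull out the 4-12 Hz theta band from a 1 kHz recording.
fs = 1000.0
lfp = np.random.randn(int(10 * fs))     # stand-in for 10 s of real data
theta = bandpass(lfp, 4, 12, fs)
```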

31:54 And then I started realizing, dude, internal tools make the difference between solving

32:00 a problem in 10 minutes or solving it in an hour where I can just sit down and have everything

32:04 automated to come up.

32:06 So yeah, the standard data science stack I use pretty frequently.

32:10 Hardware stack, I mean, so VS Code, I just recently switched to just because everyone

32:15 was talking about it from like Sublime or I usually just edit it in a terminal.

32:20 And I was like, yeah, I'll try it out.

32:21 Everyone's talking about it.

32:22 And it's one of the good things Microsoft has done.

32:24 It's pretty sweet.

32:25 Yeah, that's pretty sweet.

32:26 That's pretty nice.

32:27 And when you do the VS Code stuff, are you just writing straight Python scripts or are

32:31 you doing like VS Code on top of notebooks?

32:34 Yeah.

32:35 You know, it has like that kind of like text view of a notebook type of thing, I think.

32:38 I used to use exclusively Python scripts, so just the .py, started seeing how great

32:43 Jupyter was and so then you start doing everything in Jupyter and then you start to have all

32:47 these like convoluted notebooks and like notebook V1 through 7.

32:51 So then you realize that you've got to like find a balance between like, you know, notebooks

32:55 are great for presentation and for like quickly testing.

32:58 But the sooner you can get it into like a class structure or a package or something.

33:02 The sort of productized version of it, you wanted to get it down into Python code a lot

33:07 of times probably.

33:08 Exactly.

33:09 Like your internal tools you talked about, you're like, all right, this is the library

33:11 you just call stuff, right?

33:13 That just belongs more as a package and not a...

33:15 The sooner you can kind of condense, pull the code out of the notebook and just leave

33:19 notebooks for presentation.

33:21 It's probably the best.

33:22 Yeah.

33:23 It's a lot of pipelines.

33:24 So it's a lot of pre-processing pipelines.

33:26 So you don't want, you know, 50 cells of just moving data.

33:29 Preparing, yeah.

33:30 You talked about having quite a bit of data, some of that being image based.

33:35 Sounds like a lot of work.

33:36 Do you have like a big compute cluster?

33:38 Do you just have like an ultra, an M3 ultra or whatever?

33:42 That's not even out yet, it's M2s.

33:44 But do you just have a big machine or do you guys do cloud stuff?

33:47 What's compute look like?

33:48 It depends on what I'm doing, what I need.

33:50 Most things, let's be honest, I could probably just use my desktop computer, which is really a Supermicro

33:55 server that just runs Linux.

33:58 But then for doing deep learning, you need GPUs.

34:01 And so we have a nice set of GPUs we can pull to.

34:04 Do you do your own training on LLMs and other deep learning things?

34:08 The only LLM training I did was just for fun, but yeah, it's all the deep learning stuff

34:13 we have to train on our particular data.

34:15 So that's all GPUs for that.

34:18 A lot of the statistics we do are like permutations.

34:21 So you need to kind of like parallelize them out into CPUs.

34:24 And then I'll pull like a CPU cluster we have if I need it.

34:27 Does UC Irvine have like a big compute resource sort of thing you can grab?

34:31 They have a campus wide one that you can get on and there's a few independent ones that

34:36 I have access to.

34:37 So GPU is kind of in a different place than CPUs are.

34:41 So I can just kind of pick and choose.

34:42 And I have a few servers in the lab that I've just kind of put together that all my camera

34:47 stuff, all my behavior, I kind of wrote it.

34:49 So it's cloud based.

34:51 I can kind of just pull up my phone and look at the videos of what animals are doing and

34:55 stuff.

34:56 All that runs just a server in the lab.

34:58 So yeah, the compute is there when you need it.

35:01 And I think as I've matured, I've kind of learned when to use what compute, and

35:05 when it's worth taking the extra time to use it versus when you don't need to.

35:10 And also when a lot of the times you don't even need it.

35:12 So I know a lot of people when I see their code and they complain that it takes like

35:16 an hour to run.

35:17 I mean, just using multiprocessing in Python, that in and of itself is enough to not need

35:24 to use a cluster.

35:25 They're just using a single thread for their analysis.
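
A minimal sketch of that idea, with a stand-in function in place of a real analysis:

```python
# The same per-session analysis run serially versus with multiprocessing.
# analyze_session is a placeholder for whatever expensive computation the
# real analysis performs (e.g., a permutation test).
from multiprocessing import Pool


def analyze_session(session_id: int) -> int:
    # placeholder for a CPU-heavy computation
    return sum(i * i for i in range(2_000_000)) % (session_id + 1)


if __name__ == "__main__":
    sessions = list(range(32))

    # Serial: one core does all the work.
    serial_results = [analyze_session(s) for s in sessions]

    # Parallel: spread sessions across CPU cores.
    with Pool() as pool:
        parallel_results = pool.map(analyze_session, sessions)
```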

35:28 For sure.

35:29 Or just a bad programming patterns, design patterns, like you're looping over the thing

35:33 in the pandas data frame instead of doing vector operations in the pandas data frame,

35:38 like that kind of stuff.

35:39 Right?
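
A small illustration of that point, with a made-up DataFrame of spike counts:

```python
# Loop-versus-vectorized: computing firing rates from spike counts and
# trial durations. The DataFrame here is synthetic, for illustration only.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "spike_count": np.random.poisson(20, 100_000),
    "duration_s": np.random.uniform(0.5, 2.0, 100_000),
})

# Slow: looping over rows one at a time.
rates_loop = []
for _, row in df.iterrows():
    rates_loop.append(row["spike_count"] / row["duration_s"])

# Fast: one vectorized operation over whole columns.
df["rate_hz"] = df["spike_count"] / df["duration_s"]
```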

35:40 A hundred percent.

35:41 Yeah.

35:42 I mean, that was one of the first things that I usually teach.

35:44 I have this little example script that shows that like, why is it better to preallocate

35:48 an array rather than to just append to the bottom?

35:50 And it's like these things that we kind of take for granted now, but it's not intuitive

35:54 unless you actually see it.
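
A rough version of that kind of demo script, timing the two approaches:

```python
# Growing an array one element at a time versus filling a preallocated one.
# Timing both makes the difference obvious.
import time

import numpy as np

n = 50_000

start = time.perf_counter()
grown = np.array([])
for i in range(n):
    grown = np.append(grown, i)         # copies the whole array every pass
print("np.append loop:", time.perf_counter() - start, "s")

start = time.perf_counter()
prealloc = np.empty(n)
for i in range(n):
    prealloc[i] = i                     # writes into already-allocated memory
print("preallocated:  ", time.perf_counter() - start, "s")
```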

35:55 No, you learn it the hard way, but it sticks in your mind once you learn it.

36:00 I think that's the issue.

36:01 It's like people just take it for granted.

36:03 Yeah, for sure.

36:04 Like, why didn't you know that?

36:05 How do you, how do you onboard new people?

36:07 If you, you know, get new grad students or people contribute, other contributors.

36:11 It just kind of depends on the lab.

36:14 Every lab kind of has their own structure of just kind of this hierarchy of expertise

36:19 where like I started as an undergrad and I just volunteered in a lab at a different university

36:25 and just volunteered my time, eventually could get paid.

36:28 Just wanted to spend time in the lab and you could all the way up to grad students who

36:32 were there to get a PhD and have more kind of autonomy over their projects.

36:35 A postdoc who has a PhD and five to eight years of experience and so can work pretty

36:42 well.

36:43 Then there's like staff scientists or even a lot of labs now are hiring just pure engineers

36:47 or pure software people because there's such a need for that.

36:50 And so yeah, it really just depends on the lab specific situation and what their focus

36:55 is on and what they need.

36:56 Cool.

36:57 I guess if you have a good NSF grant and you got some extra money, it might be money well

37:01 spent to hire some student who has good programming skills, right?

37:04 Absolutely.

37:05 Yeah.

37:06 You talked about the video stuff that you're doing in the streaming video.

37:09 Do you actually do analysis on that or is it just for you to go back and look at?

37:12 Oh yeah.

37:13 Yeah.

37:14 We do analysis on that.

37:15 There's actually a pretty cool deep learning package out now.

37:17 We didn't write it.

37:18 It's something that another lab did where you just give it the video frame and it can

39:21 automatically segment the animal.

37:24 So like where their paws are or where their like nose is looking or some cases people

37:29 have like you were talking about eye tracking.

37:31 They do eye tracking in like mice now.

37:32 They do eye tracking on mice?

37:34 That was hard on humans in the nineties.

37:36 Yeah.

37:37 There's a lot of VR.

37:38 So they put kind of mice in this like VR system and they can like see where their little mouse

37:43 pupil is looking on like the VR screen.

37:45 Yeah.

37:46 So you can do some different scenarios and they can detect that and they react to it?

37:49 Oh, absolutely.

37:50 Incredible.

37:51 So yeah, I know a lot of stuff, at least in our field, the Nobel prize was awarded for

37:55 you stick some electrodes in the hippocampus part of your brain.

37:58 That's important for learning and memory.

38:00 And then you have the animal kind of run around some environment and then you take some video

38:05 data of kind of where they were running the environment.

38:07 And if you were only looking at the brain data, you can predict to like 90 some percent

38:13 accuracy the location of the animal.

38:15 So you can show this kind of this correspondence that inside the brain is a map of kind of

38:19 the environment.
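
A hedged sketch of that decoding idea, using synthetic spike counts and a standard scikit-learn classifier rather than real recordings:

```python
# Predict the animal's position bin from binned spike counts.
# The data here are synthetic stand-ins with made-up "place tuning".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_timebins, n_neurons, n_positions = 2000, 80, 10

positions = rng.integers(0, n_positions, n_timebins)        # true location bin
tuning = rng.uniform(0.5, 5.0, (n_positions, n_neurons))    # fake place tuning
spikes = rng.poisson(tuning[positions])                     # spike counts

decoder = LogisticRegression(max_iter=1000)
scores = cross_val_score(decoder, spikes, positions, cv=5)
print(f"decoding accuracy: {scores.mean():.2%}")
```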

38:20 Our stuff, we're taking that a little one step further that says this map is not just

38:25 for space, it's for non-spatial and other things too.

38:28 There's this kind of network of information in the brain that the animal can kind of like

38:32 navigate through, even if they're just standing still, but thinking about some sort of problem.

38:37 But we use video data to validate what the animal's doing or check what kind of tasks

38:41 they're doing.

38:42 So yeah, a lot of multimodal heterogeneous data that each needs its own funky pre-processing

38:48 and depending on the task at hand, you're writing something new to ask that question.

38:53 So is that OpenCV?

38:54 Yeah, my stuff is OpenCV.

38:57 Streamed over sockets and some Django webpage.

39:00 It's fun.

39:01 It's cool to build.
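
A very rough sketch of one piece of that kind of pipeline, grabbing frames with OpenCV and pushing JPEG bytes over a socket; the host, port, and framing here are assumptions, and the real setup (multiple boxes, a Django front end) is more involved:

```python
# Grab frames from a camera and send length-prefixed JPEG bytes to a server.
import socket
import struct

import cv2

HOST, PORT = "127.0.0.1", 9999          # assumed address of the central server

cap = cv2.VideoCapture(0)               # first attached camera
with socket.create_connection((HOST, PORT)) as sock:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpeg = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        data = jpeg.tobytes()
        # Length-prefix each frame so the receiver knows where it ends.
        sock.sendall(struct.pack(">I", len(data)) + data)
cap.release()
```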

39:03 That sounds really cool.

39:04 Yeah, absolutely.

39:05 So what kind of questions are you answering with the video?

39:06 Yeah.

39:07 Or is it just to correlate back with the time series of what you're measuring in the brain?

39:11 So like I said, the Nobel Prize was for the spatial map.

39:15 We're doing this non-spatial stuff.

39:17 And so we kind of do both in the lab where we have an animal kind of run around and then

39:21 we have an animal just kind of sit still and do some sort of mental task.

39:25 In our case, they have to memorize a sequence of odors.

39:28 And if the sequence gets shuffled, they make a different choice.

39:31 And so we're basically showing how does the brain work for the spatial part versus the

39:36 non-spatial part?

39:37 What's similar about these two things?

39:39 What's different about these things?

39:40 And we show that one of our recent papers was that the brain uses a lot of the similar

39:45 mechanisms to navigate space as it does to navigate this kind of non-spatial odor task.

39:50 But we also showed that there's this mechanism that the brain uses to take kind of discrete

39:58 memories and link them together into some kind of narrative.

39:58 Best case being like this talk, we've been talking back for 30 some minutes now.

40:03 Inside there's kind of chunks of the conversation.

40:05 So if tomorrow someone was to ask you, what did you and that neuroscience guy talk about

40:10 on your podcast?

40:11 You would kind of rattle off this story of, oh, we talked about history of Python and

40:16 this and this and this, right?

40:18 This, this and this are each kind of discrete memories that in your brain you kind of lock

40:22 together and store them so that you could use them, make decisions about them and so

40:28 on.

40:29 Think about it as a whole, not just the little ideas, every little idea, but like just the

40:32 big concept of it, right?

40:34 Exactly.

40:35 Yeah.

40:36 And it's a fundamental thing that the brain does.

40:37 Like people say humans are storytellers and your life is kind of the sequence of events

40:42 of stories.

40:43 And so you use that every single day and across a bunch of diseases, that's one of the first

40:47 things to actually be impaired, whether it's addiction or schizophrenia or Alzheimer's,

40:53 that kind of ability to link things in time and link them well and make decisions about

40:57 them starts to get impaired.

40:59 That's not great when that happens, but that is what happens, right?

41:01 Absolutely.

41:02 And I guess, what are the software engineering practices, for lack of a better word, that

41:07 you would recommend that maybe other grad students, professors who are feeling like

41:12 they don't have their software game fully together, pay attention to?

41:15 And maybe what should they ignore, right?

41:17 Like should they pay attention to like source control and GitHub?

41:20 Should they have unit tests or, you know, what should they pay attention to or not?

41:23 First off, no one writes tests.

41:24 That is just only the very few, very well put together.

41:28 And usually people who just came from industry write tests.

41:31 Sure.

41:32 That's not an issue.

41:33 But yeah, first off, just learn Python.

41:35 I've said that hundreds of times and I'm preaching to the choir in this audience.

41:39 You are, for sure.

41:41 Learn Python.

41:42 It's honestly, there's not much better you could learn.

41:45 And two, it's, you know, it's quintessential automation stuff.

41:49 So it's really just think about the things that you're doing, the spreadsheets or, you

41:53 know, the simple things and really just ask yourself, if you find yourself doing any repetitive

41:57 tasks, that's a software problem.

41:59 Those are the things to kind of look at first.

42:01 So you have your text editor in one window, Google in the other and stack overflow your

42:06 way to learning.

42:07 And so the way people, I think, really do kind of teach themselves Python is probably

42:10 the best way to learn.

42:12 But nevertheless, I think there is a real need for, and again, we're starting to see

42:17 more of it, just formal education, even if it's just a course.

42:20 Our program is really great that they started to teach a Python course just because the

42:25 students requested it because they knew how important it was.

42:28 We have Python for neuroscience.

42:30 Exactly.

42:31 Okay.

42:32 Yeah.

42:33 I mean, you just work your way through if statements for loops, data types.

42:37 You probably also, I'm just guessing you get a chance to work with some of these external

42:40 libraries that are relevant to studies they're doing, right?

42:44 Rather than here's an example of stock market data.

42:46 You're like, great, not a trader.

42:49 I don't want to be a programmer.

42:50 Why am I here?

42:51 You know?

42:52 Yeah, it's really relevant.

42:53 And I think it's just kind of seeing like, oh yeah, I would do that this way, but this

42:56 is so much easier if I use Python.

42:58 And I can use it in Python, and just seeing that, oh, it's not as bad. The mountain

43:03 looks a lot higher when you're at the base than at the summit.

43:05 So it does.

43:06 Seeing it done once is usually enough to kind of tell people it's not as bad as you originally

43:11 think.

43:12 Yeah.

43:13 I feel like maybe I saw it this way.

43:14 I certainly know a lot of other people see it this way.

43:16 I mean, when I was younger, but a lot of people see it this way as well, is that like, you

43:19 got to be crazy smart to do programming.

43:22 It's really challenging.

43:24 It's kind of one of those things that only few people can do.

43:27 And then you get into it and you're like, oh, it's, it's not a few really huge steps

43:32 and things you've got to solve.

43:33 It's like a thousand small steps and each one of the little small steps, you're like,

43:37 that was actually easy.

43:38 That's no big deal.

43:39 What's the next step?

43:40 And you get to the end, you're like, where was the big step?

43:42 Where was it really hard?

43:43 Right?

43:44 Yeah, absolutely.

43:45 Yeah.

43:46 Do you have some experience where people in the department or people that worked with

43:49 you are like, ah, I'm not a programmer.

43:51 I don't want to do this stuff.

43:52 Then they kind of got into it and really found out that programming was something they really

43:56 liked.

43:57 Are there any converts out there?

43:58 Yeah, I would say so.

43:59 I mean, I think there's kind of two kinds of people.

44:01 There's people who program just because what is it?

44:04 The programming is an art book or whatever.

44:06 Like they love it just for the sake of loving it.

44:09 And I'm probably closer to those kinds of people, right?

44:11 Like, I just think it's the coolest thing, like academic, but then there's the people

44:15 who just kind of see it as like, it's a tool like anything else.

44:18 And so like you could be an expert in a drill or you could just know to pick up a drill.

44:23 That's kind of the majority of people is that it's just another tool in their toolkit to,

44:28 especially for a scientist, just to answer the question that you're trying to answer.

44:32 And I would even flip the reverse where there's been some times where I've maybe even used

44:37 Python too much in the sense that like I made a problem more because it's like the automation

44:43 dilemma, right?

44:44 It's like, do I spend an hour automating this when I could do it in 10 minutes?

44:47 You got to do it a lot of times and all of a sudden the hour is worth it.

44:49 But if it turns out you don't.

44:50 It's like, I might need this a year from now, so I might as well just write the script.

44:54 Whereas I could just do it in Excel.

44:56 That's pretty standard, standard problems we all run into in programming.

44:59 It's like, I'll write some code to do that or I could just be done with it for sure.

45:04 All right.

45:05 I think we've got time for a couple more topics.

45:07 One thing that I think might be fun to talk about is publishing papers, right?

45:13 Obviously if you're in academics, especially in labs, you got to publish papers.

45:16 Do you use notebooks and stuff like that for your papers or is that just kind of separate?

45:22 Yeah, like I said, notebooks are great for presentation.

45:25 So yeah, I use notebooks kind of for development just because you can quickly run code and

45:31 go back up and move things around.

45:32 And so I kind of like that ability to kind of just a stream of consciousness, write code

45:37 until you kind of like see how it's kind of working, the prototype, and then refactor

45:40 out into an actual .py document or a package or something.

45:45 So that's kind of been my workflow and it works pretty well.

45:47 But then when you actually have the code and it works and it's robust, you actually want

45:51 to put, there's a lot of figures, that's usually the main thing.

45:53 So you kind of put all of that.

45:55 Here's the data, here's the pre-processing, here's the figure one, here's figure two,

45:59 figure three in the notebook just so it's reproducible and other people can download

46:04 it and rerun your code and that sort of thing.

46:06 So I think that's slowly becoming kind of the standard approach for those labs that

46:11 use Python and they share their code openly.

46:14 That's kind of how they do it.

46:16 Anything like executable books or any of those things that kind of produce printable output

46:21 out of the notebooks, like publishable output out of the notebooks?

46:24 Not a lot, but there's one of the journals, it's called eLife.

46:29 It's kind of, it's trying to like push the boundaries of what it means to publish a scientific

46:33 paper.

46:34 And so they kind of have, because most papers are really just on the web nowadays, the journals

46:39 aren't really physical journals as much anymore.

46:42 They kind of have like papers as executable code where you can like plot the figure in

46:47 the browser and kind of run through the notebook and just as an experiment.

46:51 But it's pretty cool to kind of see these like new alternative ways to still convey

46:56 the same findings, but you can play with it, you can kind of see how some of the methods

47:01 are kind of implicit in the...

47:02 What's the name of the journal?

47:03 That's eLife.

47:04 eLife.

47:05 Okay, cool.

47:06 Yeah, you probably, I suppose, need some way to capture the data output because you might

47:11 not have access to the compute to recompute it.

47:15 Somehow it's got to sort of be a static version, but that sounds really cool.

47:18 Yeah.

47:19 And especially for like most recently, some of our like trained models, it's becoming

47:23 more important to just share the weights and share those sorts of things too.

47:28 You can't just share the code to train the thing if people don't have the compute to

47:31 actually train them themselves.

47:34 It's kind of growing to not just sharing your data, not just sharing your code, but you

47:37 need to share like the key derivatives of the pre-processing and those sorts of things.

47:42 Or even just sharing the version numbers, because there's been psychology or fMRI literature

47:47 where there was a bug in some version that made a lot of the results null.

47:51 And so one person could use version 3.7 of a package that had a bug, but people

47:56 don't know that.

47:57 So they claim it's not reproducible, but it's really just not the same algorithms.

48:00 Yeah, yeah, yeah.

48:02 Or like across languages, like if you rerun the same analysis in MATLAB versus Python

48:06 versus R, especially complex ones, there's a lot of little design decisions under the

48:11 hood that might tweak exactly how that regression fits, or exactly how, if you're statistically

48:17 sampling, the sampling works under the hood, or those sorts of things.
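
One low-effort guard against exactly that problem is to write the interpreter and package versions out next to the results. A minimal sketch; the package list and output file name here are just examples:

# record_env.py -- minimal sketch: snapshot the environment next to an analysis
import importlib.metadata as md
import json
import platform
import sys

# Whatever the analysis actually imports (example list)
packages = ["numpy", "scipy", "pandas", "scikit-learn"]

env = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {name: md.version(name) for name in packages},
}

# Ship this file alongside the figures and data
with open("environment.json", "w") as f:
    json.dump(env, f, indent=2)

A pip freeze > requirements.txt or a committed lock file does the same job; the point is that "not reproducible" and "not the same versions" become distinguishable claims.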

48:20 Awesome.

48:21 Are you familiar with the Journal of Open Source Software?

48:23 Yeah.

48:24 I had the folks on from there.

48:25 Yeah, I had them on quite a while ago, and I think they're trying to solve the interesting

48:29 problem that if you take the time to create a really nice package for your area, you might

48:35 not have taken that time writing papers.

48:37 And so you wouldn't get credit, because you don't have as many papers published.

48:40 So they let you publish like your open source work there, which I think is pretty cool.

48:44 What do you think about that?

48:45 We kind of have that same problem.

48:47 One of the, I run a nonprofit called Continual AI.

48:51 It does artificial intelligence and outreach and research, and we have conferences and

48:56 all sorts of events.

48:57 But one of the main things we've done is we built a deep learning library on top of PyTorch

49:02 called Avalanche.

49:03 And so we had a really great community of mostly volunteers who just saw the need in

49:08 the field and put it together.

49:10 But then again, it's like a lot of us are academics.

49:12 How do you present this?

49:13 And so you write wrapper papers around kind of the framework.

49:18 So that's kind of been the de facto way of like, it's not really a paper, but you still

49:23 need to share it and get credit for it and put your name on it.

49:28 It's certainly an issue.

49:29 I'm starting to see it not even just with software, but even with hardware, because

49:32 hardware is becoming more open source in our field.

49:34 And so you just kind of write like a paper about the hardware solution to some problem.

49:39 That's cool.

49:40 It's better than a patent.

49:41 Yeah, it's definitely better than a patent.

49:42 Patents, while they serve a purpose, are pretty evil.

49:45 Let's wrap things up with maybe just, you mentioned Continual AI.

49:50 Tell people a bit about that.

49:51 It's the largest nonprofit for continual learning.

49:54 Continual learning in a nutshell is, say I have a neural network and I train my neural

49:59 network to classify cats.

50:01 I classify cats to 90% accuracy and we're like, yeah, this is why neural networks are

50:05 great.

50:06 I take that same neural network, trained on cats, and I train it on, say, dogs.

50:10 It does really well, 90% accuracy on dogs.

50:12 We're really excited why neural networks are so great.

50:14 But the issue is if I take that, again, the same network, I just trained it on dogs and

50:18 previously trained it on cats.

50:20 I try and test it on cats again, it's going to forget pretty much everything it learned

50:25 about cats.

50:26 And this is an old, old problem from back in, you know, the good old days when neural networks

50:30 were connectionist models, and it wasn't the computer scientists, it was the cognitive scientists.

50:35 They noticed-

50:36 Overtraining or something like that, right?

50:37 Kind of overfitting.

50:38 It's neural related.

50:39 Yeah, it's very similar.

50:40 Catastrophic forgetting, they call it the sequential learning problem, which is why

50:44 I'm really interested in it because I'm really interested in continual learning and sequential

50:48 memory or in neuroscience, it's called the stability plasticity problem.

50:51 So when do you learn?

50:52 When do you remember?

50:53 And so over the last, since we started the organization in 2018, the field has kind of

51:00 exploded just because there's such a need for overcoming this across a lot of use cases.

51:06 So like a lot of times you could only see the data once.

51:09 So the way you solve the problem generally is you just shuffle in cats and dogs into

51:13 the same data set and retrain your model.

51:15 But now the neural networks are getting bigger and bigger and bigger.

51:19 Retraining is getting costlier and costlier.

51:20 You can't just have, can't train a network on petabytes every time you want to update

51:24 it.

51:25 That's even if you have access to the data and the storage to save it and so on and so

51:29 forth.
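
To make the cats-and-dogs picture concrete, here's a toy sketch in plain PyTorch (deliberately not the Avalanche API). The two synthetic tasks are constructed to conflict, so the numbers are illustrative rather than a real benchmark.

# Toy demonstration of catastrophic forgetting and the brute-force
# "reshuffle everything and retrain" fix. Synthetic tasks, illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)


def make_task(x_center, flip):
    """400 2-D points clustered around x = x_center; the label is the sign of y,
    flipped for the second task so the two tasks genuinely conflict."""
    x = torch.randn(400, 2)
    x[:, 0] += x_center
    y = (x[:, 1] > 0).long()
    return x, (1 - y) if flip else y


def train(model, x, y, epochs=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()


def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()


def fresh_model():
    return nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))


xa, ya = make_task(+4.0, flip=False)   # "cats"
xb, yb = make_task(-4.0, flip=True)    # "dogs"

model = fresh_model()
train(model, xa, ya)
print("after task A, accuracy on A:", accuracy(model, xa, ya))   # high

train(model, xb, yb)                                             # keep training, task B only
print("after task B, accuracy on A:", accuracy(model, xa, ya))   # typically collapses

# The brute-force fix described above: shuffle both tasks together and retrain.
model = fresh_model()
train(model, torch.cat([xa, xb]), torch.cat([ya, yb]), epochs=800)
print("joint training, accuracy on A:", accuracy(model, xa, ya))  # should be high again

Avalanche packages up benchmarks and strategies like replay and regularization so you don't hand-roll loops like this, but the failure mode it exists to address is the one this toy shows.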

51:30 So we need clever ways to solve the problem, and that's kind of what we're organized around.

51:33 We have neuroscientists, we have computer scientists, AI researchers across academia,

51:38 industry, all that are just a bunch of people really interested in this problem to just

51:42 come together and share papers, share ideas.

51:45 We just had a conference, we sponsor a lot of competitions for people to kind of put

51:50 forward an idea to some problem that we kind of put out every year.

51:54 So it's been really, really exciting to kind of see the community grow over the years and

51:58 all the tools and fun things that's kind of come out.

52:00 Well, it's definitely a hot topic right now.

52:02 Absolutely.

52:03 The cognitive scientists and neuroscientists studied neural networks and stuff, and it was

52:07 kind of like, well, maybe this models stuff.

52:09 And now we're in the time of LLMs and the world has gone crazy.

52:13 Absolutely.

52:14 I said we were going to close out with Continual AI, but let me just ask, what are your thoughts

52:18 on LLMs, where this stuff's going?

52:20 I mean, we all have exposure to it in different ways, but you've got this understanding of

52:26 what they're trying to model it on quite a bit.

52:28 So what do you guys in your space think of it?

52:31 So the first lab I joined was, well, the first real lab I joined was a neuroscience lab where

52:36 we were sticking wires in brains and actually doing real neuroscience.

52:40 But I also started kind of simultaneously working with a cognitive scientist where we

52:45 were working on the original word2vec.

52:48 So this is in my mind, like the grandfather of the LLM.

52:52 So this is the model that like overnight took Google Translate from being meh to pretty

52:56 good.

52:57 And it's really, at the heart of it, just an autoencoder.

53:00 But we were really excited then doing kind of semantic modeling of how much further kind

53:06 of deep learning could take language modeling.

53:09 And then we were actually using it to study catastrophic forgetting.

53:11 So does word2vec catastrophically forget?

53:14 The punchline is it does.

53:15 And that kind of got me jumping in, really excited about continual learning and so on.
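
The shape of that experiment, sketched with gensim's Word2Vec; the toy corpora and parameters here are made up for illustration, and the real study used far more text:

# Rough sketch of the "does word2vec forget?" test: train on one corpus, keep
# training on a very different one, and watch the original neighborhoods drift.
from gensim.models import Word2Vec

corpus_a = [["the", "cat", "sat", "on", "the", "mat"],
            ["cats", "purr", "and", "chase", "mice"]] * 200
corpus_b = [["stocks", "fell", "as", "markets", "reacted"],
            ["the", "bank", "raised", "interest", "rates"]] * 200

model = Word2Vec(sentences=corpus_a, vector_size=50, window=3, min_count=1, epochs=20)
before = model.wv.most_similar("cat", topn=3)

# Continue training the same model on the new domain, with no rehearsal of corpus_a.
model.build_vocab(corpus_b, update=True)
model.train(corpus_b, total_examples=len(corpus_b), epochs=20)
after = model.wv.most_similar("cat", topn=3)

print("before:", before)
print("after: ", after)   # the "cat" neighborhood shifts even with no new cat data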

53:20 So I saw kind of that trajectory then, and then kind of stepped out of it for a few years

53:25 and dug more deep into pure neuroscience and artificial intelligence from other angles.

53:32 But I'd always been just so fascinated by this idea of like, say you take an AI and

53:37 it could read every book or it could read the Internet.

53:39 Like what would you be able to get?

53:42 And that was kind of, you know, in my mind, like, well, seeing word2vec try to do

53:45 the same thing, and training it on my laptop, like, well, you know, it's going to take,

53:49 it's going to take a minute till we get there.

53:51 And I underestimated that drastically.

53:55 It was like almost no progress, almost no progress.

53:57 Wow.

53:58 What just happened?

53:59 Right.

54:00 It's a testament to statistical learning more than anything, which is just how much information

54:05 can you just soak up and put together in a fancy new way and regurgitate back.

54:10 I think the next big leap is going to be adding more cognition to that progress.

54:15 So adding, when an agent has a goal, when an agent kind of has to break down a series

54:20 of steps to get to that goal, those kinds of things we don't see as much of.

54:26 And that will kind of be the next big push.

54:28 I think that'll kind of take all the cool things that LLMs can do now and kind of blow

54:33 everyone away of just, if you can take the internet and put it on a seven gigabyte file,

54:38 what can that get you?

54:39 But what if you can take the internet, put it on a seven gigabyte file, but actually

54:42 have some sort of logic and direction and the agent itself can actually navigate through

54:47 its own thoughts.

54:49 That's going to take us, I think, right to the borderline of.

54:52 Terminator?

54:53 Yeah.

54:54 Not Terminator.

54:55 It won't be Terminator.

54:56 No, it won't be Terminator, but it'll be a really an intelligence.

54:59 It's going to be super interesting.

55:00 You know, you've got all this prompt engineering and these clever ways to kind of get the current

55:06 LLMs in the right mindset, which is probably a personification, but you can tell it things

55:12 like here, I want you to tell me how to do this.

55:14 And it'll come up with some answer.

55:15 You can say, I want you to think step by step.

55:18 And then all of a sudden you get a real different type of answer where it pulls out the pieces

55:22 and it thinks about how it does it.

55:23 And it's going to be interesting.

55:24 You know, it's kind of like, it won't need that kind of stuff, because it just already knows how to think.

55:29 You don't have to give it little weird clues.

55:31 Like you're an expert in this and you're really good at it.

55:33 Now I want to ask you a question about... Oh, I'm good at it?

55:35 Okay.

55:36 I'll answer better.
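
In code, those tricks are mostly just about what goes into the prompt. A sketch, where ask is a stand-in for whatever model client you actually use, not a particular vendor's API:

# A minimal sketch of the prompt-engineering tricks being described.
# `ask` is a placeholder; the structure of the messages is the point here.
def ask(messages):
    # Placeholder: swap in a real API call (OpenAI, Anthropic, a local model, ...).
    for m in messages:
        print(f"[{m['role']}] {m['content']}")
    print("---")

question = "How should I split a large CSV of spike recordings across machines?"

# Plain prompt: you just get whatever the model reaches for first.
ask([{"role": "user", "content": question}])

# "Think step by step": the same question, nudging the model to lay out
# intermediate reasoning before answering.
ask([{"role": "user", "content": question + "\n\nThink step by step before answering."}])

# Persona framing: "you're an expert at this" -- the slightly odd clue mentioned above.
ask([
    {"role": "system", "content": "You are an expert data engineer for neuroscience labs."},
    {"role": "user", "content": question},
])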

55:37 My favorite definition of AI is it's whatever computers can't do yet.

55:40 Yeah.

55:41 Because like, you know, 30 years ago, if we had this conversation, it'd be like, so what

55:44 do you think of Deep Blue?

55:45 Do you think Deep Blue, the AI that plays chess, can, you know, think, and is

55:51 it going to take all our jobs? Or it's going to be, what can Watson do?

55:54 Can it think and take all our jobs?

55:56 And you know, that was solved with tree search, which, you know, undergraduates in their second

56:01 year CS class are learning.

56:03 It's just standard search.

56:05 People don't even think of search as AI.

56:06 What if we just load every possible outcome of the chess game and the steps,

56:11 and we just try to traverse each step and see where it takes us.

56:14 Right.
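
What "solved with tree search" means, in a generic sketch: this is textbook minimax, not Deep Blue's actual implementation, which layered very deep search, specialized hardware, pruning, and a hand-tuned evaluation function on top of the same idea.

# Textbook minimax: enumerate moves, recurse down each branch, pick the best outcome.
def minimax(state, depth, maximizing, moves, apply_move, score):
    """Generic game-tree search; `moves`, `apply_move`, and `score` describe the game."""
    options = moves(state)
    if depth == 0 or not options:
        return score(state), None

    best_value = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in options:
        value, _ = minimax(apply_move(state, move), depth - 1,
                           not maximizing, moves, apply_move, score)
        if (maximizing and value > best_value) or (not maximizing and value < best_value):
            best_value, best_move = value, move
    return best_value, best_move


# Toy usage: states are numbers, each move adds 1 or 2, the maximizer wants the
# final number high and the minimizer wants it low.
value, move = minimax(
    0, depth=4, maximizing=True,
    moves=lambda s: [1, 2],
    apply_move=lambda s, m: s + m,
    score=lambda s: s,
)
print(value, move)  # 6 2: add 2, opponent adds 1, add 2, opponent adds 1

Plug in a real move generator and a board evaluation and you have the skeleton of a chess engine; the "intelligence" is mostly how fast and how deep you can search.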

56:15 There's always going to be room to grow, but, you know, from a cognitive science perspective,

56:18 I think there's something far more interesting. I mean, I think it's cool to see what computers

56:22 can actually do.

56:23 And I think they can do a hell of a lot more.

56:25 But I think it's more interesting to kind of ask the more philosophical question of

56:28 what does that actually mean for us?

56:30 Because every step of the way, when we develop something new in artificial intelligence,

56:34 it tells us something a lot deeper about our own intelligence too. Back in the fifties

56:39 and sixties, they thought that chess meant intelligence.

56:42 And so now we're kind of seeing with LLMs, if someone just reads a bunch of books and

56:46 can memorize a bunch of books, does that mean they're intelligent?

56:49 Because that's effectively what a, what an LLM can do.

56:51 It comes across as intelligent, right?

56:54 It comes across that way to people like, oh, you have all the answers, but it doesn't mean

56:57 you're good at problem solving necessarily.

56:59 We're kind of peeling at this onion and we're kind of segmenting intelligence into its different

57:04 categories to really kind of break it apart.

57:07 Just from this vague word of intelligence into actually what are the parts that make

57:12 something intelligent?

57:13 What does it really mean?

57:14 And what are still the things that we thought were hard and are really easy, or

57:18 the things that we thought were easy and are really hard?

57:20 Because any LLM you have now can't chew gum and walk to the store and buy you something

57:26 to eat and then play you in chess.

57:28 And so the "general" in general AI isn't just there to make it sound grander.

57:34 It's there because that's actually something that makes humans very, very unique is that

57:38 I can have this conversation with you, write code, prepare an omelet, walk a dog and do

57:43 all of these things all at once.

57:45 And I started as a baby.

57:47 And have feelings and thoughts and introspection about it and all that.

57:50 - Hopes, dreams, liking ketchup and not tomatoes, and all sorts of other things.

57:54 - Yeah, super interesting.

57:55 So are you, are you pretty positive on where stuff's going?

57:58 - I'm less concerned about the AI itself and I'm more concerned about just how we react

58:02 to it.

58:03 And if we react intelligently, if we react well.

58:06 We went through the industrial revolution.

58:08 We've gone through the steel age.

58:10 We're starting to go through the cognitive revolution, or whatever people are going to

58:14 call this a hundred years from now.

58:15 I think by and large, people are going to be okay.

58:18 I think we just need to have good policies, make sure that people do it responsibly, people

58:22 do it well.

58:23 And that's going to be the hard part is just how do we manage the transition?

58:26 Well, how do we take a whole labor force whose jobs will be gone,

58:31 like there's no economic incentive to keep their jobs, and retrain them in a really smart

58:36 and sensible way,

58:38 so people's livelihoods don't just go away to maximize the dollar?

58:42 Those are kind of problems that are going to need good policy and clever solutions and

58:47 a lot of research to actually sit down and handle well.

58:50 But AI isn't going to go anywhere, just like computers haven't gone anywhere. But it's

58:55 not Terminator that we should worry about.

58:57 - No, absolutely.

58:58 It's certainly not.

58:59 I was joking.

59:00 We should worry about disinformation and politics and news and all that stuff.

59:05 - I think people get so hung up on all the negatives that we kind of forget, like we're

59:08 developing these tools for a reason.

59:11 Scientists are spending so much time and we're excited about them for a reason.

59:13 One being that I used to work on proteins, trying to find the structure of proteins.

59:18 And that was one protein a year.

59:20 It would take one year to find the structure of a protein.

59:22 And now I can run this.

59:23 I took the same sequence that we found and ran it through AlphaFold, which is DeepMind's

59:27 protein folding software.

59:29 It could do it in five minutes.

59:31 I mean, those are the things that are really like catalyze science and technology and healthcare

59:36 and actually solve the problems we want to solve.

59:39 So that's what I think we should do.

59:40 - I think I heard even some AI stuff coming up with some new battery chemistry, potentially.

59:44 - Everything.

59:45 How do you appropriately kind of harness, like, a fusion ring in a tokamak reactor or

59:52 something like that.

59:54 Everything, even down to, how do you route a PCB with, like, millions of traces effectively?

01:00:01 So it's a tool for the mind and it's not going to put our minds out of work.

01:00:05 Like you said a long time ago when you were in your eye track.

01:00:07 - Yeah, exactly.

01:00:08 It's like a different view of that thing.

01:00:09 It's like the same reason that none of us want to sit around and churn butter and go

01:00:13 till the fields and walk to work.

01:00:17 - I don't dislike my clothes washer.

01:00:20 Not at all.

01:00:21 Yeah.

01:00:22 Awesome.

01:00:23 All right.

01:00:24 Well, thanks for giving us a look inside your lab, inside your research and the field, and

01:00:27 then just, you know, sharing what you've gotten up to.

01:00:29 It's been great.

01:00:30 - Yeah, yeah, yeah.

01:00:31 Thanks for having me.

01:00:32 Yeah.

01:00:33 It's been really great, at least for the years that I've been in research, to see how

01:00:37 pivotal Python has been.

01:00:38 And I think that's one of the big things that probably goes unnoticed by a lot of developers

01:00:43 when they're actually writing their code to solve whatever problem they're solving is

01:00:46 that someone who wrote the NumPy library or the scikit-learn library, or, you know,

01:00:52 Jupyter notebooks has actively played a part in curing diseases and making people's lives

01:00:56 better.

01:00:57 And those stories usually don't kind of come to the forefront.

01:01:00 - It's not a direct line, right?

01:01:02 But the research was done and facilitated by these open source tools and then discoveries

01:01:06 were made.

01:01:07 And that's the difference between people getting cured in 10 years or getting cured in a year.

01:01:11 And that's thousands of lives.

01:01:13 And so you might not think about it when you're sitting behind your desk, you know, making

01:01:17 something usable.

01:01:18 But for every day a PhD student doesn't have to pull their hair out and debug

01:01:22 some software, because it's well-written and it works well,

01:01:25 that's a day that they can find something new.

01:01:27 - That's awesome.

01:01:28 Yeah.

01:01:29 Very inspiring.

01:01:30 Let's leave it there, Keelen.

01:01:31 Thank you for being on the show.

01:01:32 - Absolutely.

01:01:33 - Yeah.

01:01:34 It's been great to talk to you.

01:01:35 - Thank you so much.

01:01:36 - Yeah.

01:01:37 - And thanks for joining Talk Python To Me.

01:01:39 Thank you to our sponsors.

01:01:40 Be sure to check out what they're offering.

01:01:41 It really helps support the show.

01:01:44 It's time to stop asking relational databases to do more than they were made for and simplify

01:01:49 complex data models with graphs.

01:01:52 Check out the sample FastAPI project and see what Neo4j, a native graph database can

01:01:57 do for you.

01:01:58 Find out more at talkpython.fm/neo4j.

01:02:03 This episode is sponsored by Posit Connect from the makers of Shiny.

01:02:07 Publish, share, and deploy all of your data projects that you're creating using Python.

01:02:12 Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs.

01:02:19 Posit Connect supports all of them.

01:02:20 Try Posit Connect for free by going to talkpython.fm/posit.

01:02:23 P-O-S-I-T.

01:02:24 Want to level up your Python?

01:02:28 We have one of the largest catalogs of Python video courses over at Talk Python.

01:02:32 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:02:37 And best of all, there's not a subscription in sight.

01:02:40 Check it out for yourself at training.talkpython.fm.

01:02:43 Be sure to subscribe to the show, open your favorite podcast app, and search for Python.

01:02:48 We should be right at the top.

01:02:49 You can also find the iTunes feed at /iTunes, the Google Play feed at /play, and the Direct

01:02:55 RSS feed at /rss on talkpython.fm.

01:02:59 We're live streaming most of our recordings these days.

01:03:01 If you want to be part of the show and have your comments featured on the air, be sure

01:03:05 to subscribe to our YouTube channel at talkpython.fm/youtube.

01:03:10 This is your host, Michael Kennedy.

01:03:11 Thanks so much for listening.

01:03:12 I really appreciate it.

01:03:14 Now get out there and write some Python code.

01:03:16 [MUSIC PLAYING]

01:03:19 [MUSIC ENDS]

