WEBVTT

00:00:00.001 --> 00:00:03.320
The brain is truly one of the final frontiers of human exploration.

00:00:03.320 --> 00:00:08.420
Understanding how the brain works has vast consequences for human health and for computation.

00:00:08.420 --> 00:00:13.280
Imagine how computers might change if we actually understood thinking and even consciousness.

00:00:13.280 --> 00:00:16.420
On this episode, you'll meet Justin Kiggins and Corinne Teeter,

00:00:16.420 --> 00:00:21.620
who are research scientists using Python for their daily work at the Paul Allen Brain Institute.

00:00:21.620 --> 00:00:24.940
They're joined by Nicholas Cain, who's a software developer there,

00:00:24.940 --> 00:00:28.040
supporting scientists using Python as well.

00:00:28.760 --> 00:00:31.680
Now, even if you aren't interested in brain science directly,

00:00:31.680 --> 00:00:34.260
I really encourage you to listen to this entire interview.

00:00:34.260 --> 00:00:36.040
It's super fascinating.

00:00:36.040 --> 00:00:41.560
This is Talk Python To Me, episode 164, recorded May 4th, 2018.

00:00:54.940 --> 00:01:01.880
Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:01:01.880 --> 00:01:04.000
This is your host, Michael Kennedy.

00:01:04.000 --> 00:01:06.000
Follow me on Twitter, where I'm @mkennedy.

00:01:06.000 --> 00:01:09.880
Keep up with the show and listen to past episodes at talkpython.fm.

00:01:09.980 --> 00:01:12.480
And follow the show on Twitter via @talkpython.

00:01:12.480 --> 00:01:17.020
This episode is brought to you by Cox Automotive and Rollbar.

00:01:17.020 --> 00:01:17.860
That's right.

00:01:17.860 --> 00:01:20.340
Cox Automotive has joined the show as a sponsor.

00:01:20.340 --> 00:01:21.880
They're looking for new developers.

00:01:21.880 --> 00:01:25.540
So check out what they're offering during their segment or the link in the show notes.

00:01:27.020 --> 00:01:29.040
Justin, Corinne, Nick, welcome to the show.

00:01:29.040 --> 00:01:30.320
Yeah, thank you for having us.

00:01:30.320 --> 00:01:31.240
Yeah, hello, hello.

00:01:31.240 --> 00:01:34.040
It's super exciting to have you here on the podcast.

00:01:34.040 --> 00:01:42.340
I'm very, very interested in learning about how you're applying Python and data science type things to brain science.

00:01:42.340 --> 00:01:43.860
It's going to be really, really fun.

00:01:43.860 --> 00:01:44.360
Great.

00:01:44.620 --> 00:01:45.200
Yeah, for sure.

00:01:45.200 --> 00:01:48.140
But before we get into all the details, let's start with your story.

00:01:48.140 --> 00:01:49.840
I guess, Justin, go first.

00:01:49.840 --> 00:01:51.320
How did you get into programming in Python?

00:01:51.320 --> 00:01:58.820
Yeah, so I think I started programming mostly in kind of college, working in research labs, you know, part of engineering classes.

00:01:58.820 --> 00:02:02.380
And that was largely kind of MATLAB and LabVIEW.

00:02:02.380 --> 00:02:08.340
MATLAB is kind of the dominant language in most neuroscience research environments.

00:02:08.340 --> 00:02:09.460
And what was your degree?

00:02:09.460 --> 00:02:10.540
What were you studying at the time?

00:02:10.540 --> 00:02:13.460
I was studying bioengineering, biomedical engineering.

00:02:13.640 --> 00:02:16.440
And then I went and started a PhD in neuroscience.

00:02:16.440 --> 00:02:27.740
And it was during my PhD that I decided that there was this old C code, like raw C, that my advisor had written for some of our experiments.

00:02:27.740 --> 00:02:35.040
And I was chasing pointers and trying to figure out how to do memory buffers with audio.

00:02:35.040 --> 00:02:36.440
And I was like, this is brutal.

00:02:36.440 --> 00:02:37.580
I don't want to do this.

00:02:37.580 --> 00:02:42.760
And I basically cold turkey switched everything that I was doing over to Python.

00:02:42.760 --> 00:02:58.640
So rewrote a bunch of that code, taught myself Python by kind of rewriting that, implementing it in Python, starting to use some of the scientific Python stuff for my analysis, building out a Django database to maintain my, to keep track of my research that I was working on.

00:02:58.980 --> 00:03:01.900
It was kind of a cold turkey switch for me about 2012.

00:03:01.900 --> 00:03:04.000
While I was working on my PhD.

00:03:04.000 --> 00:03:05.060
And it was a good switch.

00:03:05.060 --> 00:03:05.640
You're happy with it?

00:03:05.640 --> 00:03:06.100
Yeah.

00:03:06.100 --> 00:03:06.480
Yeah.

00:03:06.480 --> 00:03:08.680
I mean, I think that it's done me well.

00:03:08.680 --> 00:03:12.280
And the rest of the field, I think, is starting to catch up.

00:03:12.320 --> 00:03:15.000
And it's only become more powerful since then.

00:03:15.000 --> 00:03:15.640
Yeah.

00:03:15.640 --> 00:03:19.280
If you look at the popularity of Python, it's been going upward.

00:03:19.280 --> 00:03:25.780
But there was a major inflection point where the rate of popularity growth increased around 2012.

00:03:25.780 --> 00:03:31.300
And I think a lot of that is due to the data science tool improvements, that whole space.

00:03:31.300 --> 00:03:32.140
Yeah.

00:03:32.140 --> 00:03:32.580
Absolutely.

00:03:32.760 --> 00:03:36.840
And I think that I really just caught the edge of that wave.

00:03:36.840 --> 00:03:37.460
Yeah.

00:03:37.460 --> 00:03:38.860
You're part of that wave, for sure.

00:03:38.860 --> 00:03:39.340
For sure.

00:03:39.340 --> 00:03:40.240
Nice.

00:03:40.240 --> 00:03:41.380
And Corinne, how about yourself?

00:03:41.380 --> 00:03:41.720
Yeah.

00:03:41.720 --> 00:03:45.860
So I started coding after my undergraduate degree.

00:03:45.860 --> 00:03:49.320
I had an undergraduate degree in physics and psychology.

00:03:49.320 --> 00:03:53.520
And afterwards, went to Los Alamos National Lab.

00:03:53.520 --> 00:03:59.860
And so my first coding language there, I was doing very physics-y, dominated stuff.

00:03:59.860 --> 00:04:01.180
So it was Fortran, actually.

00:04:02.260 --> 00:04:06.180
And then after that, I went back to grad school in computational neuroscience.

00:04:06.180 --> 00:04:11.400
And there, the main coding language we used was MATLAB, as Justin mentioned.

00:04:11.400 --> 00:04:18.960
And then after that, I had a couple positions at, for example, Qualcomm and Sandia National Labs.

00:04:18.960 --> 00:04:22.100
And there, I was still using mostly MATLAB.

00:04:22.100 --> 00:04:24.280
So we'd have to buy licenses.

00:04:24.280 --> 00:04:26.220
And then I came to the Allen Institute.

00:04:27.120 --> 00:04:33.220
And here, Nick and I were both here during the very beginning of kind of our latest 10-year plan.

00:04:33.220 --> 00:04:41.360
And we wanted to make sure that everything we use, like one of the goals of the Allen Institute is to be able to make standardized data that the community can use.

00:04:41.360 --> 00:04:45.100
And part of that is wanting it to be open source.

00:04:45.640 --> 00:04:48.340
So a lot of us on the ground were thinking about this.

00:04:48.340 --> 00:04:54.080
And we had the option at the time to use whatever coding language we wanted to for the projects that we were pursuing.

00:04:54.080 --> 00:04:56.380
But we really all got together and were like, you know what?

00:04:56.380 --> 00:04:58.180
We want everyone to be able to look at our code.

00:04:58.180 --> 00:05:00.460
We want each other to look at our code.

00:05:00.460 --> 00:05:02.000
And we're going to go with Python.

00:05:02.300 --> 00:05:04.760
So I learned Python on the fly when I came to the Allen.

00:05:04.760 --> 00:05:08.120
What was the transition like coming from, say, MATLAB to Python?

00:05:08.120 --> 00:05:09.700
It was a learning curve.

00:05:09.700 --> 00:05:13.580
I'd like to say I was kind of floundering around for probably about three months.

00:05:13.580 --> 00:05:17.740
And there was a lot of, like, indexing in MATLAB.

00:05:17.740 --> 00:05:18.780
And Python is different.

00:05:18.780 --> 00:05:22.260
And we do a lot of time series analysis data.

00:05:22.260 --> 00:05:26.000
So just the indexing and things like that was a transition.
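
NOTE
Editor's sketch, not from the episode: one common reading of the indexing difference mentioned here is that MATLAB indexes from 1 with inclusive ranges, while Python indexes from 0 with half-open slices. The trace values and sampling rate below are made up for illustration.
```python
# Hypothetical voltage trace sampled at 1 kHz (values are illustrative only).
trace = [0.0, 0.1, 0.4, 0.9, 0.3, 0.1]
sample_rate_hz = 1000
# MATLAB: trace(1) is the first sample. Python: trace[0].
first_sample = trace[0]
# MATLAB: trace(2:4) takes samples 2, 3 and 4 (both ends included).
# Python: trace[1:4] takes the same three samples (stop index excluded).
window = trace[1:4]
# MATLAB: trace(end). Python: trace[-1].
last_sample = trace[-1]
# Convert a time window in seconds to sample indices for slicing.
start_s, stop_s = 0.002, 0.005
start_idx = round(start_s * sample_rate_hz)
stop_idx = round(stop_s * sample_rate_hz)
time_window = trace[start_idx:stop_idx]
```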

00:05:26.000 --> 00:05:30.800
But, you know, at the end of the day, I'm very glad that we chose to do that.

00:05:30.900 --> 00:05:31.200
That's cool.

00:05:31.200 --> 00:05:34.720
Yeah, it's different, but it's not that different, right?

00:05:34.720 --> 00:05:38.020
It does still have a similar feel, at least, I think.

00:05:38.020 --> 00:05:40.280
So we could have gone to, like, different.

00:05:40.280 --> 00:05:43.240
I mean, we could have gone to C or some other language like that.

00:05:43.240 --> 00:05:46.740
But really, it was a great transition for people on the outside.

00:05:46.740 --> 00:05:52.840
We knew a lot of them would be very MATLAB savvy since that was kind of the main code at the time.

00:05:52.840 --> 00:05:56.660
I think people are transitioning now, but it's still a high-level code.

00:05:56.660 --> 00:05:57.040
Right.

00:05:57.220 --> 00:06:04.160
And I think the ecosystem for Python aligns very well with your mission of trying to have everything open source and stuff, right?

00:06:04.160 --> 00:06:11.800
Of all the different languages, Python embraces the sort of zen of open source more than average, I would say.

00:06:11.800 --> 00:06:13.320
Nick, how about yourself?

00:06:13.320 --> 00:06:18.940
Yeah, so just like Corinne and Justin, I started programming in MATLAB as an undergraduate.

00:06:18.940 --> 00:06:29.880
When I went to graduate school at the University of Washington for my Ph.D., I was in an applied math department, and my advisor encouraged me to learn Fortran.

00:06:30.320 --> 00:06:38.920
So I wrote my first project in Fortran, and just like Justin was saying, I was chasing all sorts of things that I didn't really understand and having a difficult time,

00:06:38.920 --> 00:06:44.040
and decided I would try out this language that some of my colleagues were telling me about,

00:06:44.040 --> 00:06:50.180
and started rewriting all my algorithms in Python and using that as sort of my learning case.

00:06:50.840 --> 00:06:56.900
And then as I got more into computational neuroscientists, or as I got into computational neuroscience,

00:06:56.900 --> 00:07:03.060
there's actually a lot of packages that are written in low-level languages, packages like NEST or NEURON,

00:07:03.060 --> 00:07:06.320
that have developed really good Python bindings.

00:07:06.520 --> 00:07:15.180
So I realized that I didn't have to sacrifice efficiency or engagement with these other theoretical and computational neuroscience communities,

00:07:15.180 --> 00:07:19.420
but I could still program in a language with a ton of flexibility and a ton of tools.

00:07:19.420 --> 00:07:22.500
So it was a really natural transition over.

00:07:22.500 --> 00:07:28.220
So then when I came to the Allen Institute, brought that knowledge in, and to be honest, really haven't looked back.

00:07:28.220 --> 00:07:30.360
I've been using Python for most of my day-to-day work.

00:07:30.360 --> 00:07:32.040
That's really a cool story.

00:07:32.040 --> 00:07:36.600
And because of the bindings, right, because there's underlying libraries,

00:07:36.600 --> 00:07:39.560
people can still use those libraries.

00:07:39.560 --> 00:07:43.260
It's just you happen to be able to program in a higher-level language,

00:07:43.260 --> 00:07:46.860
and if they want to go write in C or Fortran, that's all well and good for them, right?

00:07:46.860 --> 00:07:52.700
Yeah, and you can use the expertise of really core developers working on highly technical material

00:07:52.700 --> 00:07:55.800
in really efficient multiprocessing libraries,

00:07:55.800 --> 00:08:04.620
but then be able to define, at a high level, define simulations and define models in a much more user-friendly syntax,

00:08:04.620 --> 00:08:08.060
but really not sacrifice efficiency.

00:08:08.060 --> 00:08:09.520
Yeah, that's really, really cool.

00:08:09.520 --> 00:08:14.360
So I kind of want to go through the projects that you're each working on

00:08:14.360 --> 00:08:20.300
and give people a sense of what is it you do day-to-day at the Allen Institute

00:08:20.300 --> 00:08:24.980
because it's not like, well, I work at this e-commerce site, or I work at a bank,

00:08:24.980 --> 00:08:26.200
and we all know what that looks like.

00:08:26.200 --> 00:08:30.040
But you work at a pretty special place, so I'm going to keep the same order, I guess.

00:08:30.040 --> 00:08:32.520
Justin, what kind of stuff do you do day-to-day?

00:08:32.520 --> 00:08:35.720
I'm a scientist in the visual behavior team.

00:08:35.720 --> 00:08:44.120
So in general, I'm in a chunk of the institute that is very interested in neural coding.

00:08:44.120 --> 00:08:46.640
So we can think a little bit about sometimes,

00:08:46.640 --> 00:08:51.100
one way of thinking about what the brain does and what neurons in the brain do

00:08:51.100 --> 00:08:56.900
is that they have some representation, some way of encoding what is out in the environment.

00:08:56.900 --> 00:09:00.540
So if I'm looking at something,

00:09:01.060 --> 00:09:05.920
there's a particular pattern of activity that that's going to elicit in the cells in my brain.

00:09:05.920 --> 00:09:10.060
And so in general, this is understanding how this happens

00:09:10.060 --> 00:09:15.260
and how these types of representations emerge, what they are,

00:09:15.260 --> 00:09:20.840
and then how other parts of the brain use those representations to make decisions,

00:09:20.840 --> 00:09:24.900
to do whatever the other parts of the brain need to do with those kind of intermediates.

00:09:24.900 --> 00:09:28.080
In general, that's the kind of stuff that we do.

00:09:28.700 --> 00:09:32.720
So we have a large experimental pipeline.

00:09:32.720 --> 00:09:35.740
One of the interesting things about the Allen Institute is we kind of take an industrial

00:09:35.740 --> 00:09:39.420
approach to generating data for these types of experiments.

00:09:39.420 --> 00:09:45.400
We have these very large pipelines that generate very large data sets on standard experimental rigs.

00:09:45.400 --> 00:09:50.040
So these experiments, is this like you're bringing folks in and you have them,

00:09:50.040 --> 00:09:51.920
do you hook them up with an EEG?

00:09:51.920 --> 00:09:52.660
Yeah, no.

00:09:52.660 --> 00:09:56.760
So most of the work that we do here in my group is dealing with mice.

00:09:57.140 --> 00:10:06.040
So we can actually present images on the screen for the mouse and then record individual neurons in the mouse's brain.

00:10:06.040 --> 00:10:10.940
And it's very hard to do this in humans, but it's a little bit easier to do it in mice.

00:10:10.940 --> 00:10:11.400
That's wild.

00:10:11.400 --> 00:10:13.320
How do you get them to pay attention to the screen?

00:10:13.580 --> 00:10:18.140
This is part of what my project deals with.

00:10:18.140 --> 00:10:20.900
So there's an experimental setup.

00:10:20.900 --> 00:10:28.080
So at the end of the day, we basically need to fit a very large microscope over them in order to record from the individual neurons.

00:10:28.080 --> 00:10:34.060
And a very small glass window is implanted in order to be able to see into their brain.

00:10:34.460 --> 00:10:42.520
They are basically trained to be comfortable with getting their head attached up against the microscope.

00:10:42.520 --> 00:10:45.020
And they've got a little running wheel they can run on.

00:10:45.020 --> 00:10:47.860
And then we have the screen next to them that's presenting images.

00:10:47.860 --> 00:10:48.280
I see.

00:10:48.280 --> 00:10:50.620
So they're kind of fixed, like looking straight at it.

00:10:50.620 --> 00:10:54.280
They're kind of stuck, but they've got a wheel in front of them.

00:10:54.700 --> 00:10:58.400
So we are controlling the visual environment, but they're kind of free to move otherwise.

00:10:58.400 --> 00:11:01.660
So it's almost like a little virtual reality type.

00:11:01.660 --> 00:11:02.900
Yeah, yeah, yeah.

00:11:02.900 --> 00:11:07.120
We just need a miniature Oculus Rift type thing you can put on them.

00:11:07.120 --> 00:11:07.360
Yeah, exactly.

00:11:07.360 --> 00:11:08.840
I mean, that's basically...

00:11:08.840 --> 00:11:14.320
So then, I mean, that's an interesting segue because then your question, you know, how do we actually get them to pay attention,

00:11:14.320 --> 00:11:17.780
is that we put a lick spout in front of them.

00:11:17.780 --> 00:11:21.480
And they can lick the lick spout.

00:11:21.480 --> 00:11:25.780
And if they lick at the right times, then we make sure that they get a little bit of water.

00:11:25.780 --> 00:11:33.560
And so they basically, through trial and error, start realizing what we're trying to get them to pay attention to on the screen.

00:11:33.560 --> 00:11:35.840
They're basically in a little video game.

00:11:35.840 --> 00:11:40.100
I mean, they basically are, you know, we're controlling what's on the screen,

00:11:40.100 --> 00:11:43.800
and they have to lick when the game rewards them for licking.

00:11:43.800 --> 00:11:45.220
Yeah, that's really wild.

00:11:45.220 --> 00:11:46.400
That's quite interesting.

00:11:46.400 --> 00:11:51.260
So then you capture all this data and sort of analyze it afterwards, huh?

00:11:51.400 --> 00:11:51.760
Yeah, yeah.

00:11:51.760 --> 00:11:52.940
So we generate data.

00:11:52.940 --> 00:11:54.880
I mean, it's a little bit of...

00:11:54.880 --> 00:11:55.980
So we've got some...

00:11:55.980 --> 00:12:03.860
So some of the data gets streamed and analyzed in real time to give the trainers feedback on what the mice are doing and their well-being.

00:12:03.860 --> 00:12:05.700
We have a...

00:12:05.700 --> 00:12:11.980
In order to train them and to do this at scale, we have to standardize these training procedures.

00:12:11.980 --> 00:12:13.860
The game has to go from easy to hard.

00:12:13.960 --> 00:12:23.720
So we have an entire system that Nick and I actually have coordinated on where at the end of each training session, the data gets uploaded.

00:12:23.720 --> 00:12:31.700
Some automatic analysis happens that determines what the next stage is that the mouse is going to have the next time they come in for training.

00:12:32.240 --> 00:12:43.180
And so that requires pushing data back and forth between servers, sending it off to a microservice that Nick is running, and then the next day the mouse is on that next stage.
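
NOTE
Editor's sketch, not from the episode: the automatic step that picks the next training stage could, in its simplest form, be a threshold rule on session performance. The stage names, the 0.8 threshold and the function names are hypothetical; the actual rules used in the pipeline are not described here.
```python
# Hypothetical stage names; the real pipeline's stages are not given.
STAGES = ["easy", "medium", "hard"]
def hit_rate(trials):
    """Fraction of trials on which the mouse licked at the right time."""
    if not trials:
        return 0.0
    return sum(1 for hit in trials if hit) / len(trials)
def next_stage(current, trials, advance_at=0.8):
    """Advance one stage when the session hit rate reaches advance_at (assumed rule)."""
    index = STAGES.index(current)
    if hit_rate(trials) >= advance_at and index < len(STAGES) - 1:
        return STAGES[index + 1]
    return current
# One made-up session: 9 hits out of 10 trials.
session = [True] * 9 + [False]
tomorrow = next_stage("easy", session)
```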

00:12:43.180 --> 00:12:45.360
So we train them up.

00:12:45.720 --> 00:12:49.420
Then when they're ready, then we put them under the microscope.

00:12:49.420 --> 00:12:52.120
And so they're in a similar situation.

00:12:52.120 --> 00:12:58.040
But now we've got a microscope that is recording the activity of individual neurons in their brains.

00:12:58.040 --> 00:13:00.980
This gets acquired over a few days.

00:13:00.980 --> 00:13:02.360
All that data...

00:13:02.360 --> 00:13:08.640
I mean, we're talking very large data files that are literally movies of neurons in their brain.

00:13:08.640 --> 00:13:09.800
That all gets pushed up.

00:13:09.800 --> 00:13:11.060
That was going to be one of my questions.

00:13:11.060 --> 00:13:12.680
Like, how big are one of these files?

00:13:12.680 --> 00:13:14.040
Like, how much data are we talking about?

00:13:14.040 --> 00:13:14.440
Yeah.

00:13:14.440 --> 00:13:15.800
How big is one of these files, Nick?

00:13:15.800 --> 00:13:16.340
Do you even know?

00:13:16.340 --> 00:13:17.280
Terra scale.

00:13:17.280 --> 00:13:21.940
I think it's less than a terabyte, but it's, you know, many, many hundreds of gigs.

00:13:21.940 --> 00:13:22.640
Wow.

00:13:22.640 --> 00:13:23.080
Okay.

00:13:23.080 --> 00:13:31.060
So this gets pushed up to the server, and then we've got a whole other team that's developed algorithms for basically extracting these signals out.

00:13:31.220 --> 00:13:39.580
So you've got a bunch of kind of ML that has to happen in order to basically do segmentation, a bunch of image recognition stuff.

00:13:39.580 --> 00:13:41.680
You know, where are the cells in this movie?

00:13:41.680 --> 00:13:44.400
And then extracting the activity of those cells.

00:13:44.400 --> 00:13:52.980
And then basically kind of at the end of a bunch of that pipeline, of that kind of processing ML pipeline, a bunch of this data then comes back to me,

00:13:52.980 --> 00:13:58.880
where now I have signals and, you know, I know I have the record of the images that were presented on the screen.

00:13:59.300 --> 00:14:02.700
I've got other data about when the mouse licked, when it didn't.

00:14:02.700 --> 00:14:07.860
And so basically then I take this data and try to make sense of it.

00:14:07.980 --> 00:14:16.980
So to what extent can I, you know, if I'm just looking at the activity, can I decode what was on the screen from that activity?

00:14:16.980 --> 00:14:29.380
If I'm looking at the activity, can I predict what the mouse's choices were at any given time, whether it chose to lick or whether it chose not to lick in the context of its performance on the game?

00:14:29.940 --> 00:14:31.140
That's just fascinating.

00:14:31.140 --> 00:14:32.360
Yeah, this is really wild.

00:14:32.360 --> 00:14:35.540
I had no idea that you could do these kinds of things.

00:14:35.540 --> 00:14:42.820
Yeah, and at the end of the day, I mean, basically to do this, it's all largely like scikit-learn and pandas, right?

00:14:42.900 --> 00:14:52.740
Like reducing this stuff into a feature matrix where each entry in your feature vector is the activity of any given cell.

00:14:52.740 --> 00:15:00.080
So if I've got 100 cells that we recorded from, each cell becomes one dimension in my vector,

00:15:00.600 --> 00:15:05.880
and I've got a bunch of categorical or continuous information about what was on the screen,

00:15:05.880 --> 00:15:10.560
and then it's just a regression or a classification problem at that point.
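
NOTE
Editor's sketch, not from the episode: the decoding setup described here can be illustrated with one row per trial and one column per cell. The speakers mention scikit-learn and pandas; this sketch deliberately swaps in a plain nearest-centroid classifier written with the standard library so it runs without extra packages. All responses are simulated and every name is an assumption.
```python
import random
from statistics import mean
random.seed(0)
N_CELLS = 5
def simulate_trial(image):
    """Return a made-up response vector with one entry per recorded cell."""
    base = 1.0 if image == "A" else 3.0
    return [random.gauss(base, 0.5) for _ in range(N_CELLS)]
# Feature matrix: one row per trial, one column per cell. Labels: image shown.
labels = ["A", "B"] * 50
features = [simulate_trial(image) for image in labels]
# Hold out the last 20 trials as a test set.
train_X, test_X = features[:80], features[80:]
train_y, test_y = labels[:80], labels[80:]
def centroid(rows):
    """Mean response of each cell across the given trials."""
    return [mean(column) for column in zip(*rows)]
centroids = {
    image: centroid([x for x, y in zip(train_X, train_y) if y == image])
    for image in ("A", "B")
}
def decode(response):
    """Predict the image whose centroid is closest in squared distance."""
    def sq_dist(center):
        return sum((r - c) ** 2 for r, c in zip(response, center))
    return min(centroids, key=lambda image: sq_dist(centroids[image]))
predictions = [decode(x) for x in test_X]
accuracy = mean(1.0 if p == y else 0.0 for p, y in zip(predictions, test_y))
print(f"held-out decoding accuracy: {accuracy:.2f}")
```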

00:15:10.560 --> 00:15:10.820
Yeah.

00:15:10.820 --> 00:15:14.620
And this basically is what lets us kind of, you know, by approaching it in this way,

00:15:14.620 --> 00:15:20.920
we can build the inferences and say, well, you know, this area over here did really good at decoding images.

00:15:20.920 --> 00:15:22.720
This area over here didn't.

00:15:22.720 --> 00:15:27.180
But that area was very good at predicting what the mouse's decision was.

00:15:27.580 --> 00:15:32.080
So we can kind of start to build out inferences about kind of what different parts of the brain are doing

00:15:32.080 --> 00:15:34.800
and how they're doing that through this type of approach.

00:15:34.800 --> 00:15:40.780
This portion of Talk Python To Me is brought to you by Cox Automotive.

00:15:40.780 --> 00:15:44.360
They're leading the way in cutting-edge, industry-changing technology

00:15:44.360 --> 00:15:47.620
that is transforming the way the world buys, sells, and owns cars.

00:15:47.620 --> 00:15:52.240
And they're looking for software engineers and technical leaders to help them do just that.

00:15:52.240 --> 00:15:54.060
Do you hate being stuck in one tech stack?

00:15:54.060 --> 00:15:55.900
Well, that's not a problem at Cox Automotive.

00:15:56.200 --> 00:15:59.140
Their developers work across multiple tech stacks and platforms.

00:15:59.140 --> 00:16:01.880
They give you the room you need to grow your career.

00:16:01.880 --> 00:16:05.520
Bring your technical skills and coding know-how to Cox Automotive.

00:16:05.520 --> 00:16:11.480
You'll create real-world solutions to today's business problems alongside some of the best and brightest minds.

00:16:11.480 --> 00:16:15.680
Are you ready to challenge today and transform tomorrow with Cox Automotive?

00:16:15.680 --> 00:16:19.760
Go to talkpython.fm/cox, C-O-X,

00:16:19.760 --> 00:16:23.060
and check out all the exciting positions they have open right now.

00:16:25.060 --> 00:16:26.120
A couple of thoughts.

00:16:26.120 --> 00:16:33.000
One, who would have thought that a library coming out of the financial industry, pandas,

00:16:33.000 --> 00:16:35.140
would be helping us understand the brain?

00:16:35.140 --> 00:16:37.600
And also, who knew mice could generate so much data?

00:16:37.600 --> 00:16:38.580
Yeah.

00:16:39.580 --> 00:16:43.460
Well, I mean, the amount of data that we can generate, I mean, I think it's probably obvious.

00:16:43.460 --> 00:16:47.740
And we're not even getting all of the data that we could generate out of these guys.

00:16:47.740 --> 00:16:53.580
We have, this is, I mean, and this data that we're talking about, I mean, we're literally talking,

00:16:53.580 --> 00:17:04.240
zooming in on an area of the mouse's brain that is maybe a few, what, like hundreds of microns, micrometers wide,

00:17:04.240 --> 00:17:07.460
and maybe like, and a really thin, thin piece.

00:17:07.460 --> 00:17:12.940
So we're talking, I mean, we're talking about a couple, you know, many dozens to hundreds of neurons

00:17:12.940 --> 00:17:17.120
out of the thousands and thousands of neurons in the mouse's brain, right?

00:17:17.120 --> 00:17:23.460
We're not, I mean, this is just the tip of the iceberg in what we could be potentially recording as technologies improve.

00:17:23.460 --> 00:17:23.880
Yeah.

00:17:23.880 --> 00:17:25.600
And someday we probably will be, right?

00:17:25.700 --> 00:17:25.880
Yeah.

00:17:25.880 --> 00:17:27.180
I mean, there's tons of initiatives.

00:17:27.180 --> 00:17:30.600
I mean, the Allen Institute is leading on a bunch of efforts.

00:17:30.600 --> 00:17:35.180
The recording modality I just described to you is what we have currently released

00:17:35.180 --> 00:17:43.140
and the kind of stuff that is currently on our website, not with all the behavior, but that recording modality.

00:17:43.900 --> 00:17:56.420
And there's a new effort of Neuropixels probes that the Allen Institute has been involved in that will get us kind of up into the thousands range of simultaneously recorded cells.

00:17:56.420 --> 00:18:05.040
And there are even more forward-thinking efforts to be much more comprehensive in what we can record from with this level of detail.

00:18:05.040 --> 00:18:05.400
Yeah.

00:18:05.400 --> 00:18:06.200
It's amazing.

00:18:06.200 --> 00:18:06.620
All right.

00:18:06.620 --> 00:18:08.220
Corinne, how about yourself?

00:18:08.480 --> 00:18:08.640
Yeah.

00:18:08.640 --> 00:18:14.820
So Justin works on kind of a higher-level project where, you know, you have actual behaving mice.

00:18:14.820 --> 00:18:17.840
I'd say I work at one scale downwards.

00:18:17.840 --> 00:18:26.600
So a large part of this institute is trying to really define what the components are in the brain.

00:18:26.600 --> 00:18:33.400
So you have a bunch of neurons, and theoretically those are differentiated into different cell types.

00:18:33.940 --> 00:18:41.900
So researchers in the field have been trying to figure out what sorts of types of neurons there are in the brain for probably 100 years.

00:18:41.900 --> 00:18:46.080
And this is something that people haven't really solidified their ideas on.

00:18:46.080 --> 00:18:47.660
Is it still an open question?

00:18:47.660 --> 00:18:49.980
Like people don't know all the types?

00:18:49.980 --> 00:18:50.420
Yeah.

00:18:50.420 --> 00:18:51.400
It's an open question.

00:18:51.400 --> 00:18:59.300
So we're really devoting a lot of resources to try to get to some sort of ground truth.

00:18:59.300 --> 00:19:02.480
It's not clear that there's going to be specific types of neurons.

00:19:02.620 --> 00:19:05.360
There's probably a continuum, but how well can we define it?

00:19:05.360 --> 00:19:11.240
And then after we've had some definitions, can we figure out what those different types are doing,

00:19:11.240 --> 00:19:13.260
what function they're performing in the brain?

00:19:13.260 --> 00:19:17.680
So Justin didn't mention that.

00:19:17.680 --> 00:19:19.080
There's a reason we use mice.

00:19:19.080 --> 00:19:22.260
And the reason we use mice is that we have a lot of genetic controls.

00:19:22.780 --> 00:19:34.540
So we specifically breed different types of mice to fluoresce looking under a microscope for different types of genes that are expressed in the neurons.

00:19:34.540 --> 00:19:37.280
The mice don't fluoresce, but the individual neurons.

00:19:37.280 --> 00:19:38.000
The neurons fluoresce.

00:19:38.000 --> 00:19:42.280
So when you're looking under a microscope, you see a bunch of different neurons.

00:19:42.400 --> 00:19:48.940
And depending on what type of neuron we're marking, that neuron will fluoresce under a microscope.

00:19:48.940 --> 00:19:56.760
So we have a lot of genetic control over recording from neurons that we kind of know what kind of transgenic type they are.

00:19:57.140 --> 00:20:04.600
In my group or the group that I work on and the project I work on, we are looking at electrophysiology data.

00:20:04.600 --> 00:20:08.140
So what that means is you stick an electrode into a neuron.

00:20:08.440 --> 00:20:14.860
So these mice are sacrificed and you have slices of the brain tissue.

00:20:14.860 --> 00:20:16.100
And we also do this in human.

00:20:16.100 --> 00:20:23.800
We have a lot of agreements with the hospitals within the area where if they're excising part of the brain during a surgery,

00:20:23.800 --> 00:20:26.640
we will get that tissue and we'll record from those neurons also.

00:20:26.640 --> 00:20:31.080
So this is a nice project to kind of try to relate mice to humans.

00:20:31.080 --> 00:20:32.220
How similar are they?

00:20:32.220 --> 00:20:39.900
My first position, as I mentioned, I came from a physics background, was basically building a modeling pipeline.

00:20:39.900 --> 00:20:46.840
So you stick an electrode into a neuron, you inject current, and you record the voltage output.

00:20:46.840 --> 00:20:55.200
And then you try to come up with mathematical equations that will recreate the behavior of a neuron based on current injection, just like a circuit.
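
NOTE
Editor's sketch, not from the episode: a textbook leaky integrate-and-fire neuron is one simple example of equations that map injected current to voltage "just like a circuit". It is not claimed to be the Institute's model, and all parameter values are illustrative assumptions.
```python
def simulate_lif(current_amps, dt=1e-4, capacitance=200e-12, resistance=100e6,
                 v_rest=-0.070, v_threshold=-0.050, v_reset=-0.070):
    """Euler-integrate C dV/dt = -(V - v_rest)/R + I; return (voltages, spike_times)."""
    voltage = v_rest
    voltages, spike_times = [], []
    for step, current in enumerate(current_amps):
        dv_dt = (-(voltage - v_rest) / resistance + current) / capacitance
        voltage += dv_dt * dt
        if voltage >= v_threshold:
            # Threshold crossing: record a spike and reset the membrane.
            spike_times.append(step * dt)
            voltage = v_reset
        voltages.append(voltage)
    return voltages, spike_times
n_steps = 5000  # 0.5 s at dt = 0.1 ms
# 100 pA settles 10 mV above rest, below the 20 mV gap to threshold: no spikes.
weak_voltages, weak_spikes = simulate_lif([100e-12] * n_steps)
# 300 pA would settle 30 mV above rest, so the neuron fires repeatedly.
strong_voltages, strong_spikes = simulate_lif([300e-12] * n_steps)
print(len(weak_spikes), len(strong_spikes))
```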

00:20:55.800 --> 00:21:10.680
And so we recently wrapped up this project where we were looking at, you know, how much specific mathematical equations are needed to reproduce the behavior of these neurons.

00:21:10.680 --> 00:21:13.940
And so this is all available on our website now.

00:21:13.940 --> 00:21:22.620
And the idea here is that when people are building larger scale networks, you want to use realistic spiking behavior of individual neurons.

00:21:22.620 --> 00:21:33.060
So now depending on the level of abstraction someone might want to use in a network that they're building, they can choose from this different range of abstraction that we have on our website.

00:21:33.060 --> 00:21:38.700
So first project was that, working and building a whole pipeline to do this all automated.

00:21:38.700 --> 00:21:39.800
So data is taken.

00:21:39.800 --> 00:21:42.420
It goes into our storage facility.

00:21:42.420 --> 00:21:44.040
I take the data.

00:21:44.040 --> 00:21:49.460
Well, then there's some QC algorithms that we built up to, you know, QC the data.

00:21:49.460 --> 00:21:55.940
And then I pull that data out, come up with algorithms, test them in a very machine learning type way.

00:21:55.940 --> 00:21:59.100
You know, you basically have a test set and a training set.

00:21:59.100 --> 00:22:03.500
And then the project I work on now is trying to figure out those components.

00:22:03.500 --> 00:22:10.160
So you inject current into one neuron and you measure the voltage that's happening on another neuron it's connected to.

00:22:10.160 --> 00:22:12.180
How complicated does that get?

00:22:12.280 --> 00:22:16.580
Is it kind of simple to some degree, like Newtonian mechanics?

00:22:16.580 --> 00:22:20.720
Is it crazy, like complex, dynamical, chaotic systems?

00:22:20.720 --> 00:22:23.000
Like, what are you working with here?

00:22:23.000 --> 00:22:28.980
So it depends on, so Nick will talk about this a little bit because he spent a lot of time building actual network models.

00:22:28.980 --> 00:22:35.040
But so what I do is, I would say it's relatively simple mathematical equations.

00:22:35.500 --> 00:22:46.320
There is also a level of models that are made that are what we call biologically realistic, where you try to model all of the ion channels in a neuron.

00:22:46.320 --> 00:22:48.700
So you have lots of different ion channels in a neuron.

00:22:48.700 --> 00:22:53.960
Calcium, sodium, potassium, lots of them.

00:22:53.960 --> 00:22:55.740
25 different, you know, many.

00:22:55.740 --> 00:22:57.640
25 to 100 different channels.

00:22:57.840 --> 00:23:05.380
And you actually try to model, like, the gates opening and closing and current flowing into the neuron.

00:23:05.380 --> 00:23:13.320
But we abstract away from that because we have found that that's not necessarily necessary to predict the spiking behavior of a neuron.

00:23:13.320 --> 00:23:18.300
But we also have those high level, or sorry, those very complex models, too.

00:23:18.300 --> 00:23:19.280
So it depends.

00:23:19.280 --> 00:23:19.600
Okay.

00:23:19.600 --> 00:23:20.000
Yeah.

00:23:20.000 --> 00:23:23.540
So you can look at it at different levels depending on what questions you're trying to ask.

00:23:23.540 --> 00:23:25.540
And, yeah, Nick, how about you?

00:23:25.620 --> 00:23:32.340
Well, I just wanted to jump in there and sort of highlight one of our Python packages that the Institute has been building over the past year.

00:23:32.340 --> 00:23:37.600
It's a package we call the BMTK, the Brain Modeling Toolkit.

00:23:37.600 --> 00:23:41.240
It's available on our website, and it's gone through sort of a soft release.

00:23:41.240 --> 00:23:57.580
It's a Python package, a Python wrapper around several neural simulators, like I was mentioning earlier, that allow researchers to construct and simulate neural circuits, like Corinne was saying, at a bunch of these different levels of biological realism.

00:23:58.300 --> 00:24:04.380
So, you know, Corinne was highlighting some of her work at the sort of one differential equation per neuron type scale.

00:24:04.380 --> 00:24:17.260
But you can go much deeper and simulate individual compartment models that can resolve the complicated morphologies of the dendritic trees and how those interact with each other.

00:24:17.500 --> 00:24:24.940
Or even all the way down to the synapse with these stochastic models of ion channels, that's at the sort of extreme end of biological realism.

00:24:24.940 --> 00:24:38.960
My first project at the Institute was actually on the other extreme, the so-called population density modeling, where we use partial differential equations to simulate entire populations as sort of one homogenous group.

00:24:38.960 --> 00:24:45.660
And there's different biological questions that you'd want to pose at different sort of points on this continuum.

00:24:45.660 --> 00:24:46.860
I'll give you an example.

00:24:46.860 --> 00:24:55.640
Is it important to simulate the exquisite, complicated nature of the trees of these neurons to understand their input-output properties?

00:24:55.640 --> 00:24:59.920
Well, if the answer is yes, then you're going to have to use a simulation tool like NEURON.

00:24:59.920 --> 00:25:11.820
Although it might be sufficient to just look at the spiking behavior down near the soma of the cell, in which case a simulation tool like NEST, which also has a Python API, would be more appropriate.

00:25:11.820 --> 00:25:27.640
If you're just interested in the sort of mean field dynamics, the population-to-population contributions to circuit dynamics, then the neural simulator that I wrote called DiPDE, which is actually in pure Python, would be the most appropriate tool.

00:25:27.640 --> 00:25:37.520
So we have a Python package that actually wraps all of these different levels of detail, so you can move sort of in between each of the different scales as your work is sort of demanding.

00:25:37.520 --> 00:25:46.220
Yeah, that sounds really interesting, because you might start a research project thinking, I'm going to look at one level, but realize, no, actually, we need to try to think about it differently.

00:25:46.220 --> 00:25:48.800
But you have the same API or something like that, right?

00:25:48.800 --> 00:25:49.500
Exactly.

00:25:49.500 --> 00:25:55.180
And there's a big switching cost associated with having to learn a whole set of tool chains.

00:25:56.020 --> 00:26:02.720
The way it was originally written, the simulator NEST had its own custom language for describing the network topologies.

00:26:02.720 --> 00:26:03.960
I think it was called SLI.

00:26:03.960 --> 00:26:07.500
I know NEURON has its own language called HOC, right, Corinne?

00:26:07.500 --> 00:26:07.980
Yes.

00:26:07.980 --> 00:26:08.320
Yeah.

00:26:08.320 --> 00:26:11.440
But it also has a Python interpreter now.

00:26:11.640 --> 00:26:25.340
So if you're having to switch based on your biological question to a different type of simulation, and now you've got to learn yet another custom description language or modeling language, it really is taxing on the individual scientist.

00:26:25.340 --> 00:26:33.840
So that's why unified Python APIs that you can just sort of learn one language but still get the power of all of these simulation tools is really helpful.

00:26:33.840 --> 00:26:34.880
Yeah, that sounds great.

00:26:34.880 --> 00:26:39.940
So are these tools and libraries being taught in academia these days?

00:26:39.940 --> 00:26:41.260
That's a great question.

00:26:41.260 --> 00:26:41.820
Are they used for research projects?

00:26:41.820 --> 00:26:48.360
So there are some examples where they're taught to undergraduates, although it's a pretty specific topic.

00:26:48.360 --> 00:26:52.520
And in graduate school, that's where I learned about all of these tools.

00:26:52.520 --> 00:27:01.320
But that's when you're doing your PhD and you have a neuroscience or an applied math question in mind, and then you go to find the tool that's most appropriate.

00:27:01.560 --> 00:27:05.340
So most of the time, I'd say, it's learning on your own.

00:27:05.340 --> 00:27:09.460
We do have several examples of training courses that we provide.

00:27:09.460 --> 00:27:17.540
They're actually not just at the Allen Institute but all over the world for sort of specialized computational neuroscience and also experimental neuroscience.

00:27:17.540 --> 00:27:28.780
And I know that at our summer training course last year, using these Python APIs was one of the main focuses of the course.

00:27:28.780 --> 00:27:30.920
Or it was a focus of the course.

00:27:31.060 --> 00:27:32.420
Are these courses taught online?

00:27:32.420 --> 00:27:33.440
Are they taught in Seattle?

00:27:33.440 --> 00:27:34.780
That's where you guys are?

00:27:34.780 --> 00:27:35.100
Yeah.

00:27:35.100 --> 00:27:38.560
So we're all in Seattle, just down here on Lake Union.

00:27:38.560 --> 00:27:51.480
But I was referring to our Friday Harbor summer course, which is actually, I think, in its fifth year now, where there's an application process really geared towards graduate students and postdocs, maybe early faculty.

00:27:52.140 --> 00:27:57.320
And it's a two-week in residence up in San Juan Islands at the Friday Harbor Labs, which is run by UW.

00:27:57.320 --> 00:27:59.120
That's the Friday Harbor tie-in.

00:27:59.120 --> 00:27:59.440
Yeah.

00:27:59.440 --> 00:28:01.360
I've stayed up there on San Juan Island.

00:28:01.360 --> 00:28:01.600
Oh, yeah.

00:28:01.600 --> 00:28:02.300
It's wonderful up there.

00:28:02.300 --> 00:28:02.560
Yeah.

00:28:02.560 --> 00:28:06.680
It's a combined effort between the University of Washington and Allen Institute.

00:28:06.680 --> 00:28:10.860
There's also funding from the Cavalier Institute that helps make sure that it is a success.

00:28:11.400 --> 00:28:21.580
And it's not only just computational neuroscience, but actually, I think it's kind of morphed into a combination of big data and experimental neuroscience.

00:28:22.140 --> 00:28:25.760
And there's also an introductory period where it's all taught in Python.

00:28:25.760 --> 00:28:39.380
So if the students are coming into the course without a strong Python background, there's a sort of a Python boot camp for the first couple days to help train students to use our data APIs and some of the tools that they might find useful on their projects.

00:28:39.380 --> 00:28:45.800
All of the course material from the last few years is on the Allen Institute GitHub repository.

00:28:45.800 --> 00:28:47.440
So there's Jupyter Notebooks.

00:28:48.220 --> 00:28:55.580
You know, one thing we also haven't – yeah, so there's Jupyter Notebooks that cover a lot of the course material that those students use.

00:28:55.580 --> 00:29:01.240
And that's freely available for folks to go and download and start poking around.

00:29:01.240 --> 00:29:08.300
This also – one thing that this also offers is, you know, I mentioned that we release a bunch of this data online.

00:29:08.300 --> 00:29:17.580
So currently we have what we call our brain observatory, which is observatory.brain-map.org.

00:29:18.140 --> 00:29:21.600
That is the website for the web version of it.

00:29:21.600 --> 00:29:32.920
And you can go there and you go to the website and you can kind of poke around and see a little bit of what is in this data set of, I want to say, 40,000 or so neurons from my work.

00:29:32.920 --> 00:29:36.740
Corinne's data is somewhere in a parallel.

00:29:36.740 --> 00:29:38.980
It's also on brain-map.org.

00:29:38.980 --> 00:29:40.320
I don't remember exactly where it is.

00:29:40.380 --> 00:29:44.240
But the stuff that she's been working on is also freely released.

00:29:44.240 --> 00:29:52.760
And Nick's team manages the API and Python wrapper for the API to access this.

00:29:52.840 --> 00:29:56.200
So you can basically do a pip install allensdk.

00:29:56.200 --> 00:29:59.260
That will install a Python package locally.

00:29:59.260 --> 00:30:08.520
You spin that up and in like three or four lines of Python code, you'll start downloading almost all the data that we've released in this.
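
NOTE
The "three or four lines" mentioned above might look roughly like the sketch below. It assumes the allensdk package is installed and uses its BrainObservatoryCache class. The function name and manifest path are hypothetical, and picking the first listed experiment is only for illustration.
```python
def fetch_first_experiment(manifest_path="brain_observatory/manifest.json"):
    """Download one Brain Observatory session and return its dF/F traces."""
    # Imported lazily so this sketch can be loaded without allensdk installed.
    from allensdk.core.brain_observatory_cache import BrainObservatoryCache
    boc = BrainObservatoryCache(manifest_file=manifest_path)
    experiments = boc.get_ophys_experiments()  # metadata for the released sessions
    data_set = boc.get_ophys_experiment_data(experiments[0]["id"])  # downloads an NWB file
    timestamps, dff_traces = data_set.get_dff_traces()  # one row of activity per cell
    return timestamps, dff_traces
```
Nothing is downloaded until the function is called. The first call fetches the manifest and one session file, which can take a while, as noted in the conversation.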

00:30:08.700 --> 00:30:13.620
It might take a while, but you've got access to it at your fingertips.

00:30:13.620 --> 00:30:19.780
One of my thoughts around this is you talked about how much data that you're gathering for all of these projects and stuff.

00:30:19.780 --> 00:30:29.240
And I know the folks at CERN, instead of running their, downloading the data and running analysis on it, they push their analysis to where the data is.

00:30:29.440 --> 00:30:35.960
Because there's so much data, they've got like a cloud computing infrastructure that like send your algorithm to the data and run it locally.

00:30:35.960 --> 00:30:38.940
So with yours, what is it like?

00:30:38.940 --> 00:30:41.480
Do you actually download all the data and process it?

00:30:41.480 --> 00:30:43.480
Or do you download all segments that you ask for?

00:30:43.480 --> 00:30:44.080
Or how does it work?

00:30:44.080 --> 00:30:44.380
Yeah.

00:30:44.380 --> 00:30:50.180
So this specific tool is basically downloading after a lot of the pre-processing.

00:30:50.180 --> 00:31:01.960
And we've gotten it to a point that it's condensed to the level that your average postdoc or graduate student who would want to explore this data would want to play with it.

00:31:01.960 --> 00:31:05.260
That's about at the point that we release it for download.

00:31:05.260 --> 00:31:15.200
There is, I mean, we have our own compute internally for a lot of our own data that relies on our cluster and where we keep everything very close to the compute.

00:31:15.200 --> 00:31:18.040
It really depends on the types of questions.

00:31:18.040 --> 00:31:26.400
We have an entirely different chunk of the institute that is not represented here that is taking, doing very dense microscopy.

00:31:26.400 --> 00:31:32.300
So trying to build out, it's going to take them months to acquire the data set alone.

00:31:32.300 --> 00:31:34.040
This is an electron microscopy.

00:31:34.040 --> 00:31:39.680
So it's in every single neuron within some area in incredible detail.

00:31:39.900 --> 00:31:50.540
And the size of that data is just, you know, it would be largely impossible to do the analyses that need to be done on that without staying close to the data.

00:31:50.540 --> 00:31:51.160
That's really wild.

00:31:51.160 --> 00:31:53.840
What kind of questions are they trying to answer, Corinne, on that one?

00:31:53.840 --> 00:31:54.120
You know?

00:31:54.120 --> 00:31:56.140
Oh, I was just going to mention that.

00:31:56.140 --> 00:31:58.220
I mean, we're trying to answer all sorts of questions.

00:31:58.580 --> 00:32:03.420
We have been at Friday Harbor using Docker, too.

00:32:03.420 --> 00:32:07.420
And AWS have donated time, you know, just like they did at CERN.

00:32:07.420 --> 00:32:09.440
So last year at Friday Harbor.

00:32:09.440 --> 00:32:15.660
So we used to just give out a terabyte disk to everybody that showed up with the data that we were going to be talking about.

00:32:15.660 --> 00:32:18.020
It's the most efficient way to transmit the data sometimes.

00:32:18.260 --> 00:32:27.260
Well, you're joking, but I mean, Amazon has their, like, ship us a bunch of disks is the fastest way to upload large quantities of data to, like, S3 and stuff.

00:32:27.260 --> 00:32:27.860
It's wild.

00:32:27.860 --> 00:32:28.960
Sometimes that's what you need.

00:32:28.960 --> 00:32:30.400
So Amazon, yeah.

00:32:30.400 --> 00:32:35.440
So last year, Amazon generously donated a bunch of space and credits for us for this course.

00:32:35.440 --> 00:32:37.920
And so we, yeah, for all these students.

00:32:37.920 --> 00:32:41.560
And we, yeah, we had a Snowball here that they sent us.

00:32:41.560 --> 00:32:45.920
And we put a bunch of data on it and sent it off to Amazon.

00:32:46.220 --> 00:32:52.200
Yeah, so one thing that's interesting about that data, when you download data from the Allen SDK, the Python API,

00:32:52.200 --> 00:33:00.000
what you're getting is a bunch of actually preprocessed data that has had a lot of computational algorithms already applied to it.

00:33:00.000 --> 00:33:05.400
For example, neuropil subtraction, segmentation of regions of interest for the cells.

00:33:05.400 --> 00:33:08.940
That basically gives time series for the activity of each cell.

00:33:08.940 --> 00:33:16.200
All of that is there's an entire algorithms team at the Institute that works on the packages and the algorithms to do that.

00:33:16.200 --> 00:33:24.200
When you access that data with the Python API, you're just getting that post-processed data, not the raw imaging stacks.

00:33:24.200 --> 00:33:29.020
Those are the sort of multi-hundred gigabyte or terascale data.

00:33:29.020 --> 00:33:30.740
And that's what we need the Snowball for.

00:33:30.740 --> 00:33:41.200
Because we actually brought that to, made that data available actually through Amazon at Friday Harbor last year for students to sort of poke at that raw imaging data.

00:33:41.340 --> 00:33:44.380
But there's really, really significant volume.

00:33:44.380 --> 00:33:50.160
And also, if anybody requests it, they do send us a disk and we put the data on it and send it back to them.

00:33:50.160 --> 00:33:52.980
That's another way that we handle our large data sets.

00:33:52.980 --> 00:33:53.460
Oh, wow.

00:33:53.460 --> 00:33:53.760
Okay.

00:33:53.760 --> 00:33:54.800
Yeah, that's really interesting.

00:33:54.800 --> 00:33:58.460
Because downloading a terabyte, like that's going to cause all kinds of problems.

00:33:58.600 --> 00:34:03.640
I mean, even just paying for that much bandwidth, that's like $90 of bandwidth at AWS.

00:34:03.640 --> 00:34:07.200
And it also begs the question what you're going to do with that data when you get it.

00:34:07.200 --> 00:34:17.260
I mean, I'm not saying that a researcher wouldn't know what to do with it, but it takes a lot of time and a lot of effort to extract signal out of that data.

00:34:17.820 --> 00:34:29.000
And that's sort of, I wouldn't call it a service we provide, but it's part of our institutional work to develop the algorithms to do that so that people don't have to retread that wheel constantly.

00:34:29.000 --> 00:34:29.500
Right.

00:34:29.500 --> 00:34:32.280
Just pay the computational cost of trying to compute.

00:34:32.280 --> 00:34:33.980
That's got to be pretty high.

00:34:33.980 --> 00:34:36.720
A computational cost, I also think it's the human cost.

00:34:36.720 --> 00:34:46.920
It takes a very specialized set of skills to be able to computationally extract the meaningful data in those raw imaging stacks.

00:34:46.920 --> 00:34:59.420
But we have a really world-class algorithms team that does a lot of that pre-processing for you so you can jump straight into the sort of the data that you might think of as the really relevant class of data.

00:34:59.420 --> 00:35:03.380
What is the activity of the cell, not what was seen by the microscope?

00:35:03.380 --> 00:35:05.800
Those are two different data dimensionalities.

00:35:05.800 --> 00:35:06.060
Thanks, man.

00:35:06.060 --> 00:35:12.120
This portion of Talk Python To Me has been brought to you by Rollbar.

00:35:12.120 --> 00:35:15.800
One of the frustrating things about being a developer is dealing with errors.

00:35:15.800 --> 00:35:24.800
Relying on users to report errors, digging through log files, trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day.

00:35:24.800 --> 00:35:31.560
With Rollbar's full-stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster.

00:35:31.560 --> 00:35:35.560
Adding Rollbar to your Python app is as easy as pip install rollbar.

00:35:35.560 --> 00:35:39.780
You can start tracking production errors and deployments in eight minutes or less.

00:35:39.780 --> 00:35:43.960
Are you considering self-hosting tools for security or compliance reasons?

00:35:43.960 --> 00:35:47.540
Then you should really check out Rollbar's compliant SaaS option.

00:35:47.540 --> 00:35:56.920
Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield, and more.

00:35:56.920 --> 00:35:58.300
They'd love to give you a demo.

00:35:58.300 --> 00:36:00.000
Give Rollbar a try today.

00:36:00.000 --> 00:36:03.680
Go to talkpython.fm/Rollbar and check them out.

00:36:03.680 --> 00:36:08.000
How many people work there at the Allen Brain Institute?

00:36:08.000 --> 00:36:15.340
I think we're pushing 400 or so totally in brain science, which is our kind of corner of the Allen Institute as a whole.

00:36:15.340 --> 00:36:20.600
We're the largest chunk, and I think we're closer to 300, 250, or 300, somewhere in that range.

00:36:20.600 --> 00:36:24.560
Yeah, so a lot of expertise packaged in that area, right?

00:36:24.560 --> 00:36:31.740
So, Corinne, you talked about people shipping you disks and sharing that data in some really interesting ways.

00:36:31.740 --> 00:36:37.740
And I think that leads into one of the missions at the Institute, which I thought was really powerful.

00:36:37.740 --> 00:36:42.300
It says you guys are committed to the open science model within your institutes.

00:36:42.300 --> 00:36:43.640
Do you want to speak to that a little?

00:36:43.900 --> 00:36:44.440
Absolutely.

00:36:44.440 --> 00:36:56.640
Yeah, so in academia, things are generally done in smaller labs, and oftentimes you have a lot of difficulty reproducing individual experiments that happen there.

00:36:57.320 --> 00:37:02.640
And I believe a lot of brain science is really hard.

00:37:02.640 --> 00:37:06.140
It's really hard to figure out what's going on in the brain.

00:37:06.140 --> 00:37:21.580
And I believe that when this project was started, it was like, what space is really not being covered by the neuroscience community, academia, and pharmaceuticals combined?

00:37:22.420 --> 00:37:41.100
And that was being able to reproduce large sets of data, making them standardized, and making everybody be able to reproduce the results that you get so that you could kind of come to an agreement on ground truth and not be trying to reproduce other people's experiments.

00:37:41.100 --> 00:37:48.300
And so we are one of the only institutions in that space that has done this.

00:37:48.300 --> 00:38:02.140
And now other institutions are also trying to kind of follow suit because we've all recognized that we're really trying to solve this reproducibility problem and also deal with just the huge amounts of data in the brain.

00:38:02.140 --> 00:38:02.960
Yeah.

00:38:02.960 --> 00:38:07.740
How many different data centers have to be set up to do basically the same processing, right?

00:38:07.740 --> 00:38:18.360
Like, if you guys could do that processing, share that data, and not have every university set up their own equivalent computing structure to do the same thing, that would be good, right?

00:38:18.360 --> 00:38:18.900
Exactly.

00:38:18.900 --> 00:38:28.900
And I know that the next step in a process now is that we have projects where you can apply to have your scientific study done in our platforms.

00:38:29.320 --> 00:38:31.740
So we might be heading more towards that.

00:38:31.740 --> 00:38:33.660
You know, we've set up this huge infrastructure.

00:38:33.660 --> 00:38:38.480
People will apply to have what they think is interesting done here.

00:38:38.480 --> 00:38:38.780
Nice.

00:38:38.780 --> 00:38:41.580
It's a little bit of the computing close to the data type of thing.

00:38:41.580 --> 00:38:42.120
Yeah.

00:38:42.120 --> 00:38:42.540
Yeah.

00:38:42.540 --> 00:38:47.520
You know, there's a lot of different dimensions to open science, right?

00:38:47.520 --> 00:39:02.120
We've been talking a lot about open data, and we've talked a little about open software, which is open source software, which is another aspect nowadays of open science, as science has really kind of come to depend upon the software that implements the science.

00:39:02.120 --> 00:39:06.000
And then there's also, of course, open access, right?

00:39:06.000 --> 00:39:12.840
That's what we traditionally think of as the final work product of science is the paper that you publish.

00:39:13.040 --> 00:39:16.920
And there's free preprint archives and free access on journals.

00:39:16.920 --> 00:39:22.100
There's a lot of different – so my point is there's a lot of different dimensions that you can talk about open science.

00:39:22.100 --> 00:39:27.160
But open source software is something that myself and the technology team really think about a lot.

00:39:27.160 --> 00:39:31.880
You know, we've talked about the Allen SDK, and I also mentioned the Brain Modeling Toolkit.

00:39:31.880 --> 00:39:39.880
I don't think I necessarily mentioned that both of these are open source packages, and we accept pull requests onto them, and we respond to GitHub issues.

00:39:39.880 --> 00:39:47.360
And there's a large backlog because there's a lot of work to do, and we support a lot of different scientific projects at a very large scale.

00:39:47.840 --> 00:39:53.120
But open source software development is really something that the Institute has really come to embrace.

00:39:53.120 --> 00:40:03.300
I think the community has started to recognize just how critical it is to share our algorithms, share our processing code, share our analysis tools.

00:40:03.300 --> 00:40:05.560
Yeah, I think that's such a great mission.

00:40:05.720 --> 00:40:20.100
And I think partly you folks have a slight advantage over, say, Stanford, Rutgers, the other universities because you don't depend on the publish or perish model, right?

00:40:20.100 --> 00:40:22.240
At least that's my understanding from the outside, right?

00:40:22.240 --> 00:40:22.840
It's not like –

00:40:22.840 --> 00:40:24.120
No, that's exactly –

00:40:24.120 --> 00:40:24.340
Yeah.

00:40:24.340 --> 00:40:32.000
Yeah, I mean, it is very important for us still to publish our data and to still communicate to the broader scientific public in that realm.

00:40:32.000 --> 00:40:35.600
But yeah, our incentives are a little bit different.

00:40:35.600 --> 00:40:40.520
I mean, I think a good example of this is our brain observatory.

00:40:40.520 --> 00:40:51.160
We started releasing data in the brain observatory in May 2016, and we've had two or three more releases for this.

00:40:51.160 --> 00:40:53.700
This is the data set that I was talking about.

00:40:53.700 --> 00:40:56.160
It's 40,000-some-odd neurons.

00:40:56.160 --> 00:40:59.160
We haven't published a paper on this yet.

00:40:59.160 --> 00:41:14.500
So whereas, like, the rest of – even in the open science community and open data community, the debates right now are, okay, well, how soon after publishing the paper should we release the data?

00:41:14.500 --> 00:41:16.500
Do we release it immediately on publication?

00:41:16.500 --> 00:41:24.560
Do we wait six months or a year to give the primary author time to write a second publication before they move forward?

00:41:24.560 --> 00:41:29.720
And these are important debates because that's the way that the incentive structures are in academia.

00:41:29.720 --> 00:41:32.940
Nobel Prizes are handed out on this basis, things like that, right?

00:41:32.940 --> 00:41:33.140
Exactly.

00:41:33.140 --> 00:41:41.540
We've released this data, and there are already 12 preprints that external people have written and posted.

00:41:41.540 --> 00:41:46.160
I think two of them are peer-reviewed and published also on the data already.

00:41:46.160 --> 00:41:49.580
And we haven't even written our own paper analyzing this yet.

00:41:49.580 --> 00:41:50.540
It's in the process.

00:41:50.660 --> 00:41:54.700
We have not yet published our own paper dealing with this data.

00:41:54.700 --> 00:42:05.000
So, yeah, so we have a very strong kind of data-first, publish-later model that you just can't do in the current academic infrastructure.

00:42:05.000 --> 00:42:08.940
I mean, there are communities that have sort of demonstrated that this is possible.

00:42:09.060 --> 00:42:17.280
The machine learning community is the one that I always think of as really jumping out there early with the latest developments of the algorithms and the approaches.

00:42:17.280 --> 00:42:19.160
And they are starting to make it.

00:42:19.160 --> 00:42:28.760
But I think it's a real big culture change for sort of more, I don't want to say entrenched, but, you know, the biological sciences have been around for a long time.

00:42:28.760 --> 00:42:32.860
And the publication methods are the way they are for a reason.

00:42:32.860 --> 00:42:39.280
But there's a cultural change that the three of us have only been out of our PhDs now for less than a decade.

00:42:39.280 --> 00:42:43.220
And we are definitely sort of seeing this enormous change.

00:42:43.220 --> 00:42:46.760
And it's fun to be at a place that really sort of embraces that cultural change.

00:42:46.760 --> 00:42:49.740
Yeah, you guys seem to be at sort of the leading edge of that.

00:42:49.740 --> 00:43:00.380
Yeah, and I'd like to say just quickly that we should acknowledge that this is all possible because of Paul Allen's generous donation to us.

00:43:00.380 --> 00:43:02.580
I mean, he basically makes all of this possible.

00:43:02.580 --> 00:43:07.340
And it's really the only place in the world that you can do this.

00:43:07.340 --> 00:43:09.720
So kudos to him and his vision.

00:43:09.720 --> 00:43:10.520
Yeah, absolutely.

00:43:10.520 --> 00:43:14.100
He's got the Brain Institute.

00:43:14.100 --> 00:43:17.200
And there's a couple of other ones as well, right?

00:43:17.200 --> 00:43:22.880
Now you also have the Allen Institute for Cellular Science and the Allen Frontiers Group.

00:43:22.880 --> 00:43:25.040
And the Allen Institute for AI.

00:43:25.040 --> 00:43:25.680
Oh, really?

00:43:25.680 --> 00:43:26.040
Okay.

00:43:26.040 --> 00:43:28.080
Yeah, they're not hosted in our building.

00:43:28.080 --> 00:43:29.180
They're in a separate building.

00:43:29.180 --> 00:43:32.220
But yeah, there's an artificial intelligence group.

00:43:32.220 --> 00:43:34.140
Yeah, so that's, yeah, absolutely.

00:43:34.140 --> 00:43:39.520
It's great to acknowledge what he's doing because it sounds really unique and special.

00:43:39.520 --> 00:43:49.320
And like we were just talking, it lets a lot of you work in a way that is sort of a better fit for the larger goal, not sort of career goals necessarily, right?

00:43:49.320 --> 00:43:49.660
Yeah.

00:43:49.840 --> 00:44:02.840
And just to follow back on that too is that we do want to make sure that we publish and we do have external advisory boards and we do apply for grants because we want to make sure that what we're doing in our space is relevant to the community.

00:44:02.840 --> 00:44:08.680
We don't want to be in this one-off ball where we somehow discover something that's not relevant.

00:44:08.680 --> 00:44:09.460
Or you know what I mean?

00:44:09.460 --> 00:44:12.380
Like we want to make sure that what we're doing is valid.

00:44:12.380 --> 00:44:16.620
Yeah, so the whole peer review process is still pretty interesting.

00:44:16.620 --> 00:44:18.060
Yeah, still very valid for us.

00:44:18.160 --> 00:44:18.300
Yeah.

00:44:18.300 --> 00:44:20.200
Right, exactly.

00:44:20.200 --> 00:44:29.580
So in 2013, President Obama came out with the BRAIN Initiative, and BRAIN is actually an acronym as well.

00:44:29.580 --> 00:44:31.000
Like how did that affect you guys?

00:44:31.000 --> 00:44:33.920
Did it make any difference or the community?

00:44:33.920 --> 00:44:43.080
Yeah, I think that I'm pretty sure that up in our cafe there's a little, I think I'm pretty sure that we've got a little letter signed by President Obama hanging on the wall up there.

00:44:43.080 --> 00:44:44.640
That's pretty cool.

00:44:44.640 --> 00:44:45.600
Related to these efforts.

00:44:45.600 --> 00:44:46.640
Yeah.

00:44:46.640 --> 00:45:02.300
So in effect, I mean, it's just great to see that more and more people are seeing how important this is, especially from a government point of view that, you know, a president has realized that, you know, this is one of the biggest frontiers that we really have left to solve.

00:45:02.380 --> 00:45:04.300
I mean, we just, we know so little.

00:45:04.300 --> 00:45:04.720
Yeah.

00:45:04.720 --> 00:45:09.280
And so it's really great to see that the community as a whole is investing in this sort of research.

00:45:09.280 --> 00:45:23.800
Yeah, from my perspective, I feel like that was one of the most important things was a recognition at the federal, nation-state level that brain science is maturing into the type of thing that needs large data centers.

00:45:23.800 --> 00:45:26.640
It needs large sharing and collaboration tools.

00:45:26.860 --> 00:45:32.180
It needs large investigations to really start to make a difference in people's lives.

00:45:32.180 --> 00:45:41.360
And we've, the science, the community has matured to the point where we can really start taking that standard forward and making a big impact.

00:45:41.360 --> 00:45:42.400
Yeah, that's awesome.

00:45:42.520 --> 00:45:49.280
And with the more open science and open source projects, it seems like that'll just amplify as people can work better together.

00:45:49.280 --> 00:45:52.400
To be honest, we couldn't do it without the open source software community.

00:45:52.400 --> 00:45:54.940
I was reflecting on this just the other day.

00:45:54.940 --> 00:45:56.640
I don't have the name of the tweet.

00:45:56.640 --> 00:46:08.020
I know a researcher with a first name, Jesse, tweeted a picture emphasizing that Matplotlib, NumPy, and pandas collectively are supported by 12 full-time developers.

00:46:08.480 --> 00:46:24.100
And when I think about the amount of science, both my own science, our institute's science, neuro, and then, you know, the rest of science generally, it's a huge edifice of work that's so critical to our national interest and to our interest as humans.

00:46:24.100 --> 00:46:32.680
And to see that much work supported by so few people, but such a dedicated core group of people, it really struck a chord with me.

00:46:32.920 --> 00:46:39.100
It's amazing to think of how just these small initiatives became such a foundation for what everyone is doing.

00:46:39.100 --> 00:46:49.280
I was happy to see that the NSF recently gave like a $3 million grant or something like that to the SciPy group and NumPy and that.

00:46:49.280 --> 00:46:51.780
So it's starting to get a lot more support.

00:46:51.780 --> 00:46:52.380
NumFOCUS?

00:46:52.720 --> 00:47:01.400
Yeah, I might have the exact details a little bit off like the number or whatever, but there was basically a big NSF grant to those groups to like keep that going stronger.

00:47:01.400 --> 00:47:03.500
Because I think they realize exactly what you're saying.

00:47:03.500 --> 00:47:12.720
Like all of these researchers, all these data scientists are day to day to day going, yeah, we have a terabyte of mouse video and we're going to give it to scikit-learn.

00:47:13.000 --> 00:47:19.020
Or we're going to, you know, and then like, well, we need to make sure that scikit-learn keeps working or the foundations, right?

00:47:19.020 --> 00:47:28.780
I saw, I remember seeing the, it was a call for some funding from Ken Reitz for Requests, you know, the Python requests module.

00:47:28.780 --> 00:47:30.900
Just, it was like, it was like a month ago.

00:47:30.900 --> 00:47:32.740
And it was like, oh, our goal is $3,000.

00:47:32.740 --> 00:47:36.220
And I was like, $3,000 to support Requests?

00:47:36.220 --> 00:47:40.180
Like the number one most used Python module in the entire ecosystem?

00:47:40.580 --> 00:47:41.840
Man, that's like pennies.

00:47:41.840 --> 00:47:43.200
Think about the return on that.

00:47:43.200 --> 00:47:44.460
That is pennies on the dollar.

00:47:44.460 --> 00:47:45.840
It's unbelievable.

00:47:45.840 --> 00:47:46.200
Yeah.

00:47:46.200 --> 00:47:48.660
I mean, I don't know the exact numbers.

00:47:48.660 --> 00:47:49.720
He had it on his website.

00:47:49.720 --> 00:47:54.100
It's probably still there, but it's like downloaded like 7 million times a month or something.

00:47:54.100 --> 00:47:55.500
I mean, it's really useful.

00:47:55.500 --> 00:47:56.500
It's not just a little bit.

00:47:56.500 --> 00:48:10.560
It's also very nice to see that funding organizations as a whole are starting to recognize too how much, I mean, one of the things that we've struggled with is just how much infrastructure goes into doing large data analysis.

00:48:10.560 --> 00:48:15.020
And from 10 years ago, this wasn't a thing.

00:48:15.020 --> 00:48:32.760
You know, and now, like Nick was saying, that we're all in our, you know, within a decade of our PhDs and we're all really working really hard to figure out how to process large sets of data, how to make that work, how to, you know, transfer our research code over to more production-like code and just how much infrastructure that takes.

00:48:33.120 --> 00:48:37.460
Whereas people don't, a lot of times don't understand that actually.

00:48:37.460 --> 00:48:49.520
So it's really nice to see that, you know, government funding organizations are starting to realize because the people on the ground doing this work are really like shouting, you know, we need the resources for this.

00:48:49.520 --> 00:48:50.520
This takes a lot of time.

00:48:50.520 --> 00:48:54.220
And this is so fundamental for, you know, the mission of this science.

00:48:54.920 --> 00:48:57.920
Do you think people are being taught those skills in grad school these days?

00:48:57.920 --> 00:48:59.260
Your question is so appropriate.

00:48:59.260 --> 00:49:01.600
I don't have my finger firmly on the pulse.

00:49:01.600 --> 00:49:10.000
I can only speak to my sort of small window into the University of Washington where I still maintain some contacts with my former advisor and his research group.

00:49:10.000 --> 00:49:13.780
And I know it's definitely on the minds.

00:49:13.780 --> 00:49:19.780
And I know at the graduate level, like the University of Washington's eScience group, they talk about these types of things.

00:49:19.780 --> 00:49:24.540
Yeah, Jake VanderPlas and the group over there doing the eScience Institute, that's something special as well.

00:49:24.540 --> 00:49:25.640
That's a really cool place.

00:49:25.640 --> 00:49:26.120
Exactly.

00:49:26.280 --> 00:49:29.540
But I think UW is really forward thinking in that respect.

00:49:29.540 --> 00:49:33.440
As far as its adoption, I'll maybe give a personal anecdote to drive it home.

00:49:33.440 --> 00:49:40.920
When I came to the Allen Institute in 2012, I just finished my PhD and I took a research science position.

00:49:40.920 --> 00:49:47.380
And I quickly realized the incredible amount that I can learn from some of my colleagues in the technology team.

00:49:47.380 --> 00:49:49.560
I'd never heard of a unit test.

00:49:49.560 --> 00:49:51.100
Like, we're going to test code?

00:49:51.100 --> 00:49:51.540
What?

00:49:51.540 --> 00:49:52.580
Why would you test code?

00:49:52.580 --> 00:49:53.000
It works.

00:49:53.000 --> 00:49:54.380
I just ran my script and it works.

00:49:54.380 --> 00:49:55.300
Yeah, I did that myself.

00:49:55.300 --> 00:49:55.660
Exactly.

00:49:56.180 --> 00:50:01.140
And then I sort of got my eyes open to the way that they approached their work.

00:50:01.140 --> 00:50:05.820
And then I sort of fell into the well and have joined the group eventually.

00:50:05.820 --> 00:50:13.800
But I definitely want to see the same sort of epiphany happen to scientists, not only in the graduate level, also the undergraduates.

00:50:13.800 --> 00:50:19.440
You know, especially what I've seen in the talent of some of these undergraduate scientists that are coming out,

00:50:19.440 --> 00:50:24.420
that are getting trained in the disciplines that didn't exist when I was an undergraduate.

00:50:25.120 --> 00:50:28.820
And to see them start to take the tools that have been developed, you know, really for industry,

00:50:28.820 --> 00:50:36.360
and of course, we have visibility in open source software, and take those and apply those to their research and build it into the DNA of how they work and think.

00:50:36.760 --> 00:50:42.100
That's only going to amplify how open source software contributes to science for the next generation.

00:50:42.100 --> 00:50:43.820
Yeah, it's exciting.

00:50:43.820 --> 00:50:48.340
My lab that I came out of, I finished my PhD about two years ago.

00:50:49.040 --> 00:50:58.440
And we signed up for, the lab signed up for a GitHub account and started doing version control, you know, shortly after I switched over to Python.

00:50:58.440 --> 00:51:07.640
It was sort of myself and another student who had done an internship at Google between undergrad and grad school that sort of,

00:51:07.640 --> 00:51:10.560
between the two of us, you know, he kind of brought some best practices.

00:51:10.560 --> 00:51:12.760
He's like, well, this is the way, you know, they did things.

00:51:13.040 --> 00:51:15.600
And started getting stuff under version control.

00:51:15.600 --> 00:51:18.420
And I still get pings on, like, changes to that repo.

00:51:18.420 --> 00:51:22.740
And we kind of laid a foundation in that lab that has continued.

00:51:22.740 --> 00:51:27.040
I mean, that group, you know, it's a whole new, I think he's still there, but, you know,

00:51:27.040 --> 00:51:31.960
there's a whole new kind of cohort of students in that lab that I didn't know.

00:51:32.080 --> 00:51:39.100
And they're doing research code development there in a very different way than when I entered that exact same lab.

00:51:39.100 --> 00:51:40.840
And to them, that must be how you do it.

00:51:40.840 --> 00:51:41.500
That's how we do it.

00:51:41.500 --> 00:51:42.160
Yeah, exactly.

00:51:42.160 --> 00:51:42.540
Yeah.

00:51:42.540 --> 00:51:43.440
Yeah, that's cool.

00:51:43.440 --> 00:51:44.360
All right.

00:51:44.360 --> 00:51:49.500
Well, I want to be cognizant of your time, but I could keep talking for a long time because there's so many things to explore.

00:51:49.500 --> 00:51:50.060
Well, let's do it again.

00:51:50.060 --> 00:51:50.940
Yeah, yeah.

00:51:50.940 --> 00:51:52.060
We could do a follow-up sometime.

00:51:52.060 --> 00:51:52.540
Absolutely.

00:51:52.540 --> 00:51:54.860
But this is super fascinating.

00:51:54.860 --> 00:51:57.500
I think we'll leave it there for the brain science aspect.

00:51:57.500 --> 00:52:01.300
So let me just ask you all the two questions at the end of the show.

00:52:01.300 --> 00:52:03.240
And since there's three of you, I'll go kind of quick.

00:52:03.240 --> 00:52:07.820
First of all, if you're going to write some code in Python, Justin, what editor do you use?

00:52:07.820 --> 00:52:09.260
I'm loving Atom right now.

00:52:09.260 --> 00:52:13.180
I kind of prototype in Jupyter, in JupyterLab actually now.

00:52:13.180 --> 00:52:14.560
Yeah, yeah, that's starting to take off.

00:52:14.560 --> 00:52:14.860
Yeah.

00:52:14.860 --> 00:52:18.440
And then, but yeah, then I'm, most of my actual packages are in Atom.

00:52:18.440 --> 00:52:18.740
Nice.

00:52:18.740 --> 00:52:19.220
Corinne?

00:52:19.220 --> 00:52:23.960
I'll either do some old school writing it in Emacs and then running it on the command line,

00:52:23.960 --> 00:52:28.520
or if I'm using an editor with a debugger, I'll use Eclipse with PyDev.

00:52:28.520 --> 00:52:29.340
Oh, yeah.

00:52:29.340 --> 00:52:29.680
Nice.

00:52:29.680 --> 00:52:30.420
Right on.

00:52:30.680 --> 00:52:31.120
Nick?

00:52:31.120 --> 00:52:36.920
Emacs when I'm remote and VS Code, Visual Studio Code, when I'm local.

00:52:36.920 --> 00:52:37.300
Yeah.

00:52:37.300 --> 00:52:38.900
Visual Studio Code's really taken off.

00:52:38.900 --> 00:52:43.540
I've just been tremendously impressed, especially as it's matured as an open source project.

00:52:43.540 --> 00:52:48.420
And when the updates come in, they are timely and they are squashing bugs that people report.

00:52:48.420 --> 00:52:50.480
It's been awesome to watch that project.

00:52:50.480 --> 00:52:51.900
It's got a lot of momentum for sure.

00:52:51.900 --> 00:52:52.320
Yeah.

00:52:52.320 --> 00:52:52.740
All right.

00:52:52.740 --> 00:52:53.160
That's good.

00:52:53.160 --> 00:52:55.720
And then, notable PyPI package.

00:52:55.720 --> 00:53:00.140
Maybe there's some package that people, you know, not necessarily Requests because it's the most popular,

00:53:00.140 --> 00:53:03.040
but something you're like, oh, I saw this thing the other day and it's really amazing.

00:53:03.040 --> 00:53:04.140
You should know about it.

00:53:04.140 --> 00:53:04.800
Nick?

00:53:04.800 --> 00:53:05.560
Go reverse order.

00:53:05.760 --> 00:53:06.160
Oh, man.

00:53:06.160 --> 00:53:08.080
Let me take a pass and I'll come back at the end.

00:53:08.080 --> 00:53:09.260
I want to think for a second.

00:53:09.260 --> 00:53:09.900
You want to get a good one.

00:53:09.900 --> 00:53:10.220
All right.

00:53:10.220 --> 00:53:10.700
Corinne?

00:53:10.700 --> 00:53:20.600
Well, I really use very standardized packages just because I want to stay away from people having to install and use immature code.

00:53:20.600 --> 00:53:26.140
And the things I use the most are NumPy, SciPy, stats packages, stuff like that.

00:53:26.140 --> 00:53:26.500
Justin?

00:53:26.500 --> 00:53:30.020
One of the things I've been impressed with recently is Cookiecutter.

00:53:30.020 --> 00:53:38.960
It's kind of speaking of, you know, we work with a lot of, speaking of onboarding kind of newer Python folks into good research,

00:53:38.960 --> 00:53:49.760
good practices of testing, tooling, documentation, helping folks who have a little bit less knowledge of what a full-fledged package should look like

00:53:49.760 --> 00:53:52.460
with a nice template has been absolutely invaluable.

00:53:52.460 --> 00:53:53.520
Yeah, that's a great idea.

00:53:53.520 --> 00:53:58.120
It's very, very helpful to just run a single command and poof, you got all the structure you're supposed to have.

00:53:58.120 --> 00:53:58.900
Nick, you thought of one?

00:53:58.900 --> 00:53:59.820
I got one, yeah.

00:53:59.820 --> 00:54:00.820
Bokeh.

00:54:00.820 --> 00:54:01.960
Oh, yeah.

00:54:01.960 --> 00:54:04.820
It's the Continuum visualization package.

00:54:04.820 --> 00:54:12.520
I've been using it to build dashboards and widgets for doing analysis tooling and I just can't say enough about it.

00:54:12.520 --> 00:54:19.720
The community that has grown up around it has just been so responsive, and the power of that tool as it matures into the 1.0 release,

00:54:20.140 --> 00:54:23.800
I'm just so excited to see where it goes because I use it daily and I love it.

00:54:23.800 --> 00:54:24.720
Yeah, that's awesome.

00:54:24.720 --> 00:54:25.160
All right.

00:54:25.160 --> 00:54:26.460
Well, thank you so much.

00:54:26.460 --> 00:54:27.620
Those are all great choices.

00:54:27.620 --> 00:54:32.240
I guess I'll give you all, whoever wants to jump in and add something here, a chance for a final call to action.

00:54:32.240 --> 00:54:37.820
People want to work with the Paul Allen Brain Institute or get involved with some of the tools or things you've talked about.

00:54:37.820 --> 00:54:38.820
What do they do?

00:54:38.820 --> 00:54:39.320
Where do they go?

00:54:39.420 --> 00:54:43.920
Yeah, so we've got, so I mean, I think definitely for your users, I would, or for your users.

00:54:43.920 --> 00:54:46.280
Listeners.

00:54:46.280 --> 00:54:57.320
So definitely for your listeners, I think that our GitHub page, so we've got a, we've got a github.com/AllenInstitute and we've got a bunch of different packages.

00:54:57.320 --> 00:55:06.620
Everything from our, our production things like the AllenSDK to smaller packages that the individual people are releasing like neuroglia, argschema.

00:55:06.620 --> 00:55:14.500
We've got a couple of kind of things in the, in the Python world as well as, you know, research code and packages that are affiliated with research projects.

00:55:14.500 --> 00:55:16.960
So there's a bunch of stuff there that's a whole lot of Python.

00:55:16.960 --> 00:55:17.360
Nice.

00:55:17.360 --> 00:55:32.140
So our GitHub page will have lots of great examples of how to actually utilize the data that you can download too, which if you want to browse around on the data, go to our website and you can see the massive plethora of data that we have there that's available for everybody.

00:55:32.140 --> 00:55:33.000
Yeah, excellent.

00:55:33.000 --> 00:55:44.220
And one particular package, because it's so close to my heart, the AllenSDK, it's really the sort of one-stop shop to get your hands dirty digging into our data and it should work.

00:55:44.220 --> 00:55:45.000
Just pip install.

00:55:45.000 --> 00:55:50.080
And if it doesn't, open an issue and assign it to me, I'll, I'll tackle it as soon as I can.

00:55:50.080 --> 00:55:50.320
Yeah.

00:55:50.320 --> 00:55:54.580
And we've got any of our research that we've got going on here, Twitter, we're on Instagram.

00:55:54.580 --> 00:56:01.760
We've got a bunch of job openings too, I think for some software developers, alleninstitute.org and there's a button somewhere for careers.

00:56:02.300 --> 00:56:04.620
So yeah, there's a, there's a lot of fun stuff happening.

00:56:04.620 --> 00:56:04.940
Awesome.

00:56:04.940 --> 00:56:05.220
Yeah.

00:56:05.220 --> 00:56:06.380
It sounds super exciting.

00:56:06.380 --> 00:56:09.600
Thank you for sharing this view into what you're all up to.

00:56:09.600 --> 00:56:09.920
Yeah.

00:56:09.920 --> 00:56:11.220
Thank you so much for having us.

00:56:11.220 --> 00:56:11.880
Thank you so much.

00:56:11.880 --> 00:56:12.860
Thank you for having us.

00:56:12.860 --> 00:56:13.040
Bye.

00:56:13.040 --> 00:56:17.820
This has been another episode of Talk Python To Me.

00:56:17.820 --> 00:56:22.300
Our guests on this episode have been Justin Kiggins, Corinne Teeter, and Nicholas Kane.

00:56:22.300 --> 00:56:26.180
And this episode has been brought to you by Cox Automotive and Rollbar.

00:56:26.180 --> 00:56:31.600
Join Cox Automotive and use your technical skills to transform the way the world buys,

00:56:31.600 --> 00:56:32.860
sells, and owns cars.

00:56:32.860 --> 00:56:39.380
Find an exciting technical position that's right for you at talkpython.fm/cox, C-O-X.

00:56:39.380 --> 00:56:42.280
Rollbar takes the pain out of errors.

00:56:42.280 --> 00:56:47.360
They give you the context and insight you need to quickly locate and fix errors that might have

00:56:47.360 --> 00:56:50.000
gone unnoticed until your users complain, of course.

00:56:50.420 --> 00:56:56.260
As Talk Python To Me listeners, track a ridiculous number of errors for free at

00:56:56.260 --> 00:56:57.160
rollbar.com/talkpythontome.

00:56:58.100 --> 00:56:59.260
Want to level up your Python?

00:56:59.260 --> 00:57:04.460
If you're just getting started, try my Python Jumpstart by Building 10 Apps or our brand new

00:57:04.460 --> 00:57:06.320
100 Days of Code in Python.

00:57:06.320 --> 00:57:10.120
And if you're interested in more than one course, be sure to check out the Everything Bundle.

00:57:10.120 --> 00:57:12.360
It's like a subscription that never expires.

00:57:12.900 --> 00:57:14.560
Be sure to subscribe to the show.

00:57:14.560 --> 00:57:16.760
Open your favorite podcatcher and search for Python.

00:57:16.760 --> 00:57:18.000
We should be right at the top.

00:57:18.000 --> 00:57:24.120
You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct

00:57:24.120 --> 00:57:27.320
RSS feed at /rss on talkpython.fm.

00:57:27.700 --> 00:57:29.200
This is your host, Michael Kennedy.

00:57:29.200 --> 00:57:30.540
Thanks so much for listening.

00:57:30.540 --> 00:57:31.620
I really appreciate it.

00:57:31.620 --> 00:57:33.560
Now get out there and write some Python code.


