
#164: Python in Brain Research at the Allen Institute Transcript

Recorded on Friday, May 4, 2018.

00:00 Michael Kennedy: The brain is truly one of the final frontiers of human exploration. Understanding how the brain works has vast consequences for human health and for computation. Imagine how computers might change if we actually understood thinking and even consciousness. On this episode, you'll meet Justin Kiggins and Corinne Teeter, who are research scientists using Python for their daily work at the Paul Allen Brain Institute. They are joined by Nicholas Cain, who is a software developer there, supporting scientists using Python as well. Now, even if you aren't interested in brain science directly, I really encourage you to listen to this entire interview, it's super fascinating. This is Talk Python To Me, Episode 164, recorded May 4th, 2018. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @MKennedy. Keep up with the show, and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Cox Automotive and Rollbar. That's right, Cox Automotive has joined the show as a sponsor. They're looking for new developers, so check out what they're offering during their segment or the link in the show notes. Justin, Corinne, Nick, welcome to the show.

01:29 Panelists: Thank you for having us. Hello. Hi.

01:30 Michael Kennedy: Hello, hello. It's super exciting to have you here on the podcast. I'm very, very interested in learning about how you're applying Python, and data science type things to brain science. It's going to be really, really fun.

01:44 Panelists: Great.

01:44 Michael Kennedy: For sure, but before we get into all the details, let's start with your story, I guess, Justin, go first. How did you get into programming in Python?

01:51 Panelists: I started programming mostly in kind of college, working in research labs, you know, part of engineering classes, and that was largely kind of MATLAB and LabVIEW. MATLAB is kind of the dominant language in most neuroscience research environments.

02:08 Michael Kennedy: What was your degree, what were you studying at the time?

02:11 Panelists: I was studying bioengineering, biomedical engineering, and then I went and started a PhD in neuroscience. And it was during my PhD that I decided that... There's this old C code, like raw C, that my advisor had written for some of our experiments, and I was chasing pointers, and like trying to figure out how to do memory buffers with audio, and I was like, this is brutal, I don't want to do this. And I basically cold turkey switched everything that I was doing over to Python. So, rewrote a bunch of that code, taught myself Python by kind of rewriting that, implementing it in Python, starting to use some of the scientific Python stuff for my analysis, building out a Django database to keep track of the research that I was working on. It was kind of a cold turkey switch for me about 2012 while I was working on my PhD.

03:03 Michael Kennedy: And it was a good switch, you're happy with it?

03:05 Panelists: I mean I think that, it's done me well, and the rest of the field I think is starting to catch up, and it's only become more powerful since then.

03:15 Michael Kennedy: If you look at the popularity of Python it's been going upward, but there was a major inflection point where the rate of popularity growth increased, around 2012, and I think a lot of that is due to the data science tool improvements, that whole space.

03:31 Panelists: Absolutely. I think that I really just caught the edge of that wave, so.

03:37 Michael Kennedy: You're part of that wave for sure, for sure. Nice, and Corinne, how about yourself?

03:41 Panelists: I started coding after my undergraduate degree. I had an undergraduate degree in physics and psychology. Afterwards went to Los Alamos National Lab, and so my first coding language there, I was doing very, you know, physics-y dominated stuff, so...it was FORTRAN, actually. After that I went back to grad school in computational neuroscience and there the main coding language we used was MATLAB, as Justin mentioned. And then after that, I had a couple positions at, for example, Qualcomm and Sandia National Labs, and there I was still using mostly MATLAB, so we'd have to buy licenses, and then I came to the Allen Institute...and here, Nick and I were both here during the very beginning of kind of our latest 10-year plan, and we wanted to make sure that everything we use, like one of the goals of the Allen Institute is to be able to make standardized data that the community can use, and part of that is wanting it to be open source, so a lot of us on the ground were thinking about this and we had the option at the time to use whatever coding language we wanted to for the projects that we were pursuing, but we really all got together and was like, you know what, we want everyone to be able to look at our code, we want each other to look at our code, and... we're going to go with Python, so I learned Python on the fly when I came to the Allen.

05:04 Michael Kennedy: What was the transition like coming from say, MATLAB to Python?

05:08 Panelists: It was a learning curve, I'd like to say I was kind of floundering around for probably about three months. The indexing in MATLAB and Python is different, and we do a lot of time-series data analysis, so just the indexing and things like that was a transition. But at the end of the day, I'm very glad that we chose to do that.

05:30 Michael Kennedy: That's cool, yeah it's...it's different, but it's not that different, right?

05:34 Panelists: No.

05:35 Michael Kennedy: It does still have a similar feel, at least I think, so-

05:38 Panelists: So we could have gone to, like, different... I mean we could have gone to C or some other language like that, but really it was a great transition for people on the outside. We knew a lot of them would be very MATLAB savvy, since that was kind of the main language at the time, I think people are transitioning now, but it's still a high-level language.

05:56 Michael Kennedy: Right, and I think the ecosystem for Python aligns very well with your mission of trying to have everything open source and stuff, right? Of all the different languages, Python embraces this sort of zen of open source more than average, I would say. Nick, how about yourself?

06:13 Panelists: So just like Corinne and Justin, I started programming in MATLAB as an undergraduate. When I went to graduate school at the University of Washington for my PhD, I was in the Applied Math department and my advisor encouraged me to learn FORTRAN, so I wrote my first project in FORTRAN and just like Justin was saying, I was chasing all sorts of things that I didn't really understand and having a difficult time, and decided I would try out this language that some of my colleagues were telling me about, and started rewriting all my algorithms in Python and using that as sort of my learning case. Then as I got more into computational neuroscience, there's actually a lot of packages that are written in low level languages, packages like Nest or Neuron that have developed really good Python bindings. So, I realized that I didn't have to sacrifice efficiency or engagement with these other theoretical and computational neuroscience communities, but I could still program in a language with a ton of flexibility and a ton of tools, so it was a really natural transition over. So, then when I came to the Allen Institute, brought that knowledge in, and to be honest, really haven't looked back. I've been using Python for most of my day-to-day work.

07:30 Michael Kennedy: That's really a cool story. Because of the bindings, right, because there's underlying libraries, and people can still use those libraries, it's just you happen to be able to program in a higher level language, and if they want to go write in C or FORTRAN, like that's all well and good for them, right?

07:46 Panelists: You can use the expertise of really core developers working on highly technical material in really efficient, multi-processing libraries, but then be able to define, at a high level, simulations and models in a much more user-friendly syntax, but really not sacrifice efficiency.

08:08 Michael Kennedy: That's really really cool. So I kind of want to go through projects that you're each working on and give people a sense of, what is it you do day-to-day at the Allen Institute, because it's not like, well I work at this e-commerce site, or I work at a bank, we all know what that looks like, but this, you work at a pretty special place, so I'm going to keep the same order I guess. Justin, like what kind of stuff do you do day-to-day?

08:32 Panelists: I'm a scientist in the visual behavior team. So, in general, the broader... I'm in a chunk of the Institute that is very interested in neural coding. One way of thinking about what the brain does and what neurons in the brain do is that they have some representation, some way of encoding what is out in the environment. So if I'm looking at something, there's a particular pattern of activity that that's going to elicit in the cells in my brain. In general, we're trying to understand how this happens, how these types of representations emerge, what they are, and then how other parts of the brain use those representations to make decisions, to do whatever the other parts of the brain need to do with those kind of intermediates. In general that's the kind of stuff that we do. So we have a large experimental pipeline; one of the interesting things about the Allen Institute is we kind of take an industrial approach to generating data for these types of experiments. We have these very large pipelines that generate very large data sets on standard experimental rigs.

09:45 Michael Kennedy: So these experiments, are you just, like, bringing folks in, and you have them... do you hook them up with an EEG, or-

09:52 Panelists: Yeah, no, so most of the work that we do here in my group is dealing with mice. So we can actually present images on the screen for the mouse, and then record individual neurons in the mouse's brain. It's very hard to do this in humans, but it's a little bit easier to do it in mice.

10:11 Michael Kennedy: That's wild, how do you get them to pay attention to the screen?

10:13 Panelists: This is part of what my project deals with. They are... So there's an experimental setup, so at the end of the day we basically need to fit a very large microscope over them in order to record from the individual neurons, and a very small glass window is implanted in order to be able to see into their brain. They are basically trained to be comfortable with getting their head fixed up against the microscope, and they've got a little running wheel they can run on, and then we have the screen next to them that's presenting images.

10:47 Michael Kennedy: I see, so they're kind of fixed, like looking straight at it.

10:50 Panelists: They're kind of stuck, but they've got a wheel in front of them, so we are controlling the visual environment but they're kind of free to move otherwise, so it's almost like a little virtual reality type-

11:01 Michael Kennedy: We just need a miniature Oculus Rift type thing--

11:06 Panelists: Exactly.

11:07 Michael Kennedy: I mean it's basically...

11:08 Panelists: So then... I mean that's an interesting segue, 'cause then, your question...you know, how do we actually get them to pay attention, is that we put a lick spout in front of them, and they can lick the lick spout and if they lick at the right times, then we make sure that they get a little bit of water. And so they basically, through trial and error, start realizing what we're trying to get them to pay attention to on the screen. They're basically in a little video game. I mean they basically are, you know, we're controlling what's on the screen and they have to lick when the game rewards them for licking.

11:43 Michael Kennedy: That's really wild. That's quite interesting. So then you capture all this data and sort of analyze it afterwards, huh?

11:51 Panelists: So we generate data, we... I mean it's a little bit of... So we've got some, so some of the data gets streamed and analyzed in real time to give the trainers feedback on, on what the mice are doing and their wellbeing. We have a...in order to train them and to do this at scale, we have to standardize these training procedures, the game has to go from easy to hard. So we have an entire system that Nick and I actually have coordinated on, where at the end of each training session, the data gets uploaded, some automatic analysis happens, that determines what the next stage is that the mouse is going to have the next time they come in for training. And so that requires, you know pushing data back and forth between servers, sending it off to a microservice that Nick is running, and then, you know the next day, the mouse is on that next stage. So we train them up, then when they're ready, then we put them under the microscope, and so they're in a similar situation, but now we've got a microscope that is recording the activity of individual neurons in their brains. This gets acquired over a few days, all that data, I mean we're talking very large data files that are literally movies of neurons in their brain, that all gets-
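To make the auto-progression idea concrete, here is a purely hypothetical sketch of the kind of rule such an end-of-session analysis might apply. The function name, metrics, and thresholds are all invented for illustration; the actual criteria and service aren't described in the episode.

```python
# Hypothetical sketch only: the real Allen Institute criteria are not described here.
def next_training_stage(current_stage, hit_rate, false_alarm_rate,
                        hit_threshold=0.8, fa_threshold=0.2):
    """Pick the stage the mouse will see at its next training session."""
    if hit_rate >= hit_threshold and false_alarm_rate <= fa_threshold:
        return current_stage + 1   # performance criterion met: make the task harder
    return current_stage           # otherwise, repeat the current stage

# Example: a session with good detection and few false alarms advances the mouse.
print(next_training_stage(current_stage=3, hit_rate=0.85, false_alarm_rate=0.1))  # -> 4
```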

13:08 Michael Kennedy: Wow, that was going to be one of my questions, like how big is one of these files, like how much data are we talking about? Yeah, how big is one of these files, Nick, do you even know?

13:16 Panelists: Terascale, I think, it's less than a Terabyte, but it's you know, many hundreds of gigs.

13:22 Michael Kennedy: Wow, okay.

13:23 Panelists: So this gets pushed up to the server and then we've got a whole other team that's developed algorithms for...basically extracting these signals out. So you've got a bunch of, kind of ML that has to happen in order to basically do segmentation, a bunch of image recognition stuff, you know, where are the cells in this movie, and then extracting the activity of those cells. And then basically kind of at the end of a bunch of that pipeline, of that kind of processing ML pipeline, a bunch of this data then comes back to me, where now I have signals, and I have the record of the images represented on the screen, I've got other data about when the mouse licked, when it didn't, and so basically then I take this data and try to make sense of it. So to what extent can I, you know, if I'm just looking at the activity, can I decode what was on the screen from that activity? If I'm looking at the activity, can I predict what the mouse's choices were at any given time, whether it chose to lick or whether it chose not to lick, in the context of its performance on the game? And so a lot of that is basically-

14:31 Michael Kennedy: That's just fascinating. Yeah, this is really wild, I had no idea. That you can do these kinds of things.

14:35 Panelists: At the end of the day, basically to do this, it's all largely scikit-learn and pandas, by reducing this stuff into a feature matrix where the features are the activity of any given cell. So if I've got 100 cells that we've recorded from, each cell becomes one dimension in my vector, and I've got a bunch of categorical or continuous information about what was on the screen, and then it's just a regression or a classification problem at that point. And this basically is what lets us kind of... You know, by approaching it in this way, we can build the inferences and say, "Well, this area over here did really well at decoding images, this area over here didn't. But that area was very good at predicting what the mouse's decision was." So we can kind of start to build out inferences about what different parts of the brain are doing and how they are doing that through this type of approach.
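As a rough illustration of the decoding setup described here (one feature column per recorded cell, stimulus identity as the label), here is a minimal scikit-learn sketch. The data is randomly generated stand-in data, not Allen Institute data, and logistic regression is just one reasonable model choice.

```python
# Illustrative sketch only (not the Institute's actual pipeline): decode which
# image was on the screen from the activity of the recorded cells.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials, n_cells = 500, 100                       # one feature (column) per recorded cell
activity = rng.normal(size=(n_trials, n_cells))    # trial x cell response matrix (made up)
image_ids = rng.integers(0, 8, size=n_trials)      # which of 8 images was shown on each trial

X_train, X_test, y_train, y_test = train_test_split(
    activity, image_ids, test_size=0.25, random_state=0)

decoder = LogisticRegression(max_iter=1000)
decoder.fit(X_train, y_train)
print("decoding accuracy:", decoder.score(X_test, y_test))
```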

15:37 Michael Kennedy: This portion of Talk Python to Me is brought to you by Cox Automotive. They're leading the way in cutting edge, industry changing technology that is transforming the way the world buys and sells its cars, and they're looking for software engineers and technical leaders to help them do just that. You hate being stuck in one tech-stack? Well that's not a problem at Cox Automotive. Their developers work across multiple tech-stacks and platforms. They give you the room you need to grow your career. Bring your technical skills and coding know-how to Cox Automotive. You'll create real-world solutions to today's business problems alongside some of the best and brightest minds. Are you ready to challenge today and transform tomorrow with Cox Automotive? Go to talkpython.fm/cox, C-O-X, and check out all the exciting positions they have open right now. Couple of thoughts, one...who would have thought that a library coming out of the financial industry, pandas, would be helping us understand the brain and also who knew mice could generate so much data?

16:39 Panelists: Well I mean, the amount of data that we can generate, I mean I think it's probably obvious, and we're not even getting all the data we can generate out of these guys. We have...This is, I mean, this data that we're talking about, I mean we're literally talking zooming in on an area of the mouse's brain that is maybe a few, what like, hundreds of microns, micrometers wide, and maybe like, a really thin, thin piece. So we're talking, I mean we're talking about a couple, you know, many dozens to hundreds of neurons out of the thousands-

17:13 Michael Kennedy: The whole brain,

17:15 Panelists: ...and thousands of neurons in the mouse's brain, right? We're not, I mean...this is just the tip of the iceberg in what we could be potentially recording as technologies improve.

17:23 Michael Kennedy: Someday we probably will be, right?

17:25 Panelists: I mean there's tons of initiatives. I mean, the Allen Institute is leading on a bunch of efforts. The recording modality I just described to you is the... What we have currently released, and the kind of stuff that is currently on our website, not with all the behavior, but that recording modality. There's a new effort with Neuropixels probes that the Allen Institute has been involved in that will get us kind of up into the thousands range of simultaneously recorded cells, and there are even more forward-thinking efforts to be much more comprehensive in what we can record from in this level of detail.

18:05 Michael Kennedy: It's amazing. All right, Corinne? How about yourself?

18:08 Panelists: Justin works on kind of a higher level project where you have actual behaving mice. I'd say I work one scale downwards, with a large part of this institute that is trying to really define what the components in the brain are. So you have a bunch of neurons and theoretically those are differentiated into different cell types. So researchers in the field have been trying to figure out what sorts of types of neurons there are in the brain for probably 100 years. This is something that people haven't really solidified their ideas on.

18:46 Michael Kennedy: It's still an open question? Like people don't know all the types?

18:50 Panelists: It's an open question, so we're really devoting a lot of resources to try to get to some sort of ground truth. It's not clear that there are going to be specific types of neurons. There's probably a continuum, but how well can we define it, and after we have some definitions can we figure out what those different types are doing, what function they're performing in the brain? So...Justin didn't mention that there's a reason we use mice, and the reason we use mice is that we have a lot of genetic controls. So, we specifically breed different types of mice to fluoresce under a microscope for different types of genes that are expressed in the neurons. The mice don't fluoresce, but the individual neurons fluoresce, so when you're looking under a microscope you see a bunch of different neurons, and depending on what type of neuron we're marking, that neuron will fluoresce under the microscope. So we have a lot of genetic control over recording from neurons where we kind of know what transgenic type they are. In the group and project I work on, we are looking at electrophysiology data. So what that means is you stick an electrode into a neuron, so these mice are sacrificed and you have slices of the brain tissue, and we also do this in humans. We have a lot of agreements with the hospitals in the area where, if they're excising part of the brain during surgery, we will get that tissue and we'll record from those neurons also. So this is a nice project to kind of try to relate mice to humans.

20:31 Michael Kennedy: How similar are they?

20:32 Panelists: My first position, as I mentioned... I came from a physics background, was basically building a modeling pipeline. So you stick an electrode into a neuron, you inject current, and you record the voltage output. And then you try to come up with mathematical equations that will recreate the behavior of the neuron based on the current injection, just like a circuit. And so, we recently wrapped up this project where we were looking at how much detail in the mathematical equations is needed to reproduce the behavior of these neurons. And so this is all available on our website now, and the idea here is that when people are building larger scale networks, you want to use realistic spiking behavior of individual neurons. So now, depending on the level of abstraction someone might want to use in a network that they're building, they can choose from this range of abstraction that we have on our website. So, the first project was that, building a whole pipeline to do this all automated. So data's taken, it goes into our storage facility, then there's some QC algorithms that we built up to, you know, QC the data, and then I pull that data out, come up with algorithms, and test them in a very machine-learning type way. You basically have a test set and a training set. And then the project I work on now is trying to figure out the components. So you inject current into one neuron, and you measure the voltage that's happening on another neuron it's connected to.

22:10 Michael Kennedy: How complicated does that get? Is it kind of simple to some degree, like Newtonian mechanics? Is it crazy complex, like dynamical, chaotic systems? Like, what are you working with here?

22:23 Panelists: So it depends... Nick will talk about this a little bit, cuz he spent a lot of time building actual network models. What I do is, I would say, relatively simple mathematical equations. There's also a level of models that are made that are what we call biologically realistic, where you try to model all of the ion channels in a neuron. So you have a lot of different ion channels in a neuron: calcium, sodium, potassium, lots of them... you know, 25 to 100 different channels. You actually try to model the gates opening and closing and current flowing into the neuron, but we abstract away from that because we have found that's not necessarily needed to predict the spiking behavior of a neuron. But we also have those high level, or sorry, those very complex models, too. So it depends.
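For a feel of the simple end of that spectrum, here is a minimal leaky integrate-and-fire neuron driven by a current step. It is an illustrative textbook model, not one of the Institute's models, and all parameter values are made up.

```python
# A minimal leaky integrate-and-fire neuron driven by a current step.
# Illustrative only: parameter values are made up; this is the "simple
# equation" end of the spectrum, not one of the Institute's models.
import numpy as np

dt, T = 1e-4, 0.5                     # time step and duration, in seconds
tau_m, R, v_rest = 0.02, 1e8, -0.07   # membrane time constant (s), resistance (ohm), resting potential (V)
v_thresh, v_reset = -0.05, -0.07      # spike threshold and reset voltage (V)

t = np.arange(0, T, dt)
I = np.where((t > 0.1) & (t < 0.4), 2.5e-10, 0.0)   # 0.25 nA injected current step (A)

v = np.full_like(t, v_rest)
spike_times = []
for i in range(1, len(t)):
    dv = (-(v[i - 1] - v_rest) + R * I[i - 1]) / tau_m   # leak toward rest plus input drive
    v[i] = v[i - 1] + dv * dt
    if v[i] >= v_thresh:              # threshold crossing: record a spike and reset
        spike_times.append(t[i])
        v[i] = v_reset

print(f"{len(spike_times)} spikes during the current step")
```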

23:19 Michael Kennedy: Okay. Yeah, so you can look at it at different levels

23:21 Panelists: Exactly-

23:22 Michael Kennedy: ...depending on what questions you're trying to ask. And, yeah, Nick, how about... go for it.

23:25 Panelists: Well I just wanted to jump in there and sort of highlight one of our Python packages that the institute has been building over the past year. It's a package we call the BMTK, the Brain Modeling Toolkit. It's available on our website and it's gone through sort of a soft release, but it's a Python package, a Python wrapper around several neural simulators, like I was mentioning earlier, that allow researchers to construct and simulate neural circuits, like Corinne was saying, at a bunch of these different levels of biological realism. So, you know, Corinne was highlighting some of her work at sort of the one-differential-equation-per-neuron scale, but you can go much deeper and simulate individual compartment models that can resolve the complicated morphologies of the dendritic trees and how those interact with each other. Or even all the way down to the synapses and ion channels. That's the sort of extreme in biological realism. My first project at the institute was on the other extreme, so-called population density modeling, where we used partial differential equations to simulate entire populations as sort of one homogeneous group. There are different biological questions that you'd want to pose at different points on this continuum. I'll give you an example. Is it important to simulate the exquisitely complicated nature of the trees of these neurons to understand their input/output properties? Well, if the answer is yes, then you're going to have to use a simulation tool like Neuron... although it might be sufficient to just look at the spiking behavior down near the soma of the cell, in which case a simulation tool like Nest, which also has a Python API, would be more appropriate. If you're just interested in the sort of mean field dynamics, the population-to-population contributions to circuit dynamics, then the neural simulator that I wrote, called DiPDE, which is actually in pure Python, would be the most appropriate tool. So we have a Python package that actually wraps all these different levels of detail so you can move between each of the different scales as your work demands it.

25:37 Michael Kennedy: That sounds really interesting because you might start a research project thinking I'm going to look at one level but realize no actually, we need to try to think about it differently, but you have the same API or something like that, right?

25:48 Panelists: Exactly. You know, there's some big switching costs associated with having to learn a whole set of tool chains. The simulator Nest was originally written with its own custom language for describing the network topologies, I think it was called SLI. I know Neuron has its own language called hoc, right, Corinne? Yes. But it also has a Python interpreter now. So if you're having to switch, based on your biological question, to a different type of simulation, now you've got to learn yet another custom description or modeling language. It really is taxing on the individual scientist. So, that's why unified Python APIs, where you can just sort of learn one language but still get the power of all these simulation tools, are really helpful.

26:33 Michael Kennedy: It sounds great. So are these tools and libraries being taught in academia these days? Like are they used for-

26:41 Panelists: That's a great question-

26:41 Michael Kennedy: ...research projects?

26:41 Panelists: So there are some examples where they're taught to undergraduates, although it's a pretty specific topic. In graduate school, that's where I learned about all these tools, but you know, when you're doing your PhD you have a neuroscience or applied math question in mind, and then you go find the tool that's most appropriate. So most of the time I'd say it's learning on your own. We do have several examples of training courses that we provide. They're actually not just at the Allen Institute, but all over the world, for certain specialized computational neuroscience and also experimental neuroscience. I know that at our summer training course last year, using these Python APIs was one of the main focuses of the course, or it was a focus of the course.

27:31 Michael Kennedy: Are these courses taught online or are they taught in Seattle, or ... That's where you guys are?

27:34 Panelists: Yeah. So we're all in Seattle just down here on Lake Union, but I was referring to our Friday Harbor summer course, which is actually, I think, in its fifth year now, where there's an application process really geared towards graduate students and post-docs, maybe early faculty, and it's a two-week in-residence course up in the San Juan Islands at the Friday Harbor Labs, which is run by UW.

27:57 Michael Kennedy: That's the Friday Harbor tie-in, yeah. I've stayed up there on San Juan Island, it's wonderful up there, yeah.

28:02 Panelists: It's a combined effort between the University of Washington and the Allen Institute, and there's also funding from the Capley Institute that helps make sure that it is a success, and it's not only computational neuroscience, but actually I think it's kind of morphed into a combination of big data and experimental neuroscience, and there's also an introductory period where it's all taught in Python, so if the students are coming into the course without a strong Python background, there's sort of a Python bootcamp the first couple days to help train students to use our data APIs and some of the tools they might find useful on their projects. All of the course material from the last few years is on the Allen Institute GitHub repository. So there's Jupyter notebooks that cover a lot of the course material that those students use, and that's freely available for folks to go and download and start poking around. One thing this also offers is, I mentioned that we release a bunch of this data online, so currently we have what we call our brain observatory, which is at observatory.brain-map.org. That is the website for the web version of it, and you can go there and poke around and see a little bit of what is in this data set of, I want to say, 40,000 or so neurons from my work. Corinne's data is somewhere in parallel. It's also on brain-map.org, I don't remember exactly where it is, but the stuff she's been working on is also freely released, and Nick's team manages the API and the Python wrapper for the API to access this. So you can basically do pip install allensdk. That will install a Python package locally. You spin that up, and then with like three or four lines of Python code you'll start downloading almost all the data that we've released in this. It might take a while, but you've got access to it at your fingertips.
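To give a concrete sense of what those "three or four lines" can look like, here is a sketch using the AllenSDK's Brain Observatory interface. Method names follow the AllenSDK documentation from around the time of this episode and may have changed since, so treat it as an approximation rather than a guaranteed recipe.

```python
# Sketch of pulling Brain Observatory data with the AllenSDK; method names
# follow the docs of the time and may differ in current releases.
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

# The manifest file tells the cache where to keep downloaded data locally.
boc = BrainObservatoryCache(manifest_file="brain_observatory/manifest.json")

containers = boc.get_experiment_containers()                  # available experiment containers
experiments = boc.get_ophys_experiments(
    experiment_container_ids=[containers[0]["id"]])           # imaging sessions for one container
data = boc.get_ophys_experiment_data(experiments[0]["id"])    # downloads the NWB file on first use
timestamps, dff_traces = data.get_dff_traces()                # per-cell dF/F time series
print(dff_traces.shape)
```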

30:13 Michael Kennedy: One of my thoughts around this is you talk about how much data that you're gathering for all these projects and stuff, and I know the folks at CERN... Instead of running there, downloading the data and running analysis on it, they push their analysis to where the data is, because there's so much data they've got like a cloud-computing infrastructure that... Like send your algorithm to the data and run it locally. So with yours, what is it like? Do you actually download all the data and process it or do you download little segments that you ask for or how does it work?

30:44 Panelists: This specific tool is basically downloading after a lot of the pre-processing, and we've gotten it to a point that it's condensed to the level that your average post-doc or graduate student who would want to explore this data would want to play with it. That's about the point at which we release it for download. We have our own compute internally for a lot of our own data that relies on our cluster, where we keep everything very close to the compute. It really depends on the types of questions. We have an entirely different chunk of the institute that is not represented here that is doing very dense microscopy, so trying to build out... It's going to take them months to acquire the dataset alone. This is electron microscopy, so it's every single neuron within some area in incredible detail. The size of that data is just, you know... it would be largely impossible to do the analyses that need to be done on that without staying very close to the data.

31:50 Michael Kennedy: That's really wild. What kind of questions are they trying to answer, Corinne? On that one, do you know?

31:54 Panelists: Oh I was just going to mention that, I mean we're trying to answer all sorts of questions, but we have been at Friday Harbor using Docker, too, and AWS has donated time, you know, just like they did at CERN, so last year at Friday Harbor... We used to just give out a terabyte disk to everybody that showed up, with the data that we were going to be talking about- It's the most efficient way to transmit the data sometimes.

32:18 Michael Kennedy: Well, you're joking, but I mean Amazon has their "ship us a bunch of disks" service... it's the fastest way to upload large quantities of data to S3 and stuff. It's wild- Sometimes that's what you need!

32:29 Panelists: Amazon generously donated a bunch of space and credits for us for this course, yeah, for all these students, and we... Yeah, we had a Snowball here that they sent us and we put a bunch of data on it and sent it off to Amazon. One thing that's interesting about that data: when you download data from the AllenSDK, the Python API, what you're getting is a bunch of pre-processed data that has had a lot of computational algorithms already applied to it. For example, neuropil subtraction, segmentation of regions of interest for the cells, that basically gives time series for the activity of each cell. There's an entire algorithms team at the Institute that works on the packages and the algorithms to do that. When you access that data with the Python API, you're just getting that post-processed data, not the raw imaging stacks. Those are the sort of multiple-hundred-gigabyte, or terascale, data, and that's what we needed the Snowball for, because... We actually made that data available through Amazon at Friday Harbor last year for students to poke at that raw imaging data, but it's a really, really significant volume. And also, if anybody requests it, they do send us a disk and we put the data on it and send it back to them. That's another way that we handle our large data sets.
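For readers unfamiliar with that pre-processing, here is a schematic numpy version of neuropil subtraction and dF/F for a single cell. It is the textbook form of the calculation, with made-up traces and an assumed correction factor, not the Institute's production algorithm.

```python
# Schematic neuropil subtraction and dF/F for one cell -- the textbook form
# of the calculation, not the Allen Institute's production pipeline.
import numpy as np

def dff(roi_trace, neuropil_trace, r=0.7, baseline_percentile=10):
    """Neuropil-corrected dF/F for a single fluorescence trace."""
    corrected = roi_trace - r * neuropil_trace           # remove contaminating neuropil signal
    f0 = np.percentile(corrected, baseline_percentile)   # crude global baseline; real pipelines use rolling baselines
    return (corrected - f0) / f0

# Made-up traces standing in for real two-photon imaging output.
rng = np.random.default_rng(1)
t = np.linspace(0, 60, 1800)
roi = 100 + 5 * rng.normal(size=t.size) + 30 * (np.sin(7 * t) > 0.995)
neuropil = 40 + 2 * rng.normal(size=t.size)

print(dff(roi, neuropil).max())
```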

33:53 Michael Kennedy: Oh, wow, okay! That's really interesting. Cuz downloading a terabyte...that's going to cause all kinds of problems, I mean...even just paying for that much bandwidth...that's like $90 of bandwidth at AWS.

34:04 Panelists: And it also begs the question what you're going to do with that data when you get it. Not saying that a researcher wouldn't know what to do with it but it takes a lot of time and a lot of effort to extract signal out of that data. And that's sort of...I wouldn't call it a service we provide but it's part of our institutional work, to develop the algorithms to do that so that people don't have to retread that wheel constantly.

34:29 Michael Kennedy: Right, just pay the computational cost of trying to compute that, cuz it's got to be pretty high.

34:34 Panelists: A computational cost, I also think it's the human cost. It takes a very specialized set of skills to be able to computationally extract the meaningful data in those raw imaging stacks. We have a really world-class algorithms team that does a lot of that pre-processing for you so you can jump straight into this sort of... The data that you might think of as the really relevant class of data. What was the activity of the cell, not what was seen by the microscope. Those are two different data dimensionalities.

35:09 Michael Kennedy: This portion of Talk Python to Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors, ugh! Relying on users to report errors, digging through log files trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day. With Rollbar's full stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start tracking production errors and deployments in eight minutes or less. Are you considering self-hosting tools for security or compliance reasons? Then you should really check out Rollbar's Compliant SaaS option. Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield, and more! They'd love to give you a demo. Give Rollbar a try today. Go to talkpython.fm/rollbar and check them out. How many people work there, at the Allen Brain Institute?

36:08 Panelists: I think we're pushing 400 or so in total. In brain science, which is our kind of corner of the Allen Institute as a whole, we're the largest chunk, and I think we're closer to 300... 250 or 300, somewhere in that ballpark.

36:21 Michael Kennedy: That's a lot of expertise packed in that area, right? So, Corinne talked about people shipping you disks and sharing that data in some really interesting ways and I think that leads into one of the missions at the institute which I thought was really powerful, it's...you guys are committed to the open science model within your institutes. Do you want to speak to that a little?

36:43 Panelists: Absolutely. In academia, things are generally done in smaller labs and oftentimes you have a lot of difficulty reproducing individual experiments that happen there. Brain science is really hard. It's really hard to figure out what's going on in the brain, and I believe that when these projects were started it was like, what space is really not being covered by the neuroscience community, academia, and pharmaceuticals combined? That was being able to reproduce large sets of data, making them standardized, and making everybody able to reproduce the results that you get, so that you can kind of come to an agreement on ground truth and not be trying to reproduce other people's experiments. So we are one of the only institutions in that space that has done this. And now, other institutions are also trying to kind of follow suit, cuz we've all recognized that we're really trying to solve this reproducibility problem and also deal with just the huge amounts of data in the brain.

38:02 Michael Kennedy: How many different data centers have to be set up to do basically the same processing, right? Like if you guys could do that processing, share that data, and not have every university set up their own equivalent computing structure to do the same thing, that would be good, right?

38:18 Panelists: Exactly! And, I know that the next step in the process now is that we have projects where you can apply to have your scientific study done in our platforms. We might be heading more towards that, you know? We've set up this huge infrastructure, people will apply to have what they think is interesting done here.

38:38 Michael Kennedy: It's a little bit of the computing close to the data type of thing.

38:42 Panelists: There's a lot of different dimensions to open science, right? We've been talking a lot about open data, and we've talked a little about open software, which is open source software, which is another aspect nowadays of open science. Science has really kind of come to depend upon the software that implements science. And then there's also, of course, open access, right? What we traditionally think of as the final work product of science is the paper that you publish, and there are free pre-print archives and free access journals. So my point is there's a lot of different dimensions along which you can talk about open science, but open source software is something that myself and the technology team really think about a lot. We've talked about the AllenSDK and I also mentioned the Brain Modeling Toolkit. I don't think I necessarily mentioned that both of these are open source packages, and we accept pull requests on them and we respond to GitHub issues, and there's a large backlog because there's a lot of work to do and we support a lot of different scientific projects at a very large scale. Open source software development has become something that the institute has really come to embrace. I think the community has started to recognize just how critical it is to share our algorithms, share our processing codes, share our analysis tools.

40:03 Michael Kennedy: I think that's such a great mission, and I think partly you folks have a slight advantage over, say, Stanford, Rutgers, the other universities, because you don't depend on the publish-or-perish model, right? At least that's my understanding from the outside, right, it's not like-

40:23 Panelists: No, that's exactly... It is very important for us still to publish our data and to still communicate to the broader scientific public in that realm. But yeah, our incentives are a little bit different. I mean I think a good example of this is our brain observatory. We started releasing data in the brain observatory in May 2016, and we've had two or three more releases for this. This is the data set I was talking about, 40,000-some-odd neurons. We haven't published a paper on this yet, whereas in the rest of... Even in the open science community and open data community, the debates right now are, "Okay, well, how soon after publishing the paper do we release the data?" Do we release it immediately on publication? Do we wait six months or a year to give the primary author time to write a second publication before they move forward? And these are important debates because that's the way that the incentive structures are in academia.

41:30 Michael Kennedy: Nobel Prizes are handed out on this basis. Things like that, right?

41:33 Panelists: Exactly. We've released this data and there are already 12 pre-prints that external people have written and posted. I think two of them are peer-reviewed and published on the data already, and we haven't even written our own paper analyzing this yet. It's in the process, we have not yet published our own paper dealing with this data. So yes, we have a very strong kind of data first, publish later kind of model that you just can't do in the current academic infrastructure. I mean there are communities that have sort of demonstrated that this is possible. The machine learning community is the one that I always think of as really jumping out there early with the latest developments of the algorithms and the approaches, and they are starting to make it, but I think it's a real big culture change for sort of more... I almost want to say entrenched fields, but you know, the biological sciences have been around for a long time, and the publication methods are the way they are for a reason. But there's a cultural change; the three of us have only been out of our PhDs now for, you know, less than a decade, and we are definitely sort of seeing this enormous change, and it's fun to be at a place that really sort of embraces that cultural change.

42:47 Michael Kennedy: You guys seem to be at sort of the leading edge of that-

42:50 Panelists: I'd like to say just quickly that we should acknowledge that this is all possible because of Paul Allen's generous donation to us. I mean he basically makes all of this possible and it's really the only place in the world that you can do this, so... Kudos to him, and his vision.

43:09 Michael Kennedy: Absolutely. He's got the Brain Institute and there's a couple of other ones as well, right? Now you also have the Allen Institute for Cell Science and the Allen Frontiers Group,

43:26 Panelists: And the Allen Institute for AI. They're not hosted in our building, they're in a separate building, there's an artificial intelligence group.

43:31 Michael Kennedy: Absolutely, it's great to acknowledge what he's doing cuz it's... It sounds really unique and special and it... We were just talking, it lets a lot of you work in a way that is sort of a better fit for the logical sort of career goals necessary, right?

43:49 Panelists: And just to follow back on that too is that we do want to make sure that we publish. We do have external advisory boards, and we do apply for grants because we want make sure that what we're doing in our space is relevant to the community. We don't want to be in this one-off ball where we somehow discover something that's not relevant, or you know what I mean? We want to make sure that what we're doing is valid.

44:14 Michael Kennedy: The whole peer review process is still, still pretty interesting.

44:17 Panelists: Yes, still very valid for us yeah.

44:19 Michael Kennedy: Right, exactly. So, in 2013, President Obama came out with the BRAIN Initiative, where BRAIN is actually an acronym as well. How did that affect you guys? Did it make any difference for you or in the community?

44:34 Panelists: I'm pretty sure that up in our cafe we've got a little letter signed by President Obama hanging on the wall up there.

44:43 Michael Kennedy: That's pretty cool.

44:44 Panelists: Related to these efforts. It's just great to see that more and more people are seeing how important this is, especially from a government point of view, that, you know, a president has realized that this is one of the biggest frontiers that we really have left to solve. We know so little, and so it's really great to see that the community as a whole is investing in this sort of research. From my perspective, I feel like one of the most important things was the recognition at the federal, nation-state level that brain science is maturing into the type of thing that needs large data centers, it needs large sharing and collaboration tools, it needs large investigations to really start to make a difference in people's lives. The science, the community, has matured to the point where we can really start taking that standard forward and making a big impact.

45:41 Michael Kennedy: That's awesome. And with the more open science and open source projects, it's seems like that'll just amplify as people can work better together.

45:49 Panelists: To be honest we couldn't do this without the open source software community. I was reflecting on this just the other day. I don't have the name of the tweet, but I know a researcher with the first name Jessie tweeted a picture emphasizing that matplotlib, numpy, and pandas collectively are supported by 12 full-time developers. When I think about the amount of science, both my own science, our institute's science, neuro and then, you know, the rest of science generally, it's a huge edifice of work that's so critical to our national interests and to our interests as humans, and to see that much work supported by so few people, but such a dedicated core group of people, it really struck a chord with me.

46:32 Michael Kennedy: It's amazing to think of how just these small initiatives became such a foundation for what everyone is doing. I was happy to see that the NSF recently gave like a three million dollar grant, or something like that, to the SciPy group, and NumPy. So it's starting to get a lot more support. I might have the exact details a little bit off, like the number or whatever, but there was basically a big NSF grant to those groups to keep that going stronger, cuz I think they realized exactly what you're saying, like all of these researchers, all these data scientists are day-to-day-to-day going, "Yeah, we have a terabyte of mouse video, and we're going to give it to scikit-learn," and then, you know, like, well, we need to make sure that scikit-learn keeps working.

47:18 Panelists: I remember seeing a call for some funding from Kenneth Reitz for requests, you know, the Python requests module. It was like a month ago and he's like, oh, our goal is $3,000, and I was like, $3,000 to support requests? The number one most used Python module in the entire ecosystem, man, that's like pennies. Think about the return on that, that is pennies on the dollar.

47:44 Michael Kennedy: It's unbelievable.

47:46 Panelists: It's unbelievable. I don't know the exact numbers. He had it on his website, it's probably still there. But it's like downloaded seven million times a month or something. It's not just a little bit. It's also very nice to see that funding organizations as a whole are starting to recognize, too... One of the things that we've struggled with is just how much infrastructure goes into doing large data analysis. Ten years ago this wasn't a thing. Nick was saying that we're all within a decade of our PhDs, and we're all really working really hard to figure out how to process large sets of data, how to make that work, how to, you know, transfer our research code over to more production-like code, and just how much infrastructure it takes. Whereas a lot of times people don't understand that, actually. So it's really nice to see that, you know, government funding organizations are starting to realize... Because the people on the ground doing this work are really, like, shouting, "We need the researchers for this, this takes a lot of time and this is so fundamental for the mission of this science."

48:53 Michael Kennedy: Do you think people are being taught those skills in grad school these days?

48:58 Panelists: Your question is so appropriate. I don't have my finger firmly on the pulse. I can only speak to my sort of small window into the University of Washington, right? I still maintain some contacts with my former advisor and his research group, and I know it's definitely on their minds. And I know at the graduate level, like in the University of Washington's eScience group, they talk about these types of things. Yeah, Jake Vanderplas and the group over there doing eScience, that's something special, as well. That's a really cool place. But I think UW is really forward-thinking in that respect. As far as its adoption... let me give a personal anecdote to drive it home. When I came to the Allen Institute in 2012, I had just finished my PhD, and I took a research scientist position and quickly realized the incredible amount that I could learn from some of my colleagues in the technology team. I'd never heard of a unit test, like, we're going to test code? What? Why would you test code? It works!

49:53 Michael Kennedy: I already tried it, yeah!

49:55 Panelists: And then I sort of got my eyes opened to the way that they approach their work, and then I sort of fell into the well. I joined the group eventually, but I definitely want to see the same sort of epiphany happen to scientists not only at the graduate level, but also the undergraduates, you know? Especially given what I've seen of the talent of some of these undergraduate scientists that are getting trained in disciplines that didn't exist when I was an undergraduate, and to see them start to take the tools that were developed, really, for industry... and of course, we have visibility into open source software... and take those and apply those to their research and build it into the DNA of how they work and think, that's only going to amplify how open source software contributes to science for the next generation.

50:42 Michael Kennedy: It's exciting.

50:43 Panelists: The lab that I came out of, I finished up my PhD about two years ago, and the lab signed up for a GitHub account and started doing version control shortly after I switched over to Python. It was sort of myself and another student who had done an internship at Google between undergrad and grad school, and between the two of us he kind of brought some best practices. He's like, "Well, this is the way they did things." And we started getting stuff under version control, and I still get pings on changes to that repo. We kind of laid a foundation in that lab, I mean that group as a whole. I think he's still there, but there's a whole new kind of cohort of students in that lab that I don't know, and they're doing research code development there in a very different way than when I entered that exact same lab.

51:38 Michael Kennedy: Right, and to them that's just how you do it.

51:40 Panelists: Exactly.

51:44 Michael Kennedy: Cool! All right well, I want to be cognizant of your time. I could keep talking for a long time cuz there's so many things to explore.

51:49 Panelists: Well let's do it again?

51:50 Michael Kennedy: We could do a follow up sometime, absolutely. But this is super-fascinating. I think we'll leave it there for the brain science aspect. So let me just ask you all the two questions, and since there are three of you, we'll go kind of quick. First of all, if you're going to write some code in Python, Justin, what editor do you use?

52:07 Panelists: I'm loving Atom right now. I kind of prototype in Jupyter, in JupyterLab actually now.

52:13 Michael Kennedy: That's starting to take off.

52:15 Panelists: Most of my actual packages are in Atom.

52:18 Michael Kennedy: Nice, Corinne?

52:19 Panelists: I'll either do some old school writing in Emacs and then running it on the command line, or if I'm using an editor with a debugger, I'll use Eclipse with PyDev.

52:28 Michael Kennedy: Nice, right on. Nick?

52:31 Panelists: Emacs when I'm remote and VS Code when I'm local.

52:35 Michael Kennedy: VS Code's really taken off.

52:39 Panelists: I've just been tremendously impressed, especially as it's matured as an open source project and when the updates come in they are timely and they are squashing bugs that people report. It's awesome to watch.

52:50 Michael Kennedy: They've got a lot of momentum, for sure. Notable PyPI package. Maybe there's some package that people should know about, not necessarily requests cuz it's the most popular, but something where you're like, "Oh, I saw this thing the other day. It's really amazing, you should know about it." Nick?

53:05 Panelists: Let me take a pass and come back at the end, I want to get a good one!

53:09 Michael Kennedy: All right, Corinne?

53:10 Panelists: Well, I really use very standardized packages just because I want to stay away from people having to install and use immature code, and the things I use the most are NumPy, SciPy, the stats package, stuff like that.

53:26 Michael Kennedy: Justin?

53:27 Panelists: One of the things I've been impressed with recently is Cookiecutter. Kind of speaking of onboarding, we work to bring kind of newer Python folks into good practices of testing, tooling, documentation. Helping folks who have a little less knowledge of what a full-fledged package should look like, with a nice template, has been absolutely invaluable.

53:52 Michael Kennedy: Yeah that's a great idea, that's very very helpful to just run a single command and poof you got all the structure you're supposed to have. Nick, you got one?

53:59 Panelists: I got one, yeah. Bokeh. It's the Continuum visualization package. I've been using it to build dashboards and widgets for doing analysis tooling, and I just can't say enough about it. The community that has grown up around it has just been so responsive, and the power of that tool as it matures into the 1.0 release, I'm just so excited to see where it goes 'cause I use it daily and I love it.
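For anyone who hasn't tried Bokeh, here is a minimal sketch of the kind of interactive plot those dashboards are built from. The traces are made up, and the calls shown are the basic bokeh.plotting interface from recent releases, not anything specific to the Institute's tools.

```python
# Minimal Bokeh sketch: plot a few made-up activity traces to an interactive HTML page.
import numpy as np
from bokeh.plotting import figure, output_file, show

t = np.linspace(0, 60, 1800)
traces = [np.abs(np.random.randn(t.size)).cumsum() * 0.01 for _ in range(3)]

output_file("traces.html")                     # write the plot to a standalone HTML file
p = figure(title="Example cell traces",
           x_axis_label="time (s)", y_axis_label="signal (a.u.)")
for trace in traces:
    p.line(t, trace)                           # one line glyph per trace
show(p)                                        # open the page in a browser
```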

54:23 Michael Kennedy: That's awesome. All right. Well thank you so much, those are all great choices. I guess I'll give you all, whoever wants to jump in and add something here, a chance for a final call to action. If people want to work with the Paul Allen Brain Institute or get involved with some of the tools or things you've talked about, what do they do, where do they go?

54:40 Panelists: I think definitely for your users, or not your users. Definitely for your listeners, I think that our GitHub page, so we've got github.com/alleninstitute, and we've got a bunch of different packages, everything from our production things, like the AllenSDK, to smaller packages that individual people are releasing, like neuroglia, argschema, we've got a couple of kind of things in the Python world. As well as research code and packages that are affiliated with research projects. So there's a bunch of stuff there that's a whole lot of Python.

55:17 Michael Kennedy: Nice.

55:17 Panelists: So our GitHub page will have lots of great examples of how to actually utilize the data that you can download too. Which, if you want to browse around all the data, go to our website and you can see the massive plethora of data that we have there that's available for everybody. And one particular package, because it's so close to my heart, the AllenSDK. It's really the sort of one-stop shop to get your hands dirty digging into our data. It should work, just pip install, and if it doesn't, open an issue and assign it to me and I'll tackle it as soon as I can. And for any of our research that we've got going on here, we're on Twitter, we're on Instagram, and we've got a bunch of job openings too, I think for some software developers: alleninstitute.com, and there's a button somewhere for careers. So, yeah, there's a lot of fun stuff happening.

56:04 Michael Kennedy: Awesome, yeah. It sounds super exciting. Thank you for sharing this view into what you're all up to.

56:09 Panelists: Thank you so much for having us. Thank you. Thank you for having us.

56:12 Michael Kennedy: Bye. This has been another episode of Talk Python to Me. Our guests on this episode have been Justin Kiggins, Corinne Teeter and Nicholas Cain. And this episode has been brought to you by Cox Automotive and Rollbar. Join Cox Automotive and use your technical skills to transform the way the world buys, sells and owns cars. Find an exciting, technical position that's right for you at talkpython.fm/cox. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course. As Talk Python to Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course or our brand-new 100 Days of Code in Python. If you're interested in more than one course, be sure to check out the Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm. This is your host Michael Kennedy. Thanks so much for listening, I really appreciate it. Now get out there and write some Python code.
