WEBVTT

00:00:00.001 --> 00:00:02.420
Remember back in math class when you'd take a test?

00:00:02.420 --> 00:00:04.920
It wasn't enough to just write down the answer.

00:00:04.920 --> 00:00:07.340
What's the limit of that infinite summation?

00:00:07.340 --> 00:00:08.220
Pi over 2.

00:00:08.220 --> 00:00:11.000
Yes, but how did you get to that number?

00:00:11.000 --> 00:00:13.400
Some problems in programming are just like this.

00:00:13.400 --> 00:00:15.960
We want to keep track of the computations done

00:00:15.960 --> 00:00:18.740
and only add more steps to the results.

00:00:18.740 --> 00:00:22.000
Heck, that's basically the entire premise of functional programming.

00:00:22.000 --> 00:00:24.960
On this episode, you'll meet Christopher Ariza,

00:00:24.960 --> 00:00:27.080
who created a project called Static Frame.

00:00:27.080 --> 00:00:29.860
Think of it like pandas and NumPy,

00:00:29.860 --> 00:00:32.900
but it never changes the computations it's already performed.

00:00:32.900 --> 00:00:33.760
It just adds to them.

00:00:33.760 --> 00:00:39.440
This is Talk Python to Me, episode 204, recorded February 7th, 2019.

00:00:39.440 --> 00:00:55.900
Welcome to Talk Python to Me, a weekly podcast on Python,

00:00:55.900 --> 00:00:58.860
the language, the libraries, the ecosystem, and the personalities.

00:00:58.860 --> 00:01:00.780
This is your host, Michael Kennedy.

00:01:00.780 --> 00:01:02.920
Follow me on Twitter, where I'm @mkennedy.

00:01:02.920 --> 00:01:06.680
Keep up with the show and listen to past episodes at talkpython.fm,

00:01:06.680 --> 00:01:09.100
and follow the show on Twitter via @talkpython.

00:01:09.100 --> 00:01:10.980
Chris, welcome to Talk Python.

00:01:10.980 --> 00:01:12.240
Hi, Michael. Glad to be on the show.

00:01:12.240 --> 00:01:13.900
Yeah, it's great to have you on the show.

00:01:13.900 --> 00:01:17.620
You have a really interesting library that you've been working on,

00:01:17.620 --> 00:01:22.860
and it's an interesting sort of data-safe take on the whole Pandas API,

00:01:22.860 --> 00:01:26.240
which I think is going to be a lot of fun for all the data scientists

00:01:26.240 --> 00:01:28.120
and other people working with pandas out there.

00:01:28.120 --> 00:01:29.440
Great. I look forward to talking about it.

00:01:29.440 --> 00:01:33.380
Absolutely. But before we do, let's get started on your story.

00:01:33.380 --> 00:01:34.720
How did you get into programming in Python?

00:01:34.720 --> 00:01:37.480
Sure. I started programming in Python in the year 2000.

00:01:38.040 --> 00:01:42.880
I was a graduate student at NYU, and I was doing a lot of work in computer music

00:01:42.880 --> 00:01:45.020
and algorithmic composition specifically.

00:01:45.020 --> 00:01:51.000
And I was looking for a way to extend my capacity, you know,

00:01:51.000 --> 00:01:55.180
with these very high-level synthesis languages that I was using.

00:01:55.180 --> 00:01:58.880
So I decided to learn programming in more depth.

00:01:58.880 --> 00:02:02.360
And I was a graduate student there, so I took a course in C programming.

00:02:02.820 --> 00:02:07.080
And I had a graduate advisor who was supposed to oversee my work at the time.

00:02:07.080 --> 00:02:10.400
And I had this great idea to build this system in C.

00:02:10.400 --> 00:02:12.320
And so I sat with this advisor, and I was like,

00:02:12.320 --> 00:02:13.820
I really want to do this thing in C.

00:02:13.820 --> 00:02:14.660
And he said to me,

00:02:14.660 --> 00:02:18.800
well, if you wanted to build a car, would you start by building every screw?

00:02:18.800 --> 00:02:20.180
And I said, no.

00:02:20.580 --> 00:02:22.460
And so he said, use Python.

00:02:22.460 --> 00:02:23.780
And I was like, what's Python?

00:02:23.780 --> 00:02:25.740
He said, it's this new great language.

00:02:25.740 --> 00:02:26.460
Go try it out.

00:02:26.460 --> 00:02:31.080
And so I walked over to Barnes & Noble in Astor Place in New York City,

00:02:31.080 --> 00:02:34.620
back when there were bookstores at Barnes & Noble.

00:02:34.620 --> 00:02:37.040
Yeah, they used to have a great computer section, right?

00:02:37.040 --> 00:02:39.100
You could go and browse and see what was interesting.

00:02:39.100 --> 00:02:42.940
That was a way you learned about stuff, and not so much these days, right?

00:02:42.940 --> 00:02:43.900
Yeah, exactly.

00:02:43.900 --> 00:02:45.380
Yeah, so you went over and got this book, yeah.

00:02:45.380 --> 00:02:46.820
I got Learning Python, actually.

00:02:46.820 --> 00:02:50.340
There was a version of Learning Python out in the year 2000,

00:02:50.340 --> 00:02:52.100
and I picked it up and started learning Python,

00:02:52.100 --> 00:02:54.680
and started building a system I called athenaCL.

00:02:54.680 --> 00:02:57.140
This was a tool for algorithmic composition,

00:02:57.140 --> 00:03:00.500
closely tied to a synthesis language called Csound,

00:03:00.500 --> 00:03:03.420
and did a bunch of work in that, culminating in my dissertation.

00:03:03.420 --> 00:03:04.440
Wow, that's really cool.

00:03:04.440 --> 00:03:09.160
So your dissertation is music and algorithmic composition, basically?

00:03:09.160 --> 00:03:09.660
That's right.

00:03:09.660 --> 00:03:11.260
Okay, so cool.

00:03:11.260 --> 00:03:12.880
So when you started working on this library,

00:03:12.880 --> 00:03:14.880
and you were doing the programming around it,

00:03:14.980 --> 00:03:19.300
was this to have the computer generate music,

00:03:19.300 --> 00:03:21.620
or try to use the computer to understand music?

00:03:21.620 --> 00:03:22.940
What was the goal there?

00:03:22.940 --> 00:03:26.740
I was studying, I was getting a PhD in music composition and theory,

00:03:26.740 --> 00:03:29.500
and so I was using these synthesis languages

00:03:29.500 --> 00:03:32.520
that took text-based input of event data.

00:03:32.520 --> 00:03:35.620
So Csound is an ancient, well, it's not ancient,

00:03:35.620 --> 00:03:37.380
but it's a very old synthesis language.

00:03:37.380 --> 00:03:38.220
In computer terms.

00:03:38.220 --> 00:03:41.200
Yeah, well, it actually comes from an ancient lineage.

00:03:41.200 --> 00:03:44.600
It comes from the very first synthesis languages

00:03:44.600 --> 00:03:48.580
that Max Mathews invented at Bell Labs in the 60s.

00:03:48.580 --> 00:03:48.940
Wow.

00:03:48.940 --> 00:03:52.220
Called Music 1 and Music 2 and up to Music 5.

00:03:52.220 --> 00:03:54.500
Those happened at Bell Labs in the 60s and 70s.

00:03:54.500 --> 00:03:57.760
And Csound is the modern version of that.

00:03:57.760 --> 00:04:00.780
And it takes basically a text input file

00:04:00.780 --> 00:04:03.860
defining your events and your parameters.

00:04:04.360 --> 00:04:07.260
And it became quite clear that you could do really cool things

00:04:07.260 --> 00:04:10.800
if you could use code to generate these text input files.

00:04:10.800 --> 00:04:13.360
And of course, it's quite straightforward to do in Python.

00:04:13.360 --> 00:04:16.040
So I wasn't using Python to do synthesis.

00:04:16.040 --> 00:04:18.380
I was using Python to generate control data

00:04:18.380 --> 00:04:20.560
that I would then feed into Csound.

00:04:20.560 --> 00:04:20.960
Okay.

00:04:20.960 --> 00:04:21.740
How interesting.

00:04:21.740 --> 00:04:26.360
Have you seen some of the programmatically generated music

00:04:26.360 --> 00:04:28.760
that people are doing with Python lately?

00:04:28.760 --> 00:04:29.400
I'm not sure.

00:04:29.400 --> 00:04:30.740
Is there something specific you're thinking of?

00:04:30.800 --> 00:04:31.040
Yeah.

00:04:31.040 --> 00:04:33.800
There's been a couple of presentations around...

00:04:33.800 --> 00:04:35.200
Gosh, I wish I could remember the library.

00:04:35.200 --> 00:04:37.700
But there's a couple of libraries that you can use

00:04:37.700 --> 00:04:44.160
to basically, live in the REPL, program out songs

00:04:44.160 --> 00:04:44.900
and interactions.

00:04:44.900 --> 00:04:47.020
And yeah, it's pretty wild.

00:04:47.020 --> 00:04:49.540
So maybe I'll find a link and throw it into the show notes

00:04:49.540 --> 00:04:50.480
because I can't remember the name.

00:04:50.480 --> 00:04:52.820
But there's been a few really good conference presentations

00:04:52.820 --> 00:04:54.680
that are basically live musical performances

00:04:54.680 --> 00:04:56.660
done by programming Python.

00:04:56.900 --> 00:04:59.440
I think I saw one at PyCon last year.

00:04:59.440 --> 00:05:02.520
I believe that one worked on top of the SuperCollider language.

00:05:02.520 --> 00:05:04.960
So SuperCollider is a synthesis language

00:05:04.960 --> 00:05:07.720
that's much more modern than Csound

00:05:07.720 --> 00:05:11.340
that has a synthesis server independent from the language.

00:05:11.340 --> 00:05:14.640
And it's possible to use other languages to control the server.

00:05:14.640 --> 00:05:16.520
So I haven't been following it too closely,

00:05:16.520 --> 00:05:18.700
but I suspect that some people are using Python

00:05:18.700 --> 00:05:21.120
to control the SuperCollider server,

00:05:21.120 --> 00:05:22.300
which is a great idea.

00:05:22.300 --> 00:05:22.580
Yeah.

00:05:22.580 --> 00:05:23.940
It's definitely interesting to see.

00:05:24.060 --> 00:05:25.460
I'll throw in the video for people

00:05:25.460 --> 00:05:27.240
because if you haven't seen it, it's pretty creative.

00:05:27.240 --> 00:05:28.060
All right.

00:05:28.060 --> 00:05:31.140
So today you're not doing that much music theory, right?

00:05:31.140 --> 00:05:32.360
You're working in a different discipline.

00:05:32.360 --> 00:05:34.000
Tell us about what you do day to day.

00:05:34.000 --> 00:05:36.920
I worked in academia for a while at a few places

00:05:36.920 --> 00:05:40.380
and was continuing to do my work in algorithmic composition,

00:05:40.380 --> 00:05:41.860
generative music, that sort of work.

00:05:41.860 --> 00:05:43.760
But, you know, decided to look for something else

00:05:43.760 --> 00:05:46.660
and found a job at a firm called Research Affiliates.

00:05:46.660 --> 00:05:48.200
We're a finance firm.

00:05:48.480 --> 00:05:52.000
We define and build strategies for investment

00:05:52.000 --> 00:05:54.260
that we license to many parties around the world.

00:05:54.260 --> 00:05:54.620
That's cool.

00:05:54.620 --> 00:05:57.560
Is this like the so-called algorithmic trading type of stuff?

00:05:57.560 --> 00:05:59.600
Essentially, our strategies are,

00:05:59.600 --> 00:06:03.160
many of our strategies are what we call passive investment vehicles,

00:06:03.160 --> 00:06:05.160
which means that there is an algorithm,

00:06:05.160 --> 00:06:07.680
a specific procedure that's used to generate

00:06:07.680 --> 00:06:10.200
the portfolio constituents and the weights.

00:06:10.200 --> 00:06:13.080
Our strategies are fairly slow moving.

00:06:13.080 --> 00:06:14.800
It's not high frequency trading.

00:06:14.800 --> 00:06:15.960
It's not anything like that.

00:06:15.960 --> 00:06:18.820
Yeah, you're not looking for sub-millisecond advantages.

00:06:18.820 --> 00:06:23.660
You're looking for just applying algorithms to long-term investing.

00:06:23.660 --> 00:06:24.040
That's right.

00:06:24.040 --> 00:06:24.260
All right.

00:06:24.260 --> 00:06:25.980
So if like Warren Buffett were a programmer,

00:06:25.980 --> 00:06:27.160
he might be doing stuff like that.

00:06:27.160 --> 00:06:28.480
Yeah, that's right.

00:06:28.480 --> 00:06:29.260
I guess.

00:06:29.260 --> 00:06:30.580
It's funny.

00:06:30.580 --> 00:06:30.900
All right.

00:06:30.900 --> 00:06:31.320
Cool.

00:06:31.320 --> 00:06:37.640
So that brings us sort of full circle back to this idea of pandas

00:06:37.640 --> 00:06:41.800
and your variation, your take on a slightly different library

00:06:41.800 --> 00:06:43.300
that is pandas-like.

00:06:43.700 --> 00:06:49.640
So pandas also comes from the whole finance space, right?

00:06:49.640 --> 00:06:52.560
Like it's very popular in data science,

00:06:52.560 --> 00:06:55.320
but of course, you know, it originated out of finance,

00:06:55.320 --> 00:06:57.740
which is maybe one part of data science, I guess, right?

00:06:57.740 --> 00:06:58.340
Maybe.

00:06:58.340 --> 00:07:01.640
I see data science as,

00:07:01.640 --> 00:07:04.760
and there's certainly discussions and presentations on this issue,

00:07:04.760 --> 00:07:08.940
I see it as, in some ways, more speculative research into data.

00:07:09.040 --> 00:07:14.740
As opposed to using these tools for the systematic application of an algorithm or a procedure.

00:07:14.740 --> 00:07:14.980
Right.

00:07:14.980 --> 00:07:19.860
So what you're thinking is more that like what you guys do day to day is more programming,

00:07:19.860 --> 00:07:23.580
using these tools that maybe originated out of data science,

00:07:23.580 --> 00:07:27.220
but you're deploying production systems that are running and doing stuff,

00:07:27.220 --> 00:07:31.780
not so much coming up with graphs and inferring stuff with Jupyter.

00:07:31.920 --> 00:07:32.600
Yeah, exactly.

00:07:32.600 --> 00:07:32.880
Right.

00:07:32.880 --> 00:07:33.180
Okay.

00:07:33.180 --> 00:07:33.440
Yeah.

00:07:33.440 --> 00:07:37.220
But at my firm, we have finance researchers who that's what they do.

00:07:37.220 --> 00:07:42.860
They comb over data and study research and, you know, try to make observations from the data.

00:07:42.860 --> 00:07:44.400
You know, that is closer to data science.

00:07:44.400 --> 00:07:48.000
But by the time strategies come to us, they are well-defined.

00:07:48.000 --> 00:07:51.200
So we are, you know, implementing the production strategy

00:07:51.200 --> 00:07:55.420
and don't really have that sort of discovery exploration need.

00:07:55.860 --> 00:07:55.980
Okay.

00:07:55.980 --> 00:07:59.060
So let's maybe start from the beginning.

00:07:59.060 --> 00:08:05.580
You told me the idea of building these strategies and you've created this library called Static Frame,

00:08:05.580 --> 00:08:06.480
which is really interesting.

00:08:06.480 --> 00:08:07.540
We're going to talk a lot about it.

00:08:07.540 --> 00:08:09.840
But you started with Pandas, right?

00:08:09.840 --> 00:08:13.280
You're like, let's use Pandas and other Python libraries to solve this problem

00:08:13.280 --> 00:08:16.440
before you decided I'm going to replace Pandas with my own library, right?

00:08:16.440 --> 00:08:17.060
Yeah, that's right.

00:08:17.060 --> 00:08:17.740
Maybe talk about that journey.

00:08:18.160 --> 00:08:18.660
Sure, sure.

00:08:18.660 --> 00:08:23.980
Well, actually, when I started at Research Affiliates in the year 2012,

00:08:23.980 --> 00:08:27.160
Pandas was still quite young at that point.

00:08:27.160 --> 00:08:34.000
And my predecessor had created his own library to model data transformations

00:08:34.000 --> 00:08:42.000
and basically storing data in a table and then efficiently being able to add new data to that table by column addition,

00:08:42.000 --> 00:08:46.040
kind of an Excel-like data model, but implemented in Python.

00:08:46.040 --> 00:08:48.380
And his implementation was very straightforward.

00:08:48.380 --> 00:08:54.140
It was simply a dictionary that held rows where the rows themselves were a dictionary.

00:08:54.140 --> 00:09:02.300
So kind of like a JSON representation of a table, if you will, a dictionary for the rows and then a dictionary for each row.

00:09:02.300 --> 00:09:08.780
And there, of course, we weren't using NumPy as the back end, but it was actually reasonably efficient.

00:09:08.780 --> 00:09:10.900
And we still use it in some places.
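
NOTE
A minimal sketch of the table model described above: a dictionary of rows, where each row is itself a dictionary. The sample data and the add_column helper are hypothetical and are not taken from the in-house library mentioned in the episode.
```python
# Hypothetical sample data: the outer dict maps a row label to a row,
# and each row is a dict mapping a column name to a value.
table = {
    "AAA": {"price": 10.0, "shares": 200},
    "BBB": {"price": 4.5, "shares": 1000},
}
def add_column(table, name, func):
    """Add a column by applying func to every existing row."""
    for row in table.values():
        row[name] = func(row)
# Derive a new column from columns that are already on the table.
add_column(table, "market_cap", lambda row: row["price"] * row["shares"])
print(table["AAA"]["market_cap"])
```
Each row lookup is a plain dictionary access, which keeps the model simple, but every column operation is a Python-level loop rather than a vectorized NumPy call.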

00:09:11.680 --> 00:09:18.880
In about 2013, I spent some time looking at Pandas and started to use it because, of course,

00:09:18.880 --> 00:09:25.560
the underlying performance in large part due to NumPy and being able to use the vector operations of NumPy

00:09:25.560 --> 00:09:29.560
gave a significant advantage over using our own table model.

00:09:29.560 --> 00:09:29.880
Yeah.

00:09:29.960 --> 00:09:37.240
There's such an advantage to using things like NumPy where you hand a little data off to the C layer and that layer can do all the computation.

00:09:37.240 --> 00:09:44.520
There was a really interesting analogy or observation made by Alessandro Molina a couple of episodes ago.

00:09:44.520 --> 00:09:49.240
And he was talking about Python is one of these languages that is a little bit counterintuitive.

00:09:50.240 --> 00:09:57.980
Like in C, if you want something to go really fast, you might make it go really fast by writing and implementing the details in C or some other language.

00:09:57.980 --> 00:10:01.180
And so the more you can kind of control that, the more precise you can be.

00:10:01.180 --> 00:10:05.080
Where Python gets faster, the more high level you try to treat it.

00:10:05.080 --> 00:10:08.280
So if you tried to implement those algorithms in pure Python, they'd be slow.

00:10:08.360 --> 00:10:13.680
But if you just call like a high level NumPy function, boom, it's fast, right?

00:10:13.680 --> 00:10:20.280
And so it's like this sort of inverted understanding of like where the performance is in this language compared to others.

00:10:20.280 --> 00:10:20.560
Yeah.

00:10:20.560 --> 00:10:24.220
And, you know, I have to admit, I had looked at NumPy before.

00:10:24.220 --> 00:10:26.240
I'd been using Python for a long time.

00:10:26.580 --> 00:10:38.560
But the context of using it through pandas as a wrapper to NumPy really started me thinking, oh, when I want to scale a vector, I just multiply by the value.

00:10:38.560 --> 00:10:39.580
And I scale the whole vector.

00:10:39.580 --> 00:10:41.100
And this happens amazingly fast.

00:10:41.100 --> 00:10:41.940
And there's no loops.

00:10:41.940 --> 00:10:46.420
And you begin to take on that mindset that wherever you have a loop, you're doing something wrong.

00:10:46.420 --> 00:10:49.540
You know, you want all your loops to be in the NumPy layer.

00:10:49.540 --> 00:10:52.560
And it takes a bit of conceptual work to get there.

00:10:52.560 --> 00:10:54.300
Yeah, that's such an interesting observation.

00:10:54.300 --> 00:10:56.380
I definitely think that that's true.

00:10:56.380 --> 00:10:58.680
And you want to let it do that for you.

00:10:58.680 --> 00:11:00.960
But to think, oh, there's a loop.

00:11:00.960 --> 00:11:05.960
Where are we missing the opportunity to make this work the way pandas and NumPy should?

00:11:05.960 --> 00:11:07.400
Yeah, absolutely.

00:11:07.400 --> 00:11:11.320
I mean, yeah, that's exactly in our team code reviews.

00:11:11.320 --> 00:11:14.100
That's exactly one of the things that we sort of look out for.

00:11:14.100 --> 00:11:16.520
We see, you know, I see some pandas code that somebody wrote.

00:11:16.520 --> 00:11:19.060
And I see a couple loops or a loop in a loop.

00:11:19.060 --> 00:11:21.140
And I'm like, oh, there's got to be a better way.

00:11:21.140 --> 00:11:22.900
Quote Raymond.

00:11:22.900 --> 00:11:24.280
Yeah, absolutely.

00:11:24.280 --> 00:11:25.320
There's got to be a better way.

00:11:25.320 --> 00:11:26.040
That's cool.

00:11:26.040 --> 00:11:27.220
All right.

00:11:27.220 --> 00:11:30.940
This transition over to NumPy and pandas, this was pretty successful, right?

00:11:30.940 --> 00:11:36.800
Like you guys were able to replace that library and do more of your work in these libraries and these packages?

00:11:36.800 --> 00:11:37.540
Yeah, that's right.

00:11:37.540 --> 00:11:41.360
We didn't entirely replace it because the old table model worked reasonably well in a few cases.

00:11:41.360 --> 00:11:46.000
But in implementing some new strategies, some new tools, I started working with pandas.

00:11:46.180 --> 00:11:54.020
And it's funny because although you have a bunch of utilities on pandas, it's hard to figure out what to do with it or how to use it, really, I think.

00:11:54.020 --> 00:11:55.560
But I already had a precedent.

00:11:55.560 --> 00:12:00.280
The precedent I had from our old library was that you start with data in a table.

00:12:00.400 --> 00:12:05.100
You load up initial data, initial observations about companies, for example.

00:12:05.300 --> 00:12:10.260
And you maybe have 40 columns on a table of 10,000 companies.

00:12:10.260 --> 00:12:12.060
And that's your initial inputs.

00:12:12.060 --> 00:12:20.680
And then you add new data by doing operations, applying functions on those rows or previous columns and add new columns.

00:12:20.680 --> 00:12:25.500
And the previous library we used was actually very much aligned to that workflow.

00:12:25.600 --> 00:12:33.200
So moving to pandas was actually quite smooth because pandas very easily supports growing a data frame by adding columns.

00:12:33.200 --> 00:12:43.080
And those column additions can easily be performed by doing operations on columns that are already on the table or doing function application to rows already in the table.

00:12:43.080 --> 00:12:43.900
That makes a lot of sense.

00:12:43.900 --> 00:12:46.040
We discussed this a little bit before.

00:12:46.040 --> 00:13:05.260
You talked then about how it was really important as your data flows down the pipeline that is doing all the calculations and eventually comes to a decision of invest in this, not in that, or this much in that area to keep a history and keep track of what's happening.

00:13:05.260 --> 00:13:06.300
Right?

00:13:06.300 --> 00:13:07.040
Yeah, exactly.

00:13:07.040 --> 00:13:10.240
That paradigm was established before we ever moved to pandas.

00:13:10.240 --> 00:13:15.840
And it was a large part of the approach and ethos of my company and how we do our work.

00:13:15.840 --> 00:13:17.900
Our strategies are not black boxes.

00:13:17.900 --> 00:13:22.980
We don't use esoteric machine learning to discover results.

00:13:22.980 --> 00:13:26.380
We use very explicit approaches that we want to be transparent.

00:13:26.380 --> 00:13:31.440
And we do human quality control on everything that we release.

00:13:31.440 --> 00:13:38.060
So it's my obligation to expose as much of the internal calculations as possible.

00:13:38.060 --> 00:13:45.780
A lot of the intermediate values, groupings, labels, everything that is necessary for a human to understand

00:13:45.780 --> 00:13:49.320
the calculation, we try to expose in our final output.

00:13:49.320 --> 00:13:55.140
So the table starts with 20 or 40 or more columns as your initial data.

00:13:55.140 --> 00:14:04.760
And then numerous columns that are intermediate calculations, intermediate results, reducing the opportunity set through screens and some other processes.

00:14:04.760 --> 00:14:11.500
And then finally getting to the actual result, which in our case is weights and constituents.

00:14:11.500 --> 00:14:11.900
Yeah.

00:14:11.900 --> 00:14:16.740
It sounds very inspired by what you almost might do in Excel or Google Sheets.

00:14:16.740 --> 00:14:17.220
Right?

00:14:17.220 --> 00:14:20.360
You have your data and then you create a formula here.

00:14:20.360 --> 00:14:23.400
And then that formula is based on that previous one.

00:14:23.460 --> 00:14:29.120
But you would never go and like replace the original data and change it with a formula in like some weird iterative way.

00:14:29.120 --> 00:14:31.840
It's always just kind of like to the right and down.

00:14:31.840 --> 00:14:34.900
That would be disciplined use of Excel.

00:14:34.900 --> 00:14:40.240
Unfortunately, there's no discipline inherently in Excel.

00:14:40.240 --> 00:14:41.900
So you see all sorts of things.

00:14:41.900 --> 00:14:43.200
Yeah, that's true.

00:14:43.200 --> 00:14:43.700
That's true.

00:14:43.700 --> 00:14:47.600
But, you know, I guess inspired at least by the proper use.

00:14:47.600 --> 00:14:48.680
So that's pretty cool.

00:14:49.460 --> 00:14:51.680
So you did all this in Pandas and that was working really well.

00:14:51.680 --> 00:14:53.080
Why create a new library?

00:14:53.080 --> 00:15:01.260
Like what were the pain points or what were you like, you know, if we redesign this to be Panda-like but not Pandas, what could you gain?

00:15:01.260 --> 00:15:01.680
Right.

00:15:01.680 --> 00:15:10.940
The initial inspiration was, you know, recognizing this workflow that we had where we would start with initial data and some columns and then add columns as we go.

00:15:11.340 --> 00:15:24.460
That initial workflow, we found that it worked really good and it was relatively safe as long as you followed this sort of grow-only paradigm where the table only gets bigger.

00:15:24.460 --> 00:15:25.580
Yeah, exactly.

00:15:25.580 --> 00:15:26.140
Exactly.

00:15:26.140 --> 00:15:26.380
Yeah.

00:15:26.380 --> 00:15:29.880
So that's the discipline that we were doing implicitly.

00:15:30.000 --> 00:15:40.660
But we started to speculate, well, you know, it would be really nice if we could actually enforce this grow-only paradigm and in doing so remove a lot of opportunities for error.

00:15:40.660 --> 00:15:48.480
Now, by convention, we would never mid-process go and update in place a value we had already used.

00:15:48.700 --> 00:15:56.600
But we were certainly sensitive to that danger and in particular for teaching our paradigm to new members of our team and the rest.

00:15:56.600 --> 00:16:03.380
We had this strong desire, oh, it would be so great if we could sort of enforce in some way this grow-only paradigm.

00:16:03.380 --> 00:16:09.200
And that led naturally to thinking, well, what if the frame data itself could be immutable?

00:16:09.200 --> 00:16:13.840
More than just enforcing a grow-only paradigm, what if you had immutable frames?
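
NOTE
A minimal sketch of what enforcing the grow-only paradigm could look like. GrowOnlyTable is a hypothetical class written for illustration; it is not StaticFrame's API or design.
```python
class GrowOnlyTable:
    """Hypothetical table that accepts new columns but rejects overwrites."""
    def __init__(self, columns):
        # Store each column as a tuple so values cannot be edited in place.
        self._columns = {name: tuple(values) for name, values in columns.items()}
    def __getitem__(self, name):
        return self._columns[name]
    def add_column(self, name, values):
        if name in self._columns:
            raise KeyError(f"column {name!r} already exists; this table only grows")
        self._columns[name] = tuple(values)
table = GrowOnlyTable({"price": [10.0, 4.5], "shares": [200, 1000]})
# Growing the table from existing columns is allowed.
table.add_column("market_cap", [p * s for p, s in zip(table["price"], table["shares"])])
# Overwriting a column that was already used is rejected.
try:
    table.add_column("price", [0.0, 0.0])
    overwrite_rejected = False
except KeyError:
    overwrite_rejected = True
```
Turning the convention into an error means a mid-process overwrite fails loudly instead of silently changing values that earlier columns were computed from.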

00:16:13.960 --> 00:16:20.520
There's places where we open up a – we use a table as a reference data set and we might bring that in as a data frame.

00:16:20.520 --> 00:16:29.700
So I might bring in FX rates, currency conversion rates as a series indexed by currency code and the currency conversion value.

00:16:29.700 --> 00:16:33.280
And that's a reference value that I'm using in many, many, many places.

00:16:33.280 --> 00:16:37.880
And the opportunity for error if any of those values gets mutated is significant.

00:16:37.880 --> 00:16:40.100
So we kept on coming back to this.

00:16:40.100 --> 00:16:51.040
Like, it would be so great if we could freeze a series or freeze a table like we have frozenset, for example, and treat it as an immutable collection.
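
NOTE
A minimal sketch of frozen reference data using only the standard library. The freeze helper and the rate values are hypothetical; a real series would carry an index and a dtype, which this does not model.
```python
from types import MappingProxyType
def freeze(mapping):
    """Return a read-only view over a private copy of mapping."""
    return MappingProxyType(dict(mapping))
# Hypothetical FX rates keyed by currency code; the numbers are made up.
rates = {"USD": 1.0, "EUR": 1.13}
fx_rates = freeze(rates)
# Later edits to the source dict do not leak into the frozen copy.
rates["EUR"] = 99.0
# Assigning through the read-only view raises TypeError.
try:
    fx_rates["EUR"] = 2.0
    mutated = True
except TypeError:
    mutated = False
print(fx_rates["EUR"], mutated)
```
Because the frozen mapping can be passed to many functions without any of them being able to change it, one shared copy is safe to reuse everywhere.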

00:16:51.040 --> 00:16:51.320
Yeah.

00:16:51.320 --> 00:16:57.400
And, of course, it completely simplifies the whole debugging and validation, right?

00:16:57.400 --> 00:17:13.020
Because you no longer have to look for these weird references where somebody still has a pointer to the data frame and they call a function that changes the values or, you know, does some other odd thing where they're off by a column index or something.

00:17:13.020 --> 00:17:17.400
And, you know, it seems like debugging that would be really hard.

00:17:17.400 --> 00:17:20.400
And, of course, making financial decisions on it might be really bad.

00:17:20.540 --> 00:17:21.100
Yeah, that's right.

00:17:21.100 --> 00:17:25.100
It reduces what I often say is it reduces the opportunity for error.

00:17:25.100 --> 00:17:35.620
There are many ways that things can go wrong and you can get very confusing, unexpected results by mutating your inputs and your values as you go.

00:17:35.620 --> 00:17:36.040
Yeah.

00:17:36.040 --> 00:17:37.740
There's the safety side of things.

00:17:37.740 --> 00:17:39.040
And that makes perfect sense.

00:17:39.040 --> 00:17:40.020
That's probably primary.

00:17:40.260 --> 00:17:53.380
Another thing that immutable data really opens up, I don't know if this matters at all to you guys, but anytime you have immutable data, you start to have incredible opportunities for parallelism, right?

00:17:53.380 --> 00:17:57.560
Like if you're sharing it, you don't have to worry about, oh, I got to lock on this and make sure that's not changed.

00:17:57.560 --> 00:18:00.020
You just riff on it because it's immutable.

00:18:00.020 --> 00:18:00.960
It's not going to change.

00:18:00.960 --> 00:18:04.920
Yeah, that's a really interesting potential that I haven't really explored.

00:18:04.920 --> 00:18:18.820
The one way I have explored it with Static Frame is that our function application iterators expose an interface to multiprocessing or multithreading function application to columns and rows.

00:18:18.820 --> 00:18:23.300
So I've experimented with a little bit, but there's definitely more opportunities to look out for that.
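
NOTE
A minimal sketch of applying a function to immutable rows from a thread pool, using the standard library rather than StaticFrame's own iterator interface, which is not shown in the episode. The rows and the market_cap function are hypothetical.
```python
from concurrent.futures import ThreadPoolExecutor
# Hypothetical immutable rows: (price, shares) tuples that no worker can modify.
rows = ((10.0, 200), (4.5, 1000), (2.0, 50))
def market_cap(row):
    price, shares = row
    return price * shares
with ThreadPoolExecutor(max_workers=2) as pool:
    # map() yields results in input order, so they line up with rows.
    results = tuple(pool.map(market_cap, rows))
print(results)
```
No locks are needed because the workers only read shared data. For CPU-bound pure-Python functions, threads are limited by the GIL, so a process pool may be the better fit.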

00:18:23.300 --> 00:18:25.040
Yeah, it sounds like for sure.

00:18:25.040 --> 00:18:31.440
And maybe you could even mix in some Cython in there so it releases the GIL for the threaded side of stories.

00:18:31.440 --> 00:18:34.860
And it just seems like there's a lot of cool possibilities to dig into that.

00:18:34.860 --> 00:18:36.740
Is performance at all something you care about?

00:18:36.740 --> 00:18:40.960
Or is it like, it takes two minutes and we run it once a day or once a week, so it's fine?

00:18:40.960 --> 00:18:41.620
Yeah, definitely.

00:18:41.620 --> 00:18:43.540
Performance is a very significant concern.

00:18:43.540 --> 00:18:53.920
And as I was doing this, as I started working on this in May of 2017, and I started very small with this speculation.

00:18:53.920 --> 00:18:56.900
I wasn't sure if I could do this in native Python.

00:18:56.900 --> 00:19:06.200
That was the thing is that we, for years prior, me and my team who shared these convictions and these goals, speculated on something like this.

00:19:06.200 --> 00:19:08.240
And I always thought I was going to have to implement it in C.

00:19:08.240 --> 00:19:10.380
It's time to build the screws and the nuts and everything.

00:19:10.380 --> 00:19:11.720
Exactly, exactly.

00:19:11.720 --> 00:19:12.140
Yeah.

00:19:12.140 --> 00:19:14.120
So I was going to have to implement this in C.

00:19:14.200 --> 00:19:16.260
And maybe I'd done some work in C++.

00:19:16.260 --> 00:19:21.700
So, you know, I was like, maybe I can implement this in C++ and build off the STL vectors.

00:19:21.700 --> 00:19:25.020
And then I realized, oh, man, I'm re-implementing NumPy.

00:19:25.020 --> 00:19:26.300
I don't want to do that.

00:19:26.800 --> 00:19:36.060
And it was after a PyCon, I think it was two years ago, yeah, because it was in May of 2017, something at that PyCon triggered for me that, you know, why don't I just try it in native Python?

00:19:36.060 --> 00:19:43.540
And if I've hit bottlenecks, I can use Cython, but I should just see what I can do and use NumPy and just use Python.

00:19:43.540 --> 00:19:48.700
And I set out on that goal, and I found that performance is very good.

00:19:48.700 --> 00:19:52.880
I mean, for many operations I can do as well or better than Pandas.

00:19:52.880 --> 00:19:54.240
Some operations I'm slower.

00:19:54.240 --> 00:19:56.520
Some operations I'm better or significantly better.

00:19:56.900 --> 00:19:59.260
The aggregate performance is very hard to measure.

00:19:59.260 --> 00:20:00.860
It's very dependent on use cases.

00:20:00.860 --> 00:20:03.640
There's some things that are definitely slower than Pandas.

00:20:03.640 --> 00:20:06.680
But at this point, it's just pure Python, pure NumPy.

00:20:06.680 --> 00:20:11.580
We haven't done anything in Cython or C extensions or Numba or anything like that.

00:20:11.580 --> 00:20:16.480
This portion of Talk Python to Me is brought to you by Linode.

00:20:16.480 --> 00:20:20.200
Are you looking for hosting that's fast, simple, and incredibly affordable?

00:20:20.200 --> 00:20:25.320
Well, look past that bookstore and check out Linode at talkpython.fm/Linode.

00:20:25.500 --> 00:20:27.180
That's L-I-N-O-D-E.

00:20:27.180 --> 00:20:31.640
Plans start at just $5 a month for a dedicated server with a gig of RAM.

00:20:31.640 --> 00:20:33.840
They have 10 data centers across the globe.

00:20:33.840 --> 00:20:37.680
So no matter where you are or where your users are, there's a data center for you.

00:20:37.680 --> 00:20:42.160
Whether you want to run a Python web app, host a private Git server, or just a file server,

00:20:42.160 --> 00:20:48.960
you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24-7 friendly

00:20:48.960 --> 00:20:52.080
support, even on holidays, and a seven-day money-back guarantee.

00:20:52.080 --> 00:20:53.720
Need a little help with your infrastructure?

00:20:53.720 --> 00:20:58.460
They even offer professional services to help you with architecture, migrations, and more.

00:20:58.460 --> 00:21:01.400
Do you want a dedicated server for free for the next four months?

00:21:01.400 --> 00:21:04.460
Just visit talkpython.fm/Linode.

00:21:05.720 --> 00:21:10.440
So you're getting great performance out of it already, and then there's all these low-hanging

00:21:10.440 --> 00:21:12.020
fruit opportunities if needed.

00:21:12.020 --> 00:21:12.600
Yeah, that's right.

00:21:12.600 --> 00:21:12.840
Yeah.

00:21:12.840 --> 00:21:17.180
So it's interesting you talk about the performance and could I do it this way?

00:21:17.180 --> 00:21:22.900
And I think people, programmers, are really bad at judging what's going to be fast and what's

00:21:22.900 --> 00:21:23.480
going to be slow.

00:21:24.320 --> 00:21:27.560
You look at some code, you're like, oh, this is definitely the problem.

00:21:27.560 --> 00:21:31.560
Maybe it's slower, but it's sub-millisecond and who cares?

00:21:31.560 --> 00:21:34.060
Or it's actually not even that part.

00:21:34.060 --> 00:21:34.940
It's something totally different.

00:21:34.940 --> 00:21:40.120
Did you do profiling and stuff like that to really try to dial that in, or did it just work

00:21:40.120 --> 00:21:40.300
out?

00:21:40.300 --> 00:21:43.680
Early on, I started benchmarking against pandas for certain operations.

00:21:43.680 --> 00:21:46.420
And so I don't think of my...

00:21:46.420 --> 00:21:51.880
It's actually a huge debt to pandas that they've provided this great framework that does so much

00:21:51.880 --> 00:21:53.320
and really sets the foundation.

00:21:53.320 --> 00:21:54.860
Of course, it's descended from R.

00:21:54.860 --> 00:21:59.660
So pandas inherited a bunch of things from R in terms of the concept of the data frame.

00:21:59.660 --> 00:22:04.740
And I think compared to what I know of the R model, they refined the interface and unified

00:22:04.740 --> 00:22:05.920
it in quite a nice way.

00:22:05.920 --> 00:22:10.920
And in doing so, they've really defined a set of expectations for using libraries like this.

00:22:10.920 --> 00:22:17.660
One example is the dropna method on a series or frame, like the idea that given a series

00:22:17.660 --> 00:22:20.780
or frame, there should be an easy way to remove missing values.

00:22:20.780 --> 00:22:22.240
We have to do this kind of thing all the time.

00:22:22.400 --> 00:22:26.440
So with that model in mind, I could start to implement those things and test them.

00:22:26.440 --> 00:22:30.100
And the performance metric that is relevant to me is my ratio to pandas.

00:22:30.100 --> 00:22:31.980
So that's what I know.

00:22:31.980 --> 00:22:35.420
Like I know for this operation, oh, I'm 0.6.

00:22:35.420 --> 00:22:36.680
I'm faster than pandas.

00:22:36.680 --> 00:22:38.980
Or for this operation, I'm 10 times slower than pandas.

00:22:38.980 --> 00:22:42.880
And I do it at this very granular level for one-to-one comparisons.
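
NOTE
Editorial sketch, not from the recording: one way to compute a per-operation
ratio like the one described here, using only the standard library. The two
functions are hypothetical stand-ins, not StaticFrame or pandas calls, and the
repeat counts are arbitrary assumptions.
```python
import timeit
def candidate():
    # Stand-in for the operation under test.
    return sorted(range(1000), reverse=True)
def reference():
    # Stand-in for the equivalent operation in the reference library.
    return list(reversed(range(1000)))
# Take the best of several runs to reduce timing noise.
candidate_time = min(timeit.repeat(candidate, number=200, repeat=3))
reference_time = min(timeit.repeat(reference, number=200, repeat=3))
# Below 1.0 the candidate is faster; above 1.0 it is slower.
ratio = candidate_time / reference_time
```
Tracking one ratio per operation keeps each comparison one-to-one rather than
folding everything into a single aggregate number.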

00:22:43.160 --> 00:22:45.240
That's a really interesting metric to think about.

00:22:45.240 --> 00:22:49.060
But I guess it makes sense because you're like, I would like to have this other model, this

00:22:49.060 --> 00:22:53.140
other data model, this other programming model that's data frame-like that has the safety

00:22:53.140 --> 00:22:54.080
immutability thing.

00:22:54.080 --> 00:22:55.780
It used to be pandas.

00:22:55.780 --> 00:22:58.980
Long as I don't wreck the performance too much.

00:22:58.980 --> 00:23:01.280
And if there's a benefit, then hooray.

00:23:01.280 --> 00:23:02.560
Like, we're good.

00:23:02.560 --> 00:23:02.800
Yep.

00:23:02.800 --> 00:23:03.500
That's exactly right.

00:23:03.580 --> 00:23:05.020
Yeah, that's a cool way to think about it.

00:23:05.020 --> 00:23:10.600
So let's talk about how static frame deviates from pandas.

00:23:10.600 --> 00:23:16.440
You know, so the overall idea is this immutable data grow to the right sort of story.

00:23:16.440 --> 00:23:18.300
But there's a lot of details here.

00:23:18.300 --> 00:23:19.700
Do you want to maybe talk us through them?

00:23:19.700 --> 00:23:25.200
The biggest insight is, I mean, one of the biggest changes really is the underlying numpy

00:23:25.200 --> 00:23:26.580
arrays are made immutable.

00:23:26.580 --> 00:23:30.860
This was one of the key observations that led me to start, you know, developing this and

00:23:30.860 --> 00:23:35.660
realize I didn't have to write this thing in C or C++ myself in that I found there's a

00:23:35.660 --> 00:23:36.420
flag on the num.

00:23:36.420 --> 00:23:38.400
Each numpy array has a flags attribute.

00:23:38.400 --> 00:23:40.440
And on that flags attribute are a number of properties.

00:23:40.440 --> 00:23:42.100
One of them is writeable.

00:23:42.100 --> 00:23:43.480
And it's a Boolean.

00:23:43.480 --> 00:23:44.540
And you can flip it.

00:23:44.540 --> 00:23:49.880
And in doing so, the numpy, if you try to assign values into the numpy array, it gives you

00:23:49.880 --> 00:23:50.260
an exception.

00:23:50.260 --> 00:23:55.060
And numpy arrays, of course, are already fixed in size and shape.

00:23:55.060 --> 00:23:59.300
They are, numpy arrays out of the box are mutable in terms of the values contained within

00:23:59.300 --> 00:24:00.120
that size and shape.
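
NOTE
Editorial sketch, not from the recording: the flag described here is
flags.writeable on a NumPy array. The example below only shows that flipping it
makes assignment raise; the variable names are illustrative.
```python
import numpy as np
# A freshly created array is writeable by default.
a = np.arange(3)
assert a.flags.writeable
# Flip the flag: the data is now read-only through this array object.
a.flags.writeable = False
try:
    a[0] = 10
    assignment_failed = False
except ValueError:
    # NumPy rejects in-place assignment on a read-only array.
    assignment_failed = True
```
The size and shape were already fixed; with the flag off, the values are fixed
too.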

00:24:00.520 --> 00:24:02.640
But when I found this, I was amazed.

00:24:02.640 --> 00:24:04.400
I was like, oh my God, this is what I've been looking for.

00:24:04.400 --> 00:24:10.900
So with that insight, I began writing the core piece of the library, which is the internal

00:24:10.900 --> 00:24:18.640
component called the type blocks, which manages the heterogeneous typed arrays and exposes a

00:24:18.640 --> 00:24:21.920
unified interface to external clients.

00:24:22.320 --> 00:24:28.540
So that first piece really of making all the internal arrays immutable and what I describe

00:24:28.540 --> 00:24:30.780
as fully managing the array.

00:24:30.780 --> 00:24:37.100
That is, if you create a static frame object with a numpy array, if that array happens to

00:24:37.100 --> 00:24:40.520
be immutable, I can take it and I can use it and I don't have to make a copy.

00:24:40.760 --> 00:24:47.240
But if a StaticFrame Frame is given a mutable array, I make a copy and I make that copy immutable.

00:24:47.240 --> 00:24:48.660
And from there on, we're safe.
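
NOTE
Editorial sketch, not from the recording: the copy-only-if-mutable rule
described here, written as a hypothetical helper. The name as_immutable is an
assumption for illustration and is not StaticFrame's actual internal code.
```python
import numpy as np
def as_immutable(array):
    """Hypothetical helper: return a read-only array, copying only if needed."""
    if not array.flags.writeable:
        # Already read-only: reuse it without a copy.
        return array
    frozen = array.copy()
    frozen.flags.writeable = False
    return frozen
source = np.array([1, 2, 3])
managed = as_immutable(source)
# The caller's later mutation does not reach the managed copy.
source[0] = 99
```
Passing an already read-only array returns the same object, so no memory is
duplicated in that case.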

00:24:48.660 --> 00:24:49.780
Yeah, that's really cool.

00:24:49.780 --> 00:24:52.900
So obviously, if you're given immutable data, problem solved, right?

00:24:53.280 --> 00:24:59.980
But if you're not, then you want to take ownership of that data, take it inside of your library

00:24:59.980 --> 00:25:01.000
and say, yeah, you gave me this.

00:25:01.000 --> 00:25:01.660
I've read it.

00:25:01.660 --> 00:25:04.040
Now we have a safe version of it.

00:25:04.040 --> 00:25:08.800
It's really cool that you were able to just leverage that built-in feature of numpy because

00:25:08.800 --> 00:25:11.980
that meant that whole layer down there.

00:25:11.980 --> 00:25:16.140
You could just build on what numpy is doing and not have to go, we're starting from scratch

00:25:16.140 --> 00:25:17.320
with nuts and bolts, right?

00:25:17.520 --> 00:25:18.120
Yeah, exactly.

00:25:18.120 --> 00:25:24.740
And I'm still quite curious why it's there because it's not really advertised anywhere

00:25:24.740 --> 00:25:25.940
in the numpy docs.

00:25:25.940 --> 00:25:30.100
I don't see information as to suggestions of using this or whatnot.

00:25:30.100 --> 00:25:33.560
I found little bits of discussion here and there where I've seen other evidence of people

00:25:33.560 --> 00:25:34.080
using it.

00:25:34.080 --> 00:25:39.400
It is certainly documented in the flags as part of the flags for an array, but I'm actually

00:25:39.400 --> 00:25:44.340
eager to find out more information of how it got there and how the numpy developers imagined

00:25:44.340 --> 00:25:45.040
it would be used.

00:25:45.300 --> 00:25:48.160
Yeah, maybe if someone's listening, they know they could come and put a comment on

00:25:48.160 --> 00:25:48.480
the show.

00:25:48.480 --> 00:25:49.640
Yeah, that'd be great.

00:25:49.640 --> 00:25:50.080
On the show page.

00:25:50.080 --> 00:25:50.520
That'd be cool.

00:25:50.520 --> 00:25:51.600
We'll all learn from it.

00:25:51.600 --> 00:25:52.180
Yeah.

00:25:52.180 --> 00:25:53.480
Yeah, great.

00:25:53.480 --> 00:26:01.620
So how much having it based on numpy was it, I guess, able to more or less stay the same

00:26:01.620 --> 00:26:02.720
as before?

00:26:02.720 --> 00:26:05.140
Like, did it make moving from pandas a lot easier?

00:26:05.140 --> 00:26:06.000
Yeah, that's right.

00:26:06.000 --> 00:26:12.280
Because, of course, pandas, at least in its present state, all data is stored in numpy arrays.

00:26:12.480 --> 00:26:17.500
So basic expectations about how that data would work are the same.

00:26:17.500 --> 00:26:20.880
Our goal, though, with static frame was to be closer to numpy.

00:26:20.880 --> 00:26:26.580
And what that means is that every time that we do a calculation, like produce a standard

00:26:26.580 --> 00:26:31.280
deviation or a mean or something else, we use numpy operations.

00:26:31.280 --> 00:26:36.140
I feel like pandas is a bit ambivalent about this, and they probably have reasons, probably

00:26:36.140 --> 00:26:37.560
for performance, for doing this.

00:26:37.560 --> 00:26:45.700
But sometimes if you call the std method on a series, you're not actually executing numpy

00:26:45.700 --> 00:26:48.320
or you're executing numpy in an unexpected way.

00:26:48.320 --> 00:26:54.000
I have a lot of respect for numpy's stability and over the years, over their versions.

00:26:54.000 --> 00:26:58.900
And I trust numpy in terms of their approaches to doing these calculations, their defaults,

00:26:58.900 --> 00:26:59.160
et cetera.

00:26:59.260 --> 00:27:00.360
And I don't want to make those decisions.

00:27:00.360 --> 00:27:05.540
So I'm happy to rely on numpy entirely for those sorts of calculations.

00:27:05.540 --> 00:27:09.920
And then the numpy type system is something also that pandas has sort of struggled against

00:27:09.920 --> 00:27:14.120
or is ambivalent about or actually increasingly seem to want to get away from.

00:27:14.120 --> 00:27:21.340
Rather than try to create my own type system or augment numpy's types, I took the efficient

00:27:21.340 --> 00:27:26.080
approach for the resources for the project, which is just, OK, we'll just use numpy types.

00:27:26.080 --> 00:27:32.700
Which means one very clear way this shows up is that if you create a series out of three-character

00:27:32.700 --> 00:27:39.060
currency codes, like FX currency codes, you will get a series

00:27:39.060 --> 00:27:44.340
of fixed-width Unicode, three Unicode characters, which is what numpy does by default.

00:27:44.340 --> 00:27:47.480
So whereas pandas will convert that into an object type.
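
NOTE
Editorial sketch, not from the recording: what NumPy infers for an array of
three-character currency codes. Only NumPy is used; the variable names are
illustrative.
```python
import numpy as np
codes = np.array(["USD", "EUR", "JPY"])
# NumPy infers a fixed-width Unicode dtype sized to the longest string.
kind = codes.dtype.kind
# Each Unicode character occupies 4 bytes in a NumPy string dtype.
chars = codes.dtype.itemsize // 4
```
Here kind is "U" and chars is 3, the fixed-width Unicode type mentioned above.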

00:27:47.480 --> 00:27:54.140
So I just let numpy use its types pretty much as it would naturally do and avoid getting involved

00:27:54.140 --> 00:27:54.480
in that.

00:27:54.740 --> 00:27:55.220
Yeah.

00:27:55.220 --> 00:27:55.620
Yeah.

00:27:55.620 --> 00:28:00.880
And of course, you carry over all the validation and testing to make sure that all the calculations

00:28:00.880 --> 00:28:02.520
were done as accurately as possible.

00:28:02.520 --> 00:28:04.220
And that's quite a matter as well.

00:28:04.220 --> 00:28:04.720
Yeah, that's right.

00:28:04.720 --> 00:28:05.300
So yeah.

00:28:05.300 --> 00:28:11.000
So some other things that look like differences for static frame relative to pandas is one

00:28:11.000 --> 00:28:12.440
is around unique indices.

00:28:12.440 --> 00:28:13.000
Yeah.

00:28:13.000 --> 00:28:18.380
So there's a couple of things that we as a team would constantly

00:28:18.380 --> 00:28:21.020
be frustrated with in terms of pandas.

00:28:21.020 --> 00:28:26.280
And one really obvious one is this ambivalence about whether an index should be unique or not.

00:28:26.280 --> 00:28:30.440
This is when I think of an index, and maybe most people think of an index, they think of

00:28:30.440 --> 00:28:35.060
it as a mapping similar to a Python dictionary where keys have to be unique.

00:28:35.160 --> 00:28:37.320
And very often, that's how people use indices.

00:28:37.320 --> 00:28:40.200
But in pandas, indices don't have to be unique.

00:28:40.200 --> 00:28:48.120
And we would constantly be surprised when we found that a column in a table was set as the

00:28:48.120 --> 00:28:48.540
index.

00:28:48.540 --> 00:28:53.100
And without us realizing it, those values in that column were not unique.

00:28:53.240 --> 00:28:55.720
And we ended up with a non-unique index.

00:28:55.720 --> 00:29:02.120
And if you try to select a row from a non-unique index using a loc call, where you expect to

00:29:02.120 --> 00:29:07.440
get a series representing a row, now suddenly you get a data frame representing two rows.

00:29:07.440 --> 00:29:09.380
And that's very confusing and surprising.

00:29:09.380 --> 00:29:16.700
Pandas has an option to enforce uniqueness when you create an index from a column with an amusingly

00:29:16.700 --> 00:29:18.580
named parameter called verify_integrity.

00:29:18.580 --> 00:29:26.660
And verify_integrity is by default set to False on the pandas set_index operation, which I understand

00:29:26.660 --> 00:29:30.560
the desire to be accommodating, which I think is the motive here.

00:29:30.560 --> 00:29:33.180
But I do not want to be accommodating.

00:29:33.180 --> 00:29:36.840
I want to say that an index is a unique collection.

00:29:36.840 --> 00:29:41.820
And if you try to create an index out of a non-unique collection, you'll get an error.
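
NOTE
Editorial sketch, not from the recording: an index treated as a unique
label-to-position mapping that refuses duplicates at construction. The class
UniqueIndex and its method name are hypothetical, written in plain Python, and
are not StaticFrame's implementation.
```python
class UniqueIndex:
    """Hypothetical index: a label-to-position mapping that rejects duplicates."""
    def __init__(self, labels):
        self._positions = {}
        for position, label in enumerate(labels):
            if label in self._positions:
                raise ValueError(f"duplicate label: {label!r}")
            self._positions[label] = position
    def position_of(self, label):
        # Exactly one position per label, as with a dictionary key.
        return self._positions[label]
index = UniqueIndex(["a", "b", "c"])
try:
    UniqueIndex(["a", "b", "a"])
    rejected = False
except ValueError:
    rejected = True
```
Failing at construction means a later single-label lookup can never silently
return several rows.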

00:29:41.820 --> 00:29:46.320
Right, you should get zero or one things, not zero, one, or other numbers.

00:29:46.320 --> 00:29:47.380
Yeah, interesting.

00:29:47.780 --> 00:29:49.420
Another one is around dot access.

00:29:49.420 --> 00:29:55.100
So basically, dunder getitem mapping over to, like, pulling items out by index.

00:29:55.100 --> 00:29:59.020
Yeah, and I think this was motivated by the ancestry of R.

00:29:59.020 --> 00:30:06.500
So in the R language, I'm not exactly certain, but I believe that R's data frame library exposed

00:30:06.500 --> 00:30:09.960
columns through dot-attribute-like lookup.

00:30:09.960 --> 00:30:15.320
And I suspected that in early versions of pandas, there was a big pull to try to move R people

00:30:15.320 --> 00:30:20.060
over to Python and having that similar syntax, I assume, was desirable.

00:30:20.060 --> 00:30:24.440
But of course, there's other attributes other than columns on the data frame object.

00:30:24.440 --> 00:30:30.200
And so inevitably, there's some sort of naming collision that's going to come up with getting

00:30:30.200 --> 00:30:31.780
columns from dot attribute.

00:30:31.960 --> 00:30:36.100
So with static frame, we simply say the only way to get a column is by using the get item

00:30:36.100 --> 00:30:36.420
syntax.

00:30:36.420 --> 00:30:38.520
And there's no dot access.
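
NOTE
Editorial sketch, not from the recording: why attribute-style column access
collides with method names. The Table class is hypothetical plain Python, not
pandas or StaticFrame; it relies only on the fact that Python calls
__getattr__ when normal attribute lookup fails.
```python
class Table:
    """Hypothetical table exposing columns through attribute lookup."""
    def __init__(self, columns):
        self._columns = columns
    def __getitem__(self, name):
        return self._columns[name]
    def __getattr__(self, name):
        # Only reached when normal attribute lookup finds nothing.
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None
    def count(self):
        return len(self._columns)
table = Table({"price": [1.0, 2.0], "count": [10, 20]})
```
table.price returns the column, but table.count is the method, so the "count"
column is only reachable as table["count"].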

00:30:38.520 --> 00:30:43.280
And that's sort of the general theme of trying to have there be one and only one way to do

00:30:43.280 --> 00:30:43.580
things.

00:30:43.840 --> 00:30:46.940
And in terms of getting a column, we say, okay, you use the get item syntax.

00:30:46.940 --> 00:30:49.000
Yeah, that's, I think that makes sense.

00:30:49.000 --> 00:30:53.720
You know, one of the overriding themes, it sounds like, of static frame is sort

00:30:53.720 --> 00:30:55.580
of safety, predictability.

00:30:55.580 --> 00:30:56.660
Right?

00:30:56.660 --> 00:30:58.580
Not like, oh, we asked for the load.

00:30:58.580 --> 00:31:00.520
And that's not the load on the system column.

00:31:00.520 --> 00:31:04.460
That's the load data method or some weird thing that happens, right?

00:31:04.460 --> 00:31:05.500
When you interact with it that way.

00:31:05.580 --> 00:31:06.180
Yeah, exactly.

00:31:06.180 --> 00:31:10.480
And, you know, of course, it's a huge benefit to have pandas.

00:31:10.480 --> 00:31:15.640
And, you know, there's actually a family of data frame like interfaces around these days.

00:31:15.640 --> 00:31:20.700
Not only is there the pandas data frame, but I think some other library, I think of Xarray.

00:31:20.700 --> 00:31:23.400
Xarray has kind of a pandas like thing.

00:31:23.400 --> 00:31:25.300
And there's a few other libraries out there.

00:31:25.300 --> 00:31:30.120
So it's just a huge benefit to be able to look at, you know, the hard work of all of these

00:31:30.120 --> 00:31:33.980
contributors over the years, be in the luxurious position of picking and choosing.

00:31:34.900 --> 00:31:40.040
And so, you know, it's a great debt we have to those other packages and pandas in particular

00:31:40.040 --> 00:31:44.240
to be able to look at those libraries and see, okay, I see why they made all those choices.

00:31:44.240 --> 00:31:49.520
But we can consolidate all that into one thing and remove a bunch of ambiguity, remove a bunch

00:31:49.520 --> 00:31:50.460
of opportunities for error.

00:31:50.460 --> 00:31:50.740
Yeah.

00:31:50.740 --> 00:31:57.280
Does the growth of Python in the data science data exploration space and the popularity of

00:31:57.280 --> 00:31:59.720
pandas make this an easier sell at your company?

00:31:59.720 --> 00:32:04.880
Like, do you feel like you don't have to cheerlead and like make the case for Python so much?

00:32:04.880 --> 00:32:07.960
When you go and talk to people, the management or whatever, say, yeah, we're building it this

00:32:07.960 --> 00:32:08.120
way.

00:32:08.120 --> 00:32:08.380
Yeah.

00:32:08.380 --> 00:32:13.700
Well, in terms of Python in general, the growth and popularity of Python, as I'm sure all of

00:32:13.700 --> 00:32:18.540
your listeners know, has been extraordinary in the last five, 10 years.

00:32:18.700 --> 00:32:25.500
And the role of Python in data science is probably largely due to pandas.

00:32:25.500 --> 00:32:31.360
And I would even go further to say specifically pandas read_csv, which is just extraordinarily

00:32:31.360 --> 00:32:37.140
fast, blows NumPy, blows everybody else out of the water and is such an awesome thing that

00:32:37.140 --> 00:32:40.300
I think it's the gateway into Python for data science.

00:32:40.440 --> 00:32:45.380
Your question specifically about using Python within our firm, it's been a gradual move.

00:32:45.380 --> 00:32:50.540
My team was the first to use Python, but more and more, nearly every other area of the firm

00:32:50.540 --> 00:32:53.660
that's doing something with software engineering is using Python.

00:32:53.660 --> 00:32:57.540
And of course, everybody starts with pandas because that's what you see.

00:32:57.540 --> 00:32:58.480
Because it's a load CSV.

00:32:58.480 --> 00:32:58.920
Yeah.

00:32:58.920 --> 00:32:59.920
Yeah.

00:33:00.180 --> 00:33:04.520
And the idea with Static Frame was like, well, that's great for data exploration.

00:33:04.520 --> 00:33:08.460
But if you're going to build something that you want to last and you want to reduce opportunities

00:33:08.460 --> 00:33:13.140
for error, take what you know from that library and try it out with this thing and see what

00:33:13.140 --> 00:33:13.440
you can do.

00:33:13.440 --> 00:33:13.860
That's cool.

00:33:13.860 --> 00:33:16.900
You said it was basically becoming increasingly popular.

00:33:16.900 --> 00:33:19.360
What technologies was it displacing?

00:33:19.360 --> 00:33:21.380
Like what else were you using to the extent you can say?

00:33:21.380 --> 00:33:21.720
Sure.

00:33:21.720 --> 00:33:21.920
Yeah.

00:33:21.920 --> 00:33:24.060
I mean, within our firm, we were using SAS.

00:33:24.060 --> 00:33:25.240
We were using R.

00:33:25.560 --> 00:33:29.960
And those were the primary two languages, which are still quite common in finance firms

00:33:29.960 --> 00:33:30.660
and the like.

00:33:30.660 --> 00:33:32.880
And to a certain extent, people still use those.

00:33:32.880 --> 00:33:40.300
But you can see the effort in the Python community, both SciPy, Pandas, NumPy, Matplotlib, many

00:33:40.300 --> 00:33:45.940
others moving in the last five, 10 years to provide all of that functionality that R had or almost

00:33:45.940 --> 00:33:47.720
all of it and many other platforms.

00:33:47.720 --> 00:33:49.880
So it's quite an easy transition.

00:33:49.880 --> 00:33:53.980
Well, not easy, but it's a directed transition from those other languages.

00:33:54.260 --> 00:33:58.520
Yeah, it's definitely not like going from that to C or something crazy.

00:33:58.520 --> 00:33:59.160
Yeah, for sure.

00:33:59.160 --> 00:34:03.280
So another difference has to do around iterating static frame, right?

00:34:03.280 --> 00:34:05.680
In Pandas, when you iterate, you get the values.

00:34:05.680 --> 00:34:08.420
And here, it's more dictionary-like, right?

00:34:08.420 --> 00:34:08.940
Oh, OK.

00:34:08.940 --> 00:34:09.240
Yes.

00:34:09.240 --> 00:34:11.560
There's two elements to sort of the iteration thing.

00:34:11.560 --> 00:34:15.500
The first has to do with the static frame series.

00:34:15.500 --> 00:34:21.200
So the frame and the series in both Pandas and static frame are dictionary-like containers.

00:34:21.200 --> 00:34:28.520
Both static frame and Pandas define a keys method, define an items method that work in a way that

00:34:28.520 --> 00:34:31.000
we know well from Python dictionaries.

00:34:31.000 --> 00:34:34.240
With static frame, the difference, though, has to do with the series.

00:34:34.500 --> 00:34:42.240
When you iterate a Pandas series, it iterates over the values, which makes some sense if

00:34:42.240 --> 00:34:44.260
you think of it as a wrapper around a NumPy array.

00:34:44.260 --> 00:34:51.700
But if you call dot items on a Pandas series, you're going to get pairs of the index, the key,

00:34:51.700 --> 00:34:52.440
and the value.

00:34:52.440 --> 00:34:58.100
So again, in this effort to try to be consistent, if you iterate a static frame series, you are

00:34:58.100 --> 00:35:01.400
going to iterate over the keys, just like you would with a Python dictionary.

00:35:01.640 --> 00:35:03.540
So you actually get the index values.

00:35:03.540 --> 00:35:06.920
And if you want to get the values, you have to use the dot values attribute.
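
NOTE
Editorial sketch, not from the recording: dictionary-like iteration as
described here, where iterating yields keys and values come from a separate
attribute. KeyedSeries is a hypothetical plain-Python class, not the
StaticFrame Series.
```python
class KeyedSeries:
    """Hypothetical series that iterates like a dict: keys, not values."""
    def __init__(self, values, index):
        self._index = tuple(index)
        self.values = tuple(values)
    def __iter__(self):
        return iter(self._index)
    def keys(self):
        return self._index
    def items(self):
        return zip(self._index, self.values)
s = KeyedSeries([10, 20], index=["a", "b"])
```
list(s) gives the labels, s.values gives the data, and s.items() pairs them,
matching how a Python dict behaves.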

00:35:06.920 --> 00:35:07.280
Right.

00:35:07.280 --> 00:35:08.960
That's one difference in terms of iteration.

00:35:08.960 --> 00:35:15.620
The other is, while that dictionary-like interface, we try to be really consistent there, the other

00:35:15.620 --> 00:35:22.360
place is recognizing that Pandas has a number of different approaches to iterating over columns

00:35:22.360 --> 00:35:27.600
or rows in a frame and function application on those iterations.

00:35:27.600 --> 00:35:31.480
So Pandas has an apply function, and it has various iteration functions.

00:35:31.480 --> 00:35:33.600
Like iterrows or itertuples.

00:35:33.600 --> 00:35:36.180
And I saw an opportunity to unify all of those.

00:35:36.180 --> 00:35:41.140
So the series and the frame all have different families of iterators.

00:35:41.140 --> 00:35:47.060
And all of those iterators return objects that themselves have function application methods

00:35:47.060 --> 00:35:47.560
on them.

00:35:47.560 --> 00:35:52.300
So the same tool you use for iterating exposes an opportunity to do function application.

00:35:52.300 --> 00:35:57.360
And that descends from the old library that we use, where function application across the

00:35:57.360 --> 00:35:59.020
table was a really common move.

00:35:59.020 --> 00:36:04.840
And so making that sort of a first-class element in the library was really important to us.

00:36:04.840 --> 00:36:05.080
Yeah.

00:36:05.080 --> 00:36:06.360
It sounds great.

00:36:06.820 --> 00:36:08.400
Also, you talked about the sorting.

00:36:08.400 --> 00:36:12.680
The default sorting is stable in static frame.

00:36:12.680 --> 00:36:13.900
What's the story there?

00:36:13.900 --> 00:36:16.860
This is very simple because, fortunately, NumPy did all the work here.

00:36:16.860 --> 00:36:22.440
NumPy's sort method provides a number of options for which sorting algorithm to use.

00:36:22.440 --> 00:36:32.320
And again, in the spirit of safety and repeatability and stability, the default sort method for static

00:36:32.320 --> 00:36:36.660
frame is set to NumPy's merge sort, which is indeed stable.

00:36:36.660 --> 00:36:40.540
With pandas' sort, you can switch it to be merge sort.

00:36:40.540 --> 00:36:42.920
But by default, I forget exactly what it is.

00:36:42.920 --> 00:36:45.040
But it is not a stable sort.

00:36:45.040 --> 00:36:46.540
Like quicksort or something like that.

00:36:46.540 --> 00:36:47.060
Yeah.

00:36:47.060 --> 00:36:49.040
I believe the default is quicksort.

00:36:49.360 --> 00:36:52.480
Now, why they chose quicksort, I don't know if there was any reasoning behind it.

00:36:52.480 --> 00:36:54.440
Maybe quicksort is faster in certain cases.

00:36:54.440 --> 00:36:56.520
But merge sort is reasonably fast.

00:36:56.520 --> 00:37:02.780
And if I can make a choice to ensure that the sort is stable to the order entering the

00:37:02.780 --> 00:37:04.360
sort, that seems like a benefit to me.
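
NOTE
Editorial sketch, not from the recording: what a stable sort guarantees, using
NumPy's kind argument. The data and variable names are illustrative.
```python
import numpy as np
keys = np.array([2, 1, 2, 1])
labels = np.array(["w", "x", "y", "z"])
# kind="mergesort" requests a stable sort: equal keys keep their input order.
order = np.argsort(keys, kind="mergesort")
sorted_labels = labels[order].tolist()
```
The two rows with key 1 stay in the order x then z, and the two rows with key 2
stay in the order w then y.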

00:37:04.360 --> 00:37:04.580
Yeah.

00:37:04.580 --> 00:37:09.160
It comes back to this predictability, safety, overriding theme, right?

00:37:09.160 --> 00:37:09.580
Exactly.

00:37:09.580 --> 00:37:09.960
Yeah.

00:37:09.960 --> 00:37:19.040
So I guess maybe another area is how it's tied to the NumPy defaults for calculations.

00:37:19.040 --> 00:37:20.440
And things like that.

00:37:20.440 --> 00:37:20.680
Yeah.

00:37:20.680 --> 00:37:24.960
So that gets back to the spirit of, you know, being close to NumPy.

00:37:24.960 --> 00:37:31.420
And I have an example of this where, you know, you take the standard deviation of three values

00:37:31.420 --> 00:37:35.420
without any arguments with a pandas Series.

00:37:35.420 --> 00:37:39.860
And you get a different value if you do the same thing with a NumPy array.

00:37:40.000 --> 00:37:43.900
If you use NumPy's std function, you get a different value.

00:37:43.900 --> 00:37:45.660
And that's very confusing.

00:37:45.660 --> 00:37:52.040
And it has to do with the ddof, the delta degrees of freedom argument to the standard deviation.

00:37:52.040 --> 00:37:54.920
Now, people that have played with standard deviations are well aware of this parameter.

00:37:54.920 --> 00:37:56.880
But some people may not be.

00:37:56.880 --> 00:37:58.240
And that's quite confusing.

00:37:58.240 --> 00:38:01.120
And I just, I don't see a need for that heterogeneity.

00:38:01.120 --> 00:38:02.980
I'm fine to stick with NumPy.
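
NOTE
Editorial sketch, not from the recording: the ddof difference on three values,
shown with NumPy alone. NumPy's std defaults to ddof=0, while the pandas
Series std method defaults to ddof=1, which is the source of the mismatch.
```python
import numpy as np
values = np.array([1.0, 2.0, 3.0])
# NumPy default, ddof=0: divide by N (population standard deviation).
population_std = np.std(values)
# ddof=1 divides by N - 1 (sample standard deviation), the pandas default.
sample_std = np.std(values, ddof=1)
```
For these values the sample figure is 1.0 and the population figure is the
square root of 2/3, about 0.816.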

00:38:03.500 --> 00:38:05.480
Yeah, that makes a lot of sense.

00:38:08.480 --> 00:38:14.920
This portion of Talk Python to Me is brought to you by Stellares, the AI-powered talent agent for top tech talent.

00:38:14.920 --> 00:38:17.920
Hate your job or feeling just kind of meh about it?

00:38:17.920 --> 00:38:22.420
Stellares will help you find a new job you'll actually be excited to go to.

00:38:22.420 --> 00:38:26.820
Stellares knows that a job is much more than just how it sounds in a job description.

00:38:26.820 --> 00:38:31.060
So they built their AI-powered talent agent to help you find the ideal job.

00:38:31.060 --> 00:38:36.080
Stellares does all the work and screening for you, scouting out the best companies and roles

00:38:36.080 --> 00:38:40.720
and introducing you to opportunities outside your network that you wouldn't have otherwise found.

00:38:40.720 --> 00:38:46.860
Combining deep AI matching with human support, Stellares pares things down to a maximum of five opportunities

00:38:46.860 --> 00:38:53.520
that tightly match your goals, like compensation, work-life balance, working on products you're passionate about, and team chemistry.

00:38:53.520 --> 00:38:56.420
They then facilitate warm intros.

00:38:56.420 --> 00:39:00.480
And there's never any pressure, just opportunities to explore what's out there.

00:39:00.480 --> 00:39:06.260
To get started and find a job that's just right for you, visit talkpython.fm/Stellares.

00:39:06.260 --> 00:39:11.400
That's talkpython.fm/S-T-E-L-L-A-R-E-S.

00:39:11.400 --> 00:39:14.200
Or just click the link in your show notes in your podcast player.

00:39:14.200 --> 00:39:20.360
Let's see, another one is discrete functions rather than branching parameters.

00:39:20.360 --> 00:39:26.920
So like trying to, is that like breaking stuff apart so there's functions that are simpler to understand rather than taking a bunch of parameters?

00:39:26.920 --> 00:39:27.280
Yeah.

00:39:27.280 --> 00:39:34.380
We've tried to systematically design an interface that has functions that have orthogonal parameters.

00:39:34.380 --> 00:39:38.880
So I think when all of us write functions, that should be our goal.

00:39:39.000 --> 00:39:44.100
That is, the relevance of one parameter to a function shouldn't depend on another parameter.

00:39:44.100 --> 00:39:47.160
That's quite confusing and can lead to mistakes.

00:39:47.160 --> 00:39:49.940
What you get instead is more functions.

00:39:49.940 --> 00:39:51.820
But the functions are more specific.

00:39:51.820 --> 00:39:55.540
And I believe that leads to more clear code.

00:39:55.540 --> 00:39:58.220
And it also aids in refactoring, actually.

00:39:58.220 --> 00:40:02.700
One example of that that I think is nice is the set_index method.

00:40:02.900 --> 00:40:12.920
So on pandas, there's a set_index method that if you give it one column as the argument, it will set that one column as an index.

00:40:12.920 --> 00:40:19.740
If you give that argument a list of column names, it will give you a hierarchical index.

00:40:19.740 --> 00:40:22.840
And all you did was change your input.

00:40:22.840 --> 00:40:25.620
And now you have a very different structure coming out of this.

00:40:25.620 --> 00:40:25.840
Right.

00:40:25.840 --> 00:40:29.480
Not even necessarily keyword arguments, but you just change the type that you're passing.

00:40:29.480 --> 00:40:29.940
Right?

00:40:29.940 --> 00:40:30.820
Yes, yes, yes.

00:40:30.820 --> 00:40:40.100
So there's many places in pandas where there is this sensitive dependency to the type of an argument that results in a different output, which is very problematic.

00:40:40.100 --> 00:40:42.680
So in static frame, we have two methods.

00:40:42.680 --> 00:40:46.340
We have set_index, and we have another method called set_index_hierarchy.

00:40:46.340 --> 00:40:51.360
And with set_index_hierarchy, you're expected to give an iterable of columns.

00:40:51.360 --> 00:40:54.700
And you can't give it a single column and vice versa.

00:40:54.700 --> 00:40:58.000
So we split the functionality into two different functions.

00:40:58.420 --> 00:41:01.880
And now it's completely clear to the reader what was intended.

00:41:01.880 --> 00:41:11.120
And if later on you need to do some refactoring and you need to find all of the places where you created a hierarchical index, well, you just search for the function name.

00:41:11.120 --> 00:41:17.800
You don't have to search for the function and then probe the type of that argument to know whether or not you're getting a hierarchical index.
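
NOTE
A minimal sketch of the split described above, using plain Python dicts rather than Static Frame itself. The two helpers only borrow the method names mentioned in the conversation; their signatures and behavior are assumptions made for illustration, not the library's implementation.
```python
# Hypothetical stand-ins: rows are plain dicts, not Static Frame objects.
from typing import Any, Dict, Iterable, List, Tuple
Row = Dict[str, Any]
# One column in, flat index out. A list of names is rejected outright.
def set_index(rows: List[Row], column: str) -> Dict[Any, Row]:
    """Key each row by the value of exactly one column."""
    if not isinstance(column, str):
        raise TypeError("set_index takes a single column name")
    return {row[column]: row for row in rows}
# Several columns in, tuple-keyed (hierarchical) index out. A bare string is rejected.
def set_index_hierarchy(rows: List[Row], columns: Iterable[str]) -> Dict[Tuple[Any, ...], Row]:
    """Key each row by a tuple built from several columns."""
    if isinstance(columns, str):
        raise TypeError("set_index_hierarchy takes an iterable of column names")
    names = tuple(columns)
    return {tuple(row[name] for name in names): row for row in rows}
# Illustrative data, invented for this sketch.
rows = [
    {"year": 2018, "quarter": "Q1", "value": 10},
    {"year": 2018, "quarter": "Q2", "value": 12},
]
flat = set_index(rows, "quarter")
nested = set_index_hierarchy(rows, ["year", "quarter"])
```
Because each function accepts exactly one kind of argument, the shape of the result can be read off the function name, and a text search for set_index_hierarchy finds every hierarchical index without inspecting argument types.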

00:41:17.800 --> 00:41:19.380
Yeah, that's a tremendous difference.

00:41:19.840 --> 00:41:25.620
And, you know, you go to your fancy IDE, you right click, you say find usages, it'll say there are six.

00:41:25.620 --> 00:41:26.220
They are here.

00:41:26.220 --> 00:41:28.260
Yeah, exactly.

00:41:28.260 --> 00:41:28.760
Exactly.

00:41:28.760 --> 00:41:33.340
That's way better than they're here, but only sometimes.

00:41:33.340 --> 00:41:35.540
Like that's a little sketchy for sure.

00:41:35.540 --> 00:41:36.000
That's right.

00:41:36.060 --> 00:41:36.280
All right.

00:41:36.280 --> 00:41:46.720
So it sounds like there's a lot of maybe familiarity if you're coming from Pandas, but there's enough difference that this is really something on its own and special.

00:41:46.720 --> 00:41:48.500
And there's good reasons to use it.

00:41:48.620 --> 00:41:56.680
One of the key things from Pandas is the, well, I mean, in Pandas, it took Pandas to figure this out too, is that there's three types of selection.

00:41:56.680 --> 00:42:06.440
When we're selecting data, there is the root __getitem__ selection, which in Pandas overwhelmingly is used for column selection, but in some rare cases can be used for row selection.

00:42:06.440 --> 00:42:08.680
That's something we changed, but I'll get back to that.

00:42:08.680 --> 00:42:21.380
So there's the __getitem__, there's the .loc selection, which can take one argument for row selection, two arguments for row and column selection, and the .iloc selection, which uses integers instead of the labels of the index.

00:42:21.380 --> 00:42:26.420
So that family of those three selectors really gives you everything you need.
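
NOTE
A small sketch of the three selection styles just listed, written against plain tuples and dicts so it runs without Pandas or Static Frame. The names get_column, loc, and iloc are hypothetical helpers that imitate the selectors; the data is invented.
```python
# Hypothetical toy table: row labels plus named columns of equal length.
from typing import Dict, Tuple
row_labels: Tuple[str, ...] = ("x", "y", "z")
columns: Dict[str, Tuple[int, ...]] = {"a": (1, 2, 3), "b": (10, 20, 30)}
# Root getitem analogue: a column label in, a whole column out.
def get_column(name: str) -> Tuple[int, ...]:
    """Select one column by its label."""
    return columns[name]
# .loc analogue: both axes are addressed by label.
def loc(row: str, column: str) -> int:
    """Select one value by row label and column label."""
    return columns[column][row_labels.index(row)]
# .iloc analogue: both axes are addressed by integer position.
def iloc(row: int, column: int) -> int:
    """Select one value by row position and column position."""
    return list(columns.values())[column][row]
by_label = loc("y", "b")
by_position = iloc(1, 1)
```
Here loc("y", "b") and iloc(1, 1) address the same cell, once by labels and once by positions, while get_column only ever returns a column.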

00:42:26.420 --> 00:42:30.080
Now, Pandas at various times had other types of selectors.

00:42:30.080 --> 00:42:35.160
There's this .ix method, and there's a few other variants, but they seem to be getting rid of those.

00:42:35.740 --> 00:42:45.060
Recognizing that there's these three types of selection really is one of the fundamental things to bridge the gap for people coming from Pandas to Static Frame.

00:42:45.060 --> 00:42:46.720
Those are relatively the same.

00:42:46.720 --> 00:42:56.320
One of the key differences we made in line with consistency and having only one way to do things is the root __getitem__ selection interface is only a column selector.

00:42:56.320 --> 00:43:06.540
It is never a row selector, which is a shortcut you can do in Pandas, but again, it's undesirable, is not clear for readability, and is difficult for refactoring.

00:43:06.540 --> 00:43:06.860
Yeah.

00:43:06.860 --> 00:43:08.100
Interesting.

00:43:08.100 --> 00:43:09.360
Okay, cool.

00:43:09.360 --> 00:43:16.040
There's three types of selections, the root __getitem__, the .loc, and the .iloc, and then we expose them in sub-interfaces, if you will.

00:43:16.440 --> 00:43:21.200
So a relevant question is, if I have an immutable data frame, how do I do assignment?

00:43:21.200 --> 00:43:23.060
Well, you don't.

00:43:23.060 --> 00:43:29.440
But Pandas and also NumPy have these really powerful ways of doing an assignment.

00:43:29.440 --> 00:43:31.940
I can do an assignment with Pandas.

00:43:31.940 --> 00:43:35.820
I can do an assignment in a .loc call, in an .iloc call.

00:43:36.400 --> 00:43:38.380
And I can assign to an entire column.

00:43:38.380 --> 00:43:40.440
I can assign to an entire row.

00:43:40.440 --> 00:43:45.480
I can assign to a mixture of columns and rows by using the same syntax I use for selection.

00:43:45.480 --> 00:43:46.940
That's an awesome feature.

00:43:46.940 --> 00:43:54.680
I wanted to maintain that same expressive interface, but you can't do in-place mutation.

00:43:54.680 --> 00:43:55.580
So how do you do it?

00:43:55.580 --> 00:43:58.900
Well, on Static Frame, there's a .assign attribute.

00:43:58.900 --> 00:44:04.980
And that .assign attribute exposes a root __getitem__, a .loc, and an .iloc.

00:44:05.260 --> 00:44:10.860
So under that assign attribute, you can do all of the same type of assignment moves you

00:44:10.860 --> 00:44:15.620
used to do, only you get back a new frame, and you're not mutating the old frame in place.
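
NOTE
A rough sketch of the idea of assignment that returns a new container instead of mutating in place. TinyFrame and _Assign are hypothetical toy classes written for this note; they are not Static Frame's implementation, and only the column-style selector is modeled.
```python
# Hypothetical toy classes; names and behavior are assumptions for illustration.
from typing import Callable, Dict, Sequence, Tuple
class TinyFrame:
    """Immutable column store; every 'assignment' returns a new TinyFrame."""
    def __init__(self, columns: Dict[str, Sequence]) -> None:
        # Copy into tuples so callers cannot mutate the stored data.
        self._columns: Dict[str, Tuple] = {k: tuple(v) for k, v in columns.items()}
    def __getitem__(self, column: str) -> Tuple:
        # Root getitem is a column selector only, never a row selector.
        return self._columns[column]
    @property
    def assign(self) -> "_Assign":
        # Expose the selection syntax again, this time for building new frames.
        return _Assign(self)
class _Assign:
    """Selection-style assignment that builds a new frame."""
    def __init__(self, frame: TinyFrame) -> None:
        self._frame = frame
    def __getitem__(self, column: str) -> Callable[[Sequence], TinyFrame]:
        def apply(values: Sequence) -> TinyFrame:
            # Start from the old columns and swap in the replacement column.
            updated = dict(self._frame._columns)
            updated[column] = tuple(values)
            return TinyFrame(updated)
        return apply
old = TinyFrame({"a": [1, 2], "b": [3, 4]})
new = old.assign["b"]([30, 40])
```
After the call, old still holds the original "b" column and new holds the replacement, which is the property the assign interface is described as guaranteeing.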

00:44:15.620 --> 00:44:16.440
That's a great feature.

00:44:16.440 --> 00:44:16.940
I love it.

00:44:16.940 --> 00:44:19.340
So let's talk about testing for a little bit.

00:44:19.340 --> 00:44:25.400
I saw that you have performance tests, unit tests, things like that, which is

00:44:25.400 --> 00:44:25.840
great.

00:44:25.840 --> 00:44:31.920
One of the things that really stood out to me when I was looking at it was that you actually

00:44:31.920 --> 00:44:36.220
were using Hypothesis, which is an interesting library.

00:44:36.220 --> 00:44:41.460
I had Austin Bingham on the show long ago talking about Hypothesis.

00:44:41.460 --> 00:44:42.580
It's probably been three years.

00:44:42.580 --> 00:44:47.720
But you want to just tell us roughly, really high level, what that is and why you decided

00:44:47.720 --> 00:44:48.060
to use it?

00:44:48.120 --> 00:44:53.220
I saw a, I think it was last year's, PyCon presentation on, maybe it wasn't specifically

00:44:53.220 --> 00:44:56.140
Hypothesis, but it was related to that.

00:44:56.140 --> 00:44:57.860
Property-based testing in general.

00:44:57.860 --> 00:44:58.240
Yeah.

00:44:58.480 --> 00:44:59.960
And I just was so impressed.

00:44:59.960 --> 00:45:05.780
I was like, oh man, all that time I spend trying to find corner cases and trying to make my

00:45:05.780 --> 00:45:14.000
unit tests have sufficient coverage can be automated for me by using a tool that you control and you

00:45:14.000 --> 00:45:20.500
shape the random generation of values to meet the expectations of finding these extreme corner

00:45:20.500 --> 00:45:20.900
cases.

00:45:21.280 --> 00:45:23.840
I took that away and was like, wow, I really want to do more of that.

00:45:23.840 --> 00:45:28.620
One of my colleagues here at Research Affiliates who does some work in Haskell set off on trying

00:45:28.620 --> 00:45:30.260
to use this a little bit more in depth.

00:45:30.260 --> 00:45:34.360
And this whole idea of property testing, in fact, comes out of Haskell.

00:45:34.360 --> 00:45:42.000
I forget the name of the library that originated it, but the whole library was published as one

00:45:42.000 --> 00:45:44.720
page in the paper that introduced the concept.

00:45:44.840 --> 00:45:49.520
It's really amusing, but the implementation of the original sort of property-based testing tool

00:45:49.520 --> 00:45:52.460
is just one page of Haskell code.

00:45:52.460 --> 00:45:57.160
But through his example, my colleague's examples, and starting to look at it, I'm like, man, this is

00:45:57.160 --> 00:46:01.260
exactly what I need for Static Frame because, you know, you're trying to build a general purpose

00:46:01.260 --> 00:46:01.780
library.

00:46:01.780 --> 00:46:05.980
There's no way I'm going to be able to anticipate the things that people are going to want to put

00:46:05.980 --> 00:46:07.760
into a series or a frame.

00:46:07.760 --> 00:46:11.800
There's no way that I can anticipate all the possible values someone is going to try to put

00:46:11.800 --> 00:46:12.480
in an index.

00:46:13.040 --> 00:46:19.560
So with property-based testing, with using Hypothesis, you open the door to just defining

00:46:19.560 --> 00:46:22.040
the properties that you expect to have.

00:46:22.040 --> 00:46:29.880
You know, namely that if you create an index with 20 integers, the resultant index is going

00:46:29.880 --> 00:46:31.080
to have 20 values.

00:46:31.080 --> 00:46:33.800
Well, that's true unless you've duplicated any values.

00:46:33.800 --> 00:46:37.060
Or rather, it's not true if you duplicated values.

00:46:37.060 --> 00:46:40.440
Or it's not true if something else went wrong in reading those values.
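
NOTE
A hand-rolled sketch of the property just described, using only the standard library random module as a stand-in for Hypothesis (which additionally shrinks failing inputs and explores edge cases far more cleverly). build_index is a hypothetical function that rejects duplicate labels; the unique-labels rule is an assumption of this sketch.
```python
# Stand-in for a property-based test; build_index is hypothetical.
import random
from typing import List, Tuple
def build_index(values: List[int]) -> Tuple[int, ...]:
    """Hypothetical unique index: rejects duplicate labels."""
    if len(set(values)) != len(values):
        raise ValueError("index labels must be unique")
    return tuple(values)
def check_length_property(trials: int = 500, seed: int = 0) -> int:
    """Return how many random inputs contained duplicates and were rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        # A small value range makes duplicates likely, which is the corner case of interest.
        values = [rng.randint(-30, 30) for _ in range(20)]
        try:
            index = build_index(values)
        except ValueError:
            # Rejection is only correct when a duplicate was really present.
            assert len(set(values)) < 20
            rejected += 1
        else:
            # Property: 20 distinct integers in, 20 labels out.
            assert len(index) == 20
    return rejected
rejected = check_length_property()
```
The test states the property once (20 distinct values give 20 labels, duplicates are refused) and lets generated inputs probe it, instead of listing individual cases by hand.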

00:46:41.040 --> 00:46:46.560
So I think of Hypothesis in the context of Static Frame as a way of simulating my user.

00:46:46.560 --> 00:46:48.080
It's, you know, the user.

00:46:48.080 --> 00:46:51.760
It's thousands of users who are throwing everything into these containers.

00:46:51.760 --> 00:46:57.820
And Hypothesis really nicely gives you a way to model that and really changes the way you

00:46:57.820 --> 00:46:58.560
think about testing.

00:46:58.560 --> 00:47:02.820
Again, my same colleague, you know, was like, you know, I enjoy testing again.

00:47:02.920 --> 00:47:08.380
I enjoy writing tests so much more when using this because it just forces you to think about

00:47:08.380 --> 00:47:09.140
it in a different way.

00:47:09.140 --> 00:47:12.640
And it's very refreshing compared to the task of writing unit tests.

00:47:12.640 --> 00:47:13.300
Yeah, it's cool.

00:47:13.300 --> 00:47:15.440
It's almost like writing a meta test, right?

00:47:15.440 --> 00:47:17.140
Instead of going like, here are the seven cases.

00:47:17.140 --> 00:47:18.540
Here's one where the value's in the middle.

00:47:18.540 --> 00:47:20.960
Here's the edge of the array I'm trying to test.

00:47:20.960 --> 00:47:22.180
Here's one that's out of the bounds.

00:47:22.300 --> 00:47:26.700
You can just go, this is the general type of stuff that goes in.

00:47:26.700 --> 00:47:29.580
These are the general types of things I want to verify.

00:47:29.580 --> 00:47:33.120
Go make that happen and vary a bunch of stuff for me, right?

00:47:33.120 --> 00:47:34.180
Yeah, yeah, that's right.

00:47:34.180 --> 00:47:35.340
Yeah, it's pretty cool.

00:47:35.340 --> 00:47:39.060
So I was really thrilled to see that you had put that in there for some of the testing stuff.

00:47:39.060 --> 00:47:39.520
It's cool.

00:47:39.520 --> 00:47:40.940
People can check it out in the GitHub repo.

00:47:41.140 --> 00:47:42.700
Yeah, I have a lot more to do there.

00:47:42.700 --> 00:47:45.020
But again, it's like you have to go into it.

00:47:45.020 --> 00:47:48.840
What's really startling about it is you really have to be in a different mindset.

00:47:48.840 --> 00:47:51.120
So you have to give yourself the time to get into that mindset.

00:47:51.120 --> 00:47:52.720
There's much more I need to do with that.

00:47:52.720 --> 00:47:57.040
But it's a refreshing and pleasurable place to be in.

00:47:57.040 --> 00:47:58.500
So yeah, I highly recommend it.

00:47:58.500 --> 00:47:59.160
Yeah, I bet.

00:47:59.160 --> 00:48:00.100
It seems super cool.

00:48:00.100 --> 00:48:04.680
It definitely seems like you can't just bring your main way of thinking about testing.

00:48:04.680 --> 00:48:06.580
Like, I'm going to test this one case and see if it works.

00:48:06.580 --> 00:48:08.720
You've got to sort of step back a level.

00:48:08.720 --> 00:48:09.620
Yeah, that's exactly right.

00:48:09.620 --> 00:48:10.180
Yeah, nice.

00:48:10.900 --> 00:48:14.240
Another thing I wanted to ask you about that I didn't before when we talked about Python

00:48:14.240 --> 00:48:18.120
and finance and just we're coming up on 2020.

00:48:18.120 --> 00:48:22.880
It's the death clock for Python 2 is ticking at pythonclock.org.

00:48:22.880 --> 00:48:23.480
I think it is.

00:48:23.480 --> 00:48:25.160
It's ticking down.

00:48:25.160 --> 00:48:27.140
The time is getting short on it.

00:48:27.140 --> 00:48:28.840
What is it like?

00:48:28.840 --> 00:48:31.820
First of all, does Static Frame support Python 3?

00:48:31.820 --> 00:48:33.920
Oh, it's built entirely in Python 3.

00:48:33.920 --> 00:48:35.800
We're at 3.5 now.

00:48:35.800 --> 00:48:37.520
No support for 2.

00:48:37.520 --> 00:48:40.540
So that was a huge benefit of my predecessor here at

00:48:40.660 --> 00:48:41.440
Research Affiliates.

00:48:41.440 --> 00:48:47.960
He set out building our code base in Python 3 back in 2012 or even 2011, which some people

00:48:47.960 --> 00:48:51.180
would have said, might have said was, you know, kind of questionable choice.

00:48:51.180 --> 00:48:55.260
But at that point, we had NumPy and we had Pandas soon after that.

00:48:55.440 --> 00:48:59.780
So given that foundation of Python 3, we've been using Python 3 entirely and have never looked

00:48:59.780 --> 00:48:59.960
back.

00:48:59.960 --> 00:49:00.700
Yeah, that's super.

00:49:00.700 --> 00:49:07.840
And then what do you see that transition looking like in the larger finance space?

00:49:07.840 --> 00:49:11.840
Not necessarily just for your firm, but the other folks you interact with as well.

00:49:11.840 --> 00:49:13.220
In terms of moving to Python 3?

00:49:13.220 --> 00:49:13.620
Yeah.

00:49:13.700 --> 00:49:16.060
Like, do people just have their head in the ground and go, we're just not doing it?

00:49:16.060 --> 00:49:17.860
Are they going like, oh my gosh, here it comes.

00:49:17.860 --> 00:49:19.180
This is going to be like Y2K again.

00:49:19.180 --> 00:49:21.000
Or do they, are they ambivalent?

00:49:21.000 --> 00:49:23.660
What's the finance vibe around that?

00:49:23.660 --> 00:49:24.720
I can't speak broadly.

00:49:24.920 --> 00:49:30.360
I do know that there's a very large bank that employs a very large number of Python developers

00:49:30.360 --> 00:49:35.080
who use a lot of extensive systems built entirely in Python 2.

00:49:35.080 --> 00:49:38.280
And I don't know if they're even on 2.7 or 2.5.

00:49:38.280 --> 00:49:41.040
Yeah, I think the bank that you were talking about, I think I know.

00:49:41.040 --> 00:49:42.600
And I don't even think they're on 2.7.

00:49:42.600 --> 00:49:44.800
Yeah, I think they're stuck on 2.5 is what I heard.

00:49:44.800 --> 00:49:47.740
But it's going to be very hard, I would expect.

00:49:47.740 --> 00:49:52.820
Maybe they've built their frameworks in such a way that maybe they're okay.

00:49:52.820 --> 00:49:58.720
One of the things I've heard about that very large bank is that their Python tools, to some extent,

00:49:58.720 --> 00:50:00.140
are enforcing immutability.

00:50:00.140 --> 00:50:05.440
And for the same motivations that we have, they may have put constraints on the language

00:50:05.440 --> 00:50:09.800
in a way to help reduce risk that they can keep for a little while.

00:50:09.800 --> 00:50:13.260
But certainly, it's going to require a transition at some time, and that's going to be hard.

00:50:13.260 --> 00:50:14.020
Yeah, I agree.

00:50:14.020 --> 00:50:15.440
I guess two thoughts.

00:50:15.440 --> 00:50:23.740
One, do you feel like maybe that is a failure of leadership, engineering leadership, to say,

00:50:23.740 --> 00:50:26.820
we put ourselves in a corner, you guys, and we have to.

00:50:26.820 --> 00:50:32.560
I know it's not building features or driving the investment engine, but we have to keep moving

00:50:32.560 --> 00:50:35.480
forward if we get stuck, not just on 2.7, but on 2.5.

00:50:36.440 --> 00:50:41.180
And all these libraries, they can't use anything NumPy is doing, or Pandas, or in the future,

00:50:41.180 --> 00:50:42.620
right, as they're dropping, right?

00:50:42.620 --> 00:50:44.980
You know, Pandas already announced they're dropping Python 2 support.

00:50:44.980 --> 00:50:45.720
Right, I saw that.

00:50:45.720 --> 00:50:47.260
Yeah, it's definitely a challenge.

00:50:47.260 --> 00:50:48.620
And it's technical debt, right?

00:50:48.760 --> 00:50:55.000
It's, as, you know, my own team, we're on 3.5, and we're in the process of jumping to 3.7.

00:50:55.000 --> 00:51:01.340
You know, even that, for us, as a relatively small team with a decent but modest-sized code base,

00:51:01.340 --> 00:51:03.100
you know, it takes work and it takes time.

00:51:03.100 --> 00:51:05.520
And just as you say, it doesn't deliver immediate features.

00:51:05.520 --> 00:51:07.300
It doesn't deliver obvious benefits.

00:51:07.300 --> 00:51:08.980
It is a technical debt.

00:51:08.980 --> 00:51:14.640
And it's often, it's very difficult to prioritize that work appropriately, and also to communicate

00:51:14.640 --> 00:51:20.460
the value to upper management and others that are considering what your developers are doing.

00:51:20.460 --> 00:51:25.320
But it's, I mean, the important thing is that it's called debt for a reason.

00:51:25.320 --> 00:51:28.900
You have to pay it, or your survivors will pay it.

00:51:28.900 --> 00:51:34.660
There is no debt forgiveness in technical debt, other than abandonment.

00:51:34.660 --> 00:51:36.780
I mean, you can abandon the code and start over.

00:51:37.360 --> 00:51:39.320
There's no too big to fail, sort of, really.

00:51:39.320 --> 00:51:43.820
Yeah, so it's definitely something to pay attention to.

00:51:43.820 --> 00:51:48.840
I mean, even with Pandas versions, we've struggled to keep up with Pandas updates.

00:51:48.840 --> 00:51:51.080
We're still presently using Pandas 0.17.

00:51:51.080 --> 00:51:54.380
We are transitioning to Pandas 0.23 or 0.24.

00:51:54.380 --> 00:51:55.480
I think we're going to 0.24 now.

00:51:55.480 --> 00:51:56.900
I think 0.25 just came out.

00:51:56.900 --> 00:52:04.840
But even before we were on 0.17, we suffered and spent quite a bit of time accommodating the changes to the API

00:52:04.840 --> 00:52:08.200
and changes downstream of Pandas changes.

00:52:08.200 --> 00:52:11.380
So it's painful, but you just have to do it.

00:52:11.520 --> 00:52:13.140
Yeah, in their defense, right?

00:52:13.140 --> 00:52:14.080
That's a lot of money.

00:52:14.080 --> 00:52:20.820
If you're rewriting the code significantly, that's touching money versus just driving the website or whatever.

00:52:20.820 --> 00:52:22.240
I can understand the hesitation.

00:52:22.240 --> 00:52:23.100
I wouldn't want to mess with that.

00:52:23.100 --> 00:52:26.600
But at some point, maybe it's not in 2020.

00:52:26.600 --> 00:52:27.560
Maybe it's 2025.

00:52:27.560 --> 00:52:29.480
At some point, it's going to be a problem.

00:52:29.480 --> 00:52:31.540
People are going to go, I don't want to work there.

00:52:31.540 --> 00:52:32.360
You mean really?

00:52:32.360 --> 00:52:36.420
That version from that long ago with that little library support?

00:52:36.740 --> 00:52:37.640
No, thank you, right?

00:52:37.640 --> 00:52:38.920
Like, it's going to be a problem.

00:52:38.920 --> 00:52:39.980
It's going to be like COBOL.

00:52:39.980 --> 00:52:40.560
Yeah, yeah.

00:52:40.560 --> 00:52:40.840
COBOL.

00:52:40.840 --> 00:52:47.020
I made a joke about COBOL the other day with some of my colleagues, and I was quickly corrected that there is,

00:52:47.020 --> 00:52:49.360
apparently there still is quite a bit of COBOL in production.

00:52:49.360 --> 00:52:49.760
Yes.

00:52:49.920 --> 00:52:55.080
So I was like, I thought it was like a dinosaur, but I guess there's still a lot of COBOL in production.

00:52:55.080 --> 00:52:57.840
But you can get away with it for so long.

00:52:57.840 --> 00:53:00.660
But at a certain point, yeah, you're exactly right.

00:53:00.660 --> 00:53:03.600
It's a huge detriment to recruiting.

00:53:03.600 --> 00:53:06.200
We're a small firm located in Newport Beach.

00:53:06.200 --> 00:53:10.440
Not exactly a tech hub, although Irvine's trying a little bit.

00:53:10.440 --> 00:53:18.520
But for as long as we've been recruiting for this team, I've been – less so now, but a few years ago, I would say to people, yeah, and we're working in Python 3.

00:53:18.520 --> 00:53:19.460
And they would say, oh, really?

00:53:19.460 --> 00:53:20.600
You're working in Python 3?

00:53:20.600 --> 00:53:22.960
I'm stuck in 2.7 or 2.5.

00:53:22.960 --> 00:53:23.660
I'm so excited.

00:53:23.660 --> 00:53:24.420
That would be awesome.

00:53:24.420 --> 00:53:25.460
I am so excited.

00:53:25.900 --> 00:53:36.160
So a few years ago, that we were entirely in Python 3 was explicitly a highly desirable feature for prospective candidates to our team.

00:53:36.160 --> 00:53:40.060
A little bit less so now, but it's something that we always say up front.

00:53:40.060 --> 00:53:40.960
Yeah.

00:53:40.960 --> 00:53:43.000
Well, it's definitely a good thing.

00:53:43.000 --> 00:53:50.340
I think only less so only because other people have started to make that path, go down that path, right?

00:53:50.340 --> 00:53:51.260
Yeah, that's right.

00:53:51.440 --> 00:53:59.680
I mean, when I was in – I went to a PyCon, I believe it was in 2013, and I believe it was at Guido's keynote.

00:53:59.680 --> 00:54:01.120
Maybe it was somebody else.

00:54:01.120 --> 00:54:07.160
But the question was asked to the general assembly, you know, when there's all thousands of – however many thousands of people are in that room.

00:54:07.160 --> 00:54:11.340
And they asked a show of hands of how many people are using Python 3 in production.

00:54:11.340 --> 00:54:15.160
And me and my colleague raise our hand and look around.

00:54:15.160 --> 00:54:18.440
And there's just – I mean, it was far less than 10%.

00:54:18.440 --> 00:54:18.740
Yeah.

00:54:18.880 --> 00:54:23.720
But I think they did that exercise again at a recent PyCon, and it was – looks like it was more than half, you know?

00:54:23.720 --> 00:54:24.100
Oh, yeah.

00:54:24.100 --> 00:54:24.520
Yeah.

00:54:24.520 --> 00:54:27.880
The community is definitely moving, and, you know, it's good to see.

00:54:27.880 --> 00:54:28.860
It's great to see.

00:54:28.860 --> 00:54:29.460
It's great to see.

00:54:29.460 --> 00:54:30.640
All right.

00:54:30.640 --> 00:54:34.580
Well, I think we're getting short on time, so we're going to have to leave it there.

00:54:34.580 --> 00:54:36.520
People should definitely check out Static Frame.

00:54:36.520 --> 00:54:40.120
If Pandas is something that you're doing, maybe this will apply.

00:54:40.120 --> 00:54:48.580
I guess maybe one final question I could ask for you, Chris, is how does somebody know that they have a problem that Static Frame will solve better than Pandas is solving?

00:54:48.580 --> 00:54:49.480
Yeah.

00:54:49.480 --> 00:54:52.740
I mean, often the advice will be like, hey, yeah, use Pandas, right?

00:54:52.740 --> 00:54:54.060
Load CSV, all that kind of stuff.

00:54:54.060 --> 00:54:59.420
But, like, when would you say, actually, you should consider this because it'll solve your problem better?

00:54:59.420 --> 00:55:00.740
I would say there's a couple signs.

00:55:00.740 --> 00:55:04.020
One might be that you keep on making mistakes.

00:55:04.020 --> 00:55:20.780
You make mistakes because you reach for the wrong interface, or you get a surprising result because there's a type sensitivity to an argument, or you make a mistake because you accidentally mutated data you didn't intend to mutate, or you got a multi-index when you expected a unique index.

00:55:21.060 --> 00:55:30.380
You know, those are the kinds of things that are the telltale signs that maybe the kind of work you're doing, you know, requires a different package with a different set of constraints.

00:55:30.380 --> 00:55:30.980
Yeah, that's right.

00:55:30.980 --> 00:55:31.740
That's a great description.

00:55:31.740 --> 00:55:32.060
Thanks.

00:55:32.060 --> 00:55:32.740
All right.

00:55:32.740 --> 00:55:34.960
Now, before you get out of here, I've got the final two questions.

00:55:34.960 --> 00:55:35.480
Sure.

00:55:35.480 --> 00:55:38.960
If you're going to write some Python code or work on Static Frame, what editor do you use?

00:55:38.960 --> 00:55:45.600
I have recently moved over to VS Code. Like many people, I had some apprehension about Microsoft products for some time.

00:55:46.120 --> 00:55:50.560
And now there's a Microsoft product that I use every day and really enjoy.

00:55:50.560 --> 00:55:54.680
Prior to that, I used a few different editors, but I've been really happy with VS Code.

00:55:54.680 --> 00:55:57.300
In large part, I don't really ask a lot of my IDE.

00:55:57.300 --> 00:56:02.960
I really want it to get out of my way, and I don't debug in the IDE.

00:56:02.960 --> 00:56:04.280
I don't lint in the IDE.

00:56:04.280 --> 00:56:06.040
I prefer to do those things from the command line.

00:56:06.040 --> 00:56:14.660
I just like my IDE to be something close to like a Zen mode that gets everything out of the way, and I'm very aesthetically inclined,

00:56:14.840 --> 00:56:16.620
so I'm very sensitive to my colors and whatnot.

00:56:16.620 --> 00:56:27.340
So with VS Code, I was able to quickly, with a very low transition cost, get it to be visually, aesthetically, sort of ergonomically comfortable for me.

00:56:27.340 --> 00:56:30.520
And in subsequent updates, it hasn't made it worse.

00:56:30.520 --> 00:56:31.440
It's been good.

00:56:31.440 --> 00:56:33.180
So I've been very happy with VS Code.

00:56:33.180 --> 00:56:33.520
That's cool.

00:56:33.520 --> 00:56:36.320
Yeah, they're doing great stuff with that, so I definitely hear that a lot.

00:56:36.320 --> 00:56:37.240
All right.

00:56:37.240 --> 00:56:39.220
And then notable PyPI package.

00:56:39.220 --> 00:56:41.680
I'll go ahead and throw Static Frame out there for you.

00:56:41.680 --> 00:56:42.980
People can pip install that, right?

00:56:42.980 --> 00:56:43.340
Yep.

00:56:43.340 --> 00:56:43.640
Yep.

00:56:43.640 --> 00:56:44.040
It's there.

00:56:44.040 --> 00:56:44.480
Ready to go.

00:56:44.620 --> 00:56:44.860
All right.

00:56:44.860 --> 00:56:47.620
Other ones that you're like, oh, I heard about this the other day.

00:56:47.620 --> 00:56:49.680
Maybe you don't know about it, but it's really cool.

00:56:49.680 --> 00:56:51.060
It solves this problem uniquely or whatever.

00:56:51.060 --> 00:56:51.920
Any come to mind?

00:56:51.920 --> 00:56:56.480
I should plug the project I worked at before I started at Research Affiliates, which is Music21.

00:56:56.480 --> 00:57:06.800
Music21 is a Python package that I co-created and founded and did sort of initial three years of work on it at MIT with a former colleague of mine there,

00:57:07.940 --> 00:57:11.600
Which is a really fun tool for examining what we call symbolic music.

00:57:11.600 --> 00:57:15.300
So music represented as XML or music represented as MIDI files.

00:57:15.300 --> 00:57:24.400
Music21 allows you to take in these musical representations and play with them as an object model and ask questions about them.

00:57:24.500 --> 00:57:34.360
Like, for example, given all of Mozart's string quartets, how often does he use a modified pitch on the third beat or something like that?

00:57:34.360 --> 00:57:34.660
Uh-huh.

00:57:34.660 --> 00:57:35.120
Awesome.

00:57:35.240 --> 00:57:42.780
So it's a really fun toolkit if you know anything about music and you want to start experimenting with generating or analyzing musical notation.

00:57:42.780 --> 00:57:43.100
Okay.

00:57:43.100 --> 00:57:44.140
That's a great recommendation.

00:57:44.140 --> 00:57:44.840
That's very cool.

00:57:44.840 --> 00:57:45.840
All right.

00:57:45.840 --> 00:57:46.560
Final call to action.

00:57:46.560 --> 00:57:48.080
People want to get started with Static Frame.

00:57:48.080 --> 00:57:48.700
What do they do?

00:57:48.700 --> 00:57:50.340
I did the essential thing recently.

00:57:50.340 --> 00:57:51.840
I made a quick start guide.

00:57:51.840 --> 00:57:57.520
So I started to write API documentation, and that was kind of tough.

00:57:57.520 --> 00:58:00.260
And it's not a pleasurable read and not a good introduction.

00:58:00.260 --> 00:58:02.660
So I fairly recently wrote a little quick start guide.

00:58:02.660 --> 00:58:04.400
You can find it on GitHub in the readme.

00:58:04.400 --> 00:58:14.660
You can find it in the documentation, which is a little tutorial using data available from a JSON endpoint that will walk you through some of the key features and main differences from Pandas.

00:58:14.660 --> 00:58:18.600
And hopefully will be enough to get people excited about the package.

00:58:18.600 --> 00:58:19.280
Yeah, very cool.

00:58:19.280 --> 00:58:21.980
And you also gave a presentation at PyCon, which was recorded.

00:58:21.980 --> 00:58:23.700
I'll link to that so people can check that out.

00:58:23.700 --> 00:58:24.740
Final question.

00:58:24.740 --> 00:58:29.140
Are you looking for open source contributors, people to jump on this project, or is it kind of baked?

00:58:29.140 --> 00:58:30.060
What's the status there?

00:58:30.060 --> 00:58:30.800
Oh, absolutely.

00:58:30.800 --> 00:58:45.260
So while this tool is being used internally within my firm and its use will grow within our firm, we are absolutely looking for contributors and users and testers to give us some feedback.

00:58:45.620 --> 00:59:01.160
I've been fortunate in the development of this in that I've had my team to constantly give me feedback and tell me I'm being too nice, as they like to do, to make our interfaces discrete and precise and get a lot of feedback and support from my team.

00:59:01.240 --> 00:59:05.560
So I owe a huge debt to my team and the context of our work here to support that.

00:59:05.560 --> 00:59:06.840
But we need more users.

00:59:06.840 --> 00:59:07.680
We need more testers.

00:59:07.680 --> 00:59:08.400
We need more feedback.

00:59:08.400 --> 00:59:15.520
So at a basic level, people using the tool and giving us some feedback, they may not be ready to move it into their production systems.

00:59:15.520 --> 00:59:16.900
And I certainly understand that.

00:59:16.900 --> 00:59:21.100
But some good dabbling, starting to play with it would be really helpful for us in getting some feedback.

00:59:21.100 --> 00:59:25.880
And of course, I'm pretty happy with the code itself, and I would encourage people to look at the code.

00:59:25.880 --> 00:59:28.000
They see opportunities to add things and make things better.

00:59:28.000 --> 00:59:29.260
That would be fantastic as well.

00:59:29.380 --> 00:59:29.860
Yeah, super.

00:59:29.860 --> 00:59:30.800
All right.

00:59:30.800 --> 00:59:34.080
Well, thanks for giving us the whole story and history of Static Frame.

00:59:34.080 --> 00:59:35.400
It looks like a really cool project.

00:59:35.400 --> 00:59:35.740
Great.

00:59:35.740 --> 00:59:36.480
Thank you for your time.

00:59:36.480 --> 00:59:37.160
Happy to be on the show.

00:59:37.160 --> 00:59:37.480
Yep.

00:59:37.480 --> 00:59:38.080
Happy to have you.

00:59:38.080 --> 00:59:38.320
Bye.

00:59:38.320 --> 00:59:38.640
Bye-bye.

00:59:38.640 --> 00:59:42.460
This has been another episode of Talk Python to Me.

00:59:42.460 --> 00:59:47.860
Our guest on this episode was Christopher Ariza, and it's been brought to you by Linode and Stellares.

00:59:47.860 --> 00:59:51.580
Linode is your go-to hosting for whatever you're building with Python.

00:59:51.580 --> 00:59:55.140
Get four months free at talkpython.fm/Linode.

00:59:55.140 --> 00:59:57.020
That's L-I-N-O-D-E.

00:59:57.900 --> 01:00:03.000
Find the right job for you with Stellares, the AI-powered talent agent for the top tech talent.

01:00:03.000 --> 01:00:06.900
Visit talkpython.fm/Stellares to get started.

01:00:06.900 --> 01:00:11.800
That's talkpython.fm/S-T-E-L-L-A-R-E-S.

01:00:11.800 --> 01:00:12.540
Stellares.

01:00:12.540 --> 01:00:14.780
Want to level up your Python?

01:00:14.780 --> 01:00:19.640
If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

01:00:19.740 --> 01:00:27.780
Or if you're looking for something more advanced, check out our new Async course that digs into all the different types of Async programming you can do in Python.

01:00:27.780 --> 01:00:32.460
And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.

01:00:32.460 --> 01:00:34.340
It's like a subscription that never expires.

01:00:34.340 --> 01:00:36.640
Be sure to subscribe to the show.

01:00:36.640 --> 01:00:39.060
Open your favorite podcatcher and search for Python.

01:00:39.060 --> 01:00:40.280
We should be right at the top.

01:00:40.280 --> 01:00:49.280
You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:00:49.280 --> 01:00:51.380
This is your host, Michael Kennedy.

01:00:51.380 --> 01:00:52.880
Thanks so much for listening.

01:00:52.880 --> 01:00:53.940
I really appreciate it.

01:00:53.940 --> 01:00:55.680
Now get out there and write some Python code.
