#204: StaticFrame, like Pandas but safer Transcript

Recorded on Thursday, Feb 7, 2019.

00:00 Michael Kennedy: Remember back in math class, when you took a test, it wasn't enough to just write down the answer. What's the limit of that infinite summation? Pi/2. Yes, but how did you get to that number? Some problems in programming are just like this. We want to keep track of the computation done and only add more steps to the results. Heck, that's basically the entire premise of functional programming. On this episode, you'll meet Christopher Ariza, who created a project called StaticFrame. Think of it like Pandas and Numpy, but it never changes the computations it's already performed, it just adds to them. This is Talk Python To Me, Episode 204, recorded February 7th 2019. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy, follow me on Twitter where I'm @MKennedy, keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. Chris welcome to Talk Python.

01:11 Christopher Ariza: Hi Michael, glad to be on the show.

01:12 Michael Kennedy: Yeah, it's great to have you on the show. You have a really interesting library that you've been working on. It's an interesting sort of data-safe take on the whole Pandas API, which I think is going to be a lot of fun for all the data scientists and other people working with Pandas out there.

01:28 Christopher Ariza: Great, I look forward to talking about it.

01:29 Michael Kennedy: Absolutely, but before we do, let's get started on your story, how did you get into programming in Python?

01:34 Christopher Ariza: Sure, I started programming in Python in the year 2000. I was a graduate student at NYU and I was doing a lot of work in computer music and algorithmic composition specifically. And I was looking for a way to extend my capacity, you know, with these very high level synthesis languages that I was using. So I decided to learn programming in more depth, and, as I was a graduate student there, I took a course in C programming, and I had a graduate advisor who was supposed to oversee my work at the time. And I had this great idea to build the system in C. And so I sat with this advisor and I was like, I really want to do this thing in C, and he said to me, well, if you wanted to build a car would you start by building every screw? And I said no, and so he said use Python, and I was like what's Python? He said it's this new great language, go try it out. And so I walked over to Barnes And Nobles in Astor Place in New York City, back when there were bookstores, at Barnes and Nobles.

02:34 Michael Kennedy: Yeah they used to have a great computer section, right? You could go and browse and see what was interesting. That was a way you learned about stuff, and not so much these days, right?

02:43 Christopher Ariza: Yeah exactly.

02:44 Michael Kennedy: Yeah so you went over and got this book yeah?

02:45 Christopher Ariza: I got Learning Python actually, there was a version of Learning Python out in the year 2000, and I picked it up and started learning Python and started building this system I called athenaCL. This was a tool for algorithmic composition closely tied to a synthesis language called Csound. And I did a bunch of work in that, culminating in my dissertation.

03:03 Michael Kennedy: Wow, that's really cool. So your dissertation is music and algorithmic composition basically?

03:09 Christopher Ariza: That's right.

03:10 Michael Kennedy: Okay, oh that's so cool. So when you started to work on this library and you were doing the programming around it, was this to have a computer generate music or try to use the computer to understand music, what was the goal there?

03:23 Christopher Ariza: I was studying, I was getting a PhD in music composition and theory, and so I was using these synthesis languages that took text based input of the data. So Csound is an ancient, well it's not ancient but it's a very old

03:37 Michael Kennedy: In computer terms.

03:38 Christopher Ariza: Yeah well it actually comes from an ancient lineage, it comes from the very first synthesis languages that Max Mathews invented at Bell Labs in the 60's.

03:48 Michael Kennedy: Wow.

03:49 Christopher Ariza: Called Music One and Music Two, up to Music Five, those happened at the labs in the 60's and 70's, and Csound is the modern version of that. And it takes basically a text input file defining your events and your parameters. And it became quite clear that you could do really cool things if you could use code to generate these text input files, and of course it's quite straightforward to do in Python, so I wasn't using Python to do synthesis, I was using Python to generate control data that I would then feed into Csound.
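
The idea of using Python only to generate control data can be sketched in a few lines. This is a rough illustration, not athenaCL's code: the helper name `make_score`, the pitch and duration choices, and the meaning of the last two fields (amplitude and frequency, which in Csound depend on how the instrument is defined) are all assumptions made for the example.

```python
import random

def make_score(num_events, seed=0):
    """Return Csound-style score text: one 'i' statement per note event."""
    rng = random.Random(seed)
    lines = []
    start = 0.0
    for _ in range(num_events):
        duration = rng.choice([0.25, 0.5, 1.0])
        frequency = rng.choice([220.0, 330.0, 440.0])
        # Fields: instrument number, start time, duration, then two
        # instrument-specific values (assumed here: amplitude, frequency).
        lines.append(f"i1 {start:.2f} {duration:.2f} 0.5 {frequency:.1f}")
        start += duration
    return "\n".join(lines)

score = make_score(4)
print(score)
```

The generated text would then be saved to a score file and rendered by the synthesis engine; Python never touches the audio itself.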

04:20 Michael Kennedy: Okay, how interesting. Have you seen some of the programmatic generated music that people are doing with Python lately?

04:28 Christopher Ariza: I'm not sure is there something specific you're thinking of?

04:31 Michael Kennedy: Yeah there's been a couple of presentations around, gosh I wish I could remember the library, but there's a couple of libraries that you can use to basically live in the REPL, program out songs and interactions and it's, yeah it's pretty wild so, maybe I'll find that, a link to throw into the show notes, 'cause I can't remember the name, but there's been a few really good conference presentations that basically live musical performances done by programming Python.

04:56 Christopher Ariza: I think I saw one at PyCon last year. I believe that one worked on top of the SuperCollider language. So SuperCollider is a synthesis language that's much more modern than Csound, that has a synthesis server independent from the language, and it's possible to use other languages to control the server. So I haven't been following it too closely but I suspect that some people are using Python to control the SuperCollider server, which is a great idea.

05:22 Michael Kennedy: Yeah, it's definitely interesting to see and I'll throw in a video for people 'cause if you haven't seen it it's pretty creative. Alright so, today you're not doing that much music theory right? You're working in a different discipline? Tell us about what you do day to day.

05:33 Christopher Ariza: I worked in Academia for a while, at a few places and was continuing to do my work in algorithmic composition, generative music, that sort of work, but decided to look for something else and found a job at a firm called Research Affiliates, we're a finance firm, we define and build strategies for investments that we license to many parties around the world.

05:54 Michael Kennedy: That's cool is this like the, so called, algorithmic trading type of stuff?

05:57 Christopher Ariza: Essentially, our strategies are, many of our strategies are what we call passive investment vehicles, which means that there is an algorithm, a specific procedure that's used to generate the portfolio constituents and the weights, our strategies are fairly slow moving. It is not high frequency trading, it's not anything like that.

06:16 Michael Kennedy: Yeah you're not looking for sub-millisecond advantages, you're looking for just a plain algorithm to long term invest in.

06:23 Christopher Ariza: That's right.

06:24 Michael Kennedy: Alright, so if like Warren Buffett were a programmer he might be doing stuff like that.

06:28 Christopher Ariza: Yeah that's right.

06:29 Michael Kennedy: It's funny, alright, cool so. That brings us sort of full circle back to this idea of Pandas and your variation, your take on a slightly different library that is Pandas like. So Pandas also comes from the whole finance space right? Like it's very popular in data science but of course you know, it originated out of finance, which is maybe one part of data science I guess, right?

06:57 Christopher Ariza: Maybe? I see data science as, and there's certainly discussions and presentations on this issue, I see it as, in some ways more speculative research into data.

07:08 Michael Kennedy: I see.

07:09 Christopher Ariza: As opposed to using these tools for the systematic application of an algorithm or a procedure.

07:15 Michael Kennedy: Right so what you're thinking is more that like what you guys do day to day, it's more programming using these tools that maybe originated out of data science, but you're deploying production systems that are running and doing stuff, not so much coming up with graphs and inferring stuff with Jupyter.

07:31 Christopher Ariza: Yeah, exactly.

07:32 Michael Kennedy: Right, okay, cool.

07:33 Christopher Ariza: Like, at my firm we have financial researchers, that's what they do. They comb over data and study research and, you know, try to make observations from the data; that is closer to data science. But by the time strategies come to us they are well defined, so we are implementing the production strategy and don't really have that sort of discovery and exploration need.

07:55 Michael Kennedy: Right, okay so, let's maybe start from the beginning, you told me the idea of building these strategies and you've created this library called StaticFrame, which is really interesting and we're going to talk a lot about it, but, you started with Pandas right? Like let's use Pandas and other Python libraries to solve this problem, before you decided I'm going to replace Pandas with my own library right?

08:16 Christopher Ariza: Yeah that's right.

08:17 Michael Kennedy: Can you talk about that journey?

08:18 Christopher Ariza: Sure, sure, well actually, when I started at Research Affiliates in the year 2012, Pandas was still quite young at that point, and my predecessor had created his own library to model data transformations. It was basically storing data in a table and then additionally being able to add new data to that table by column addition, kind of an Excel like data model, but implemented in Python, and his implementation was very straightforward: it was simply a dictionary that held rows where the rows themselves were dictionaries, so kind of like a JSON representation of a table if you will. A dictionary for the rows, and then a dictionary for each row. And of course, we weren't using Numpy as the back end, but it was actually reasonably efficient and we still use it in some places. In 2013, I spent some time looking at Pandas and started to use it, because of course the underlying performance, in large part due to Numpy and being able to use the vector operations of Numpy, gave a significant advantage over using our own table model.
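
A minimal sketch of that dictionary-of-dictionaries table model might look like the following. The company keys, column names, and the `add_column` helper are hypothetical; the original in-house library is not public and its interface is not described in the conversation.

```python
# Rows keyed by a company identifier; each row is itself a dictionary.
table = {
    "AAA": {"price": 10.0, "shares": 100},
    "BBB": {"price": 20.0, "shares": 50},
}

def add_column(table, name, func):
    """Grow the table by one column, computed from each existing row."""
    for row in table.values():
        row[name] = func(row)

add_column(table, "market_cap", lambda row: row["price"] * row["shares"])
print(table["AAA"]["market_cap"])  # 1000.0
```

Every new value costs a Python-level function call per row, which is the overhead that Numpy's vector operations avoid.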

09:29 Michael Kennedy: Yeah, there's such an advantage to using things like Numpy where you hand a little data off to the C layer, and that layer can do all the computation. There was a really interesting analogy or observation made by Alexander Malina a couple of episodes ago, and he was talking about how Python is one of these languages that is a little bit counter intuitive. Like in C, if you want something to go really fast, you might make it go really fast by writing and implementing all the details in C or some other language, and so the more you can kind of control that, the more precise you can be. Whereas Python gets faster the more high level you try to treat it. So if you try to implement those algorithms in pure Python, they'd be slow, but if you just call a high level Numpy function, boom, it's fast, right? So it's like this sort of inverted understanding of where the performance is in this language compared to others.

10:20 Christopher Ariza: Yeah, and you know I have to admit I had looked at Numpy before, I had been using Python for a long time, but the context of using it through Pandas as a wrapper to Numpy really started me thinking oh, when I want to scale a vector I just multiply by the value and I fill the whole vector and this happens amazingly fast and there's no loops and, and you begin to take on that mindset that wherever you have a loop you're doing something wrong, you know? You want all your loops to be in the Numpy layer, and it takes a bit of conceptual work to get there.
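
The "no loops" mindset can be shown with a tiny comparison. This is just an illustrative sketch with made-up values; both versions compute the same result, but only the first one loops in the Python interpreter.

```python
import numpy as np

values = np.arange(5, dtype=float)  # [0., 1., 2., 3., 4.]

# Loop version: every multiplication runs in the Python interpreter.
scaled_loop = np.empty_like(values)
for i in range(len(values)):
    scaled_loop[i] = values[i] * 2.5

# Vectorized version: one expression, and the loop runs inside Numpy.
scaled_vec = values * 2.5

print(np.array_equal(scaled_loop, scaled_vec))  # True
```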

10:52 Michael Kennedy: Yeah, that's such an interesting observation. I definitely think that that's true. And you want to let it do that for you but to think oh there's a loop, where are we missing the opportunity to make this work the way Pandas and Numpy should?

11:07 Christopher Ariza: Yeah absolutely. I mean, yeah that's exactly in our team code reviews that's exactly one of the things that we, we sort of look out for, we see, you know I see some Pandas code that somebody wrote and I see a couple loops or a loop in a loop and I'm like oh, there's got to be a better way.

11:23 Michael Kennedy: Yeah absolutely there's got to be a better way. That's cool. Alright, this transition over to Numpy and Pandas, this was pretty successful right? Like you guys were able to replace that library and do more of your work in these libraries, in these packages?

11:37 Christopher Ariza: Yeah that's right, we didn't entirely replace it because the old table model worked reasonably well in a few cases but I, in implementing some new strategies, some new tools, I started working with Pandas and it's funny because although you have a bunch of utilities on Pandas, it's hard to figure out what to do with it or how to use it really I think. But I already had a precedent, the precedent I had from our old library was that you start with data in a table, you load up initial data, initial observations about companies for example, and you maybe have 40 columns on a table of 10,000 companies and that's your initial input, and then you add new data by doing operations, applying functions on those rows or previous columns and add new columns. And previous library we used was actually very much aligned to that workflow, moving to Pandas was actually quite smooth because Pandas very easily supports growing a data frame by adding columns and those column additions can easily be performed by doing operations on columns that are already on the table or doing function application to rows already in the table.

12:43 Michael Kennedy: That makes a lot of sense. We discussed this a little bit before, you talked then about how it was really important, as your data flows down the pipeline that is doing all the calculations and eventually comes to a decision of investing in this, not in that, or this much in that area, that you keep a history and keep track of what's happening, right?

13:06 Christopher Ariza: Yeah exactly, that paradigm was established before we even moved to Pandas, and that was a large part of the approach of my company and how we do that work. Our strategies are not black boxes, we don't use, you know, esoteric machine learning to discover results, we use very explicit approaches that we want to be transparent, and we do human quality control on everything that we release. So it's my obligation to expose as much of the internal calculations as possible, a lot of the intermediate values, grouping, labels, everything that is necessary for a human to understand the calculation, we try to expose in our final output. So the table becomes: first, 20 or 40 or more columns of initial data, and then numerous columns that are intermediate calculations, intermediate results reducing the opportunity set through screens and some other processes, and then finally getting to the actual result, which in our case is weights and constituents.
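
As a rough sketch of that grow-only workflow in Pandas: start with initial data, then add intermediate columns, a screen, and finally weights, never overwriting anything already on the table. The tickers, numbers, column names, and the screening threshold are invented for illustration and are not Research Affiliates' methodology.

```python
import pandas as pd

# Initial observations: one row per company (hypothetical data).
frame = pd.DataFrame(
    {"price": [10.0, 20.0, 5.0], "shares": [100, 10, 400]},
    index=["AAA", "BBB", "CCC"],
)

# Intermediate column from an operation on existing columns.
frame["market_cap"] = frame["price"] * frame["shares"]

# Screen column from a function applied to each row.
frame["is_large"] = frame.apply(lambda row: row["market_cap"] >= 1000.0, axis=1)

# Final column: weights over the companies that pass the screen.
screened_cap = frame["market_cap"].where(frame["is_large"], 0.0)
frame["weight"] = screened_cap / screened_cap.sum()

print(frame)
```

Because every intermediate column stays on the table, a reviewer can trace each weight back through the screen and the market cap to the original inputs.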

14:11 Michael Kennedy: Yeah, it sounds very inspired by like what you do in Excel or Google Sheets right? You have your data, and then you create a formula here, and then few more formulas based on that previous one, but you would never go and like replace the original data and change it with a formula in like some weird iterative way, it's always just kind of like, to the right and down.

14:33 Christopher Ariza: That would be disciplined use of Excel. Unfortunately, there's no discipline inherently in Excel so, you see all sorts of things.

14:42 Michael Kennedy: Yeah that's true that's true, but you know, I guess inspired at least by the proper, proper use so that's pretty cool. So you did all this in Pandas, now that was working very well, why create a new library like what were the pain points or if we redesign this to be Pandas like but not Pandas, what could you gain?

15:01 Christopher Ariza: Right, the initial inspiration was recognizing this workflow that we had where we would start with initial data and some columns and then add columns as we go. That initial workflow, we found that it worked really well and it was relatively safe as long as you followed this sort of grow only paradigm where the table only

15:22 Michael Kennedy: This disciplined use.

15:25 Christopher Ariza: Gets bigger

15:25 Michael Kennedy: Yeah. Exactly.

15:26 Christopher Ariza: Yeah exactly! Exactly yeah, so that's the discipline that we were doing implicitly, but we started to speculate, well you know, it'd be really nice if we could actually enforce this grow only paradigm and in doing so remove a lot of opportunity for error. Now by convention we would never, mid process, go and update in place a value we had already used, but we were certainly sensitive to that danger and, in particular for teaching our paradigm to new members of our team and the rest, we had the strong desire, oh it would be so great if we could sort of enforce in some way this grow only paradigm. And that led naturally to thinking, well, what if the frame data itself could be immutable? More than just enforcing the grow only paradigm, what if you had immutable frames? There's places where we use a table as a reference data set and we might bring that into the data frame. So I might bring in FX rates, currency conversion rates, as a series indexed by currency code with the currency conversion value, and that's a reference value that I'm using in many, many, many places, and the opportunity for error, if any of those values gets mutated, is significant. So we kept on coming back to this, like it would be so great if we could freeze a series or freeze a table, like we have frozen sets for example, and treat it as an immutable collection.
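
The standard library already gives a feel for what "freezing" reference data means. This sketch uses `types.MappingProxyType` and `frozenset` as stand-ins for an immutable series; the rate values are made up, and this is not how StaticFrame itself is implemented.

```python
from types import MappingProxyType

# Hypothetical FX reference rates keyed by currency code.
_rates = {"EUR": 1.13, "JPY": 0.0091, "USD": 1.0}
FX_RATES = MappingProxyType(_rates)  # read-only view of the mapping

print(FX_RATES["EUR"])  # lookups work as usual

try:
    FX_RATES["EUR"] = 2.0  # assignment through the view is rejected
except TypeError as error:
    print("mutation blocked:", error)

codes = frozenset(FX_RATES)  # frozenset has no add() or remove()
print(sorted(codes))
```

One caveat: the proxy only protects code that holds the view. Anyone holding the underlying `_rates` dictionary can still change it, which is part of why a container that owns immutable data outright is attractive.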

16:51 Michael Kennedy: Yeah, and of course it completely simplifies the whole debugging and validation, right? Because you no longer have to look for these weird references where somebody still has a pointer to the data frame and they call a function that changes the values, or does some other odd thing where they're off by a column index or something. It seems like debugging that would be really hard, and of course making financial decisions on it might be really bad.

17:20 Christopher Ariza: Yeah that's right, it reduces what I often say is it reduces the opportunity for error. There's many ways that you can, things can go wrong and you can get very confusing unexpected results by mutating your inputs and your values as you go.

17:35 Michael Kennedy: Yeah, there's the safety side of things that makes perfect sense, that's probably primary. Another thing that immutable data really opens up, I don't know if this matters at all to you guys, but anytime you have immutable data you start to have incredible opportunities for parallelism, right? Like if you're sharing then you don't have to worry about oh, I got a lock on this and make sure that's not changed, you just riff on it because it's immutable, it's not going to change.

18:01 Christopher Ariza: Yeah that's a really interesting potential that I haven't really explored. The one way I have explored it with StaticFrame is that our function application iterators expose an interface to multi processing or multi threading function application to columns and rows. So I've experimented with a little bit but there are, there's definitely more opportunities to look at for that.
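
To illustrate why immutable data pairs well with parallel function application, here is a small sketch using only Numpy and `concurrent.futures`. It does not use StaticFrame's own iterator interface; the array contents and the `column_range` function are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)
data.flags.writeable = False  # workers can read but not modify

def column_range(column):
    """Read-only reduction over one column."""
    return float(column.max() - column.min())

# Column views of a read-only array are read-only too, so no locks are needed.
columns = [data[:, i] for i in range(data.shape[1])]
with ThreadPoolExecutor(max_workers=3) as pool:
    ranges = list(pool.map(column_range, columns))

print(ranges)  # [9.0, 9.0, 9.0]
```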

18:23 Michael Kennedy: Yeah, it sounds like it, for sure. And you know, maybe you could even mix in some Cython in there to release the GIL for the threaded side of the story. It just seems like there's a lot of cool possibilities to dig into. Is performance at all something you care about? Or is it like, eh, it takes two minutes and we run it once a day or once a week so it's fine?

18:41 Christopher Ariza: Yeah definitely performance is a very significant concern and as I was doing this, as I, I started working on this project in May of 2017, I started very small, it was speculation. I wasn't sure if I could do this in native Python, that was the thing is that we, for years prior, me and my team who shared these convictions and these goals, speculated on something like this. And I always thought I was going to have to implement it in C.

19:08 Michael Kennedy: It's time to build the screws and the nuts and everything.

19:10 Christopher Ariza: Exactly, exactly yeah. So I was going to have to implement this in C, and maybe, I'd done some work in C++ so, you know, I was like maybe I can implement this in C++, and build off the STL vectors, and then I realized oh man, I'm reimplementing Numpy, I don't want to do that, and, it was after a PyCon, I think it was two years ago yeah, 'cause it was in May of 2017, something at the PyCon triggered for me that, you know, why don't I just try it in native Python? And if I hit bottlenecks I can use Cython, but I should just see what I can do. And use Numpy and just use Python, and I set out on that goal and I found that performance is very good. I mean, for many operations, I can do as well or better than Pandas, some operations I'm slower, some operations I'm better, significantly better. The aggregate performance is very hard to measure, it's very dependent on use cases, there's some things that are definitely slower than Pandas, but at this point it's just pure Python, pure Numpy, we haven't done anything in Cython or C extensions or Numba or anything like that.

20:13 Michael Kennedy: This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast, simple and incredibly affordable? Well look past that bookstore and check out Linode at talkpython.fm/linode, that's L I N O D E. Plans start at just five dollars a month for a dedicated server with a Gig of RAM. They have ten data centers across the globe so no matter where you are, or where your users are, there's a data center for you. Whether you want to run a Python web app, host a private git server, or just a file server, you get native SSDs on all the machines, a newly upgraded 200 Gigabit network, 24/7 friendly support, even on holidays, and a seven day money back guarantee. Need a little help with your infrastructure? They even offer professional services to help you with architecture, migrations and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/linode. So you're getting great performance out of it already, and then there's all these, you know, low hanging fruit opportunities if needed?

21:12 Christopher Ariza: Yeah that's right.

21:12 Michael Kennedy: Yeah, so you know it's interesting to talk about the performance and could I do it this way. I think people, programmers are really bad at judging what's going to be fast and what's going to be slow, you know? It's, you look at some code you're like oh this is definitely the problem. Maybe it's slower but it's, you know, sub millisecond who cares, or it's, it's actually not even that part, it's something totally different. Did you do like profiling and stuff like that to really try to dial that in or did it just work out?

21:41 Christopher Ariza: Early on I started benchmarking against Pandas for certain operations and, so, it's actually, you know, a huge debt to Pandas that they've provided this great framework that does so much and really sets the foundation. Of course it's descended from R, so you know, Pandas inherited a bunch of things from R in terms of the concept of the data frame, and I think, compared to what I know of the R model, they refined the interface and unified it in quite a nice way. And in doing so, they really defined a set of expectations for using libraries like this. One example is the dropna method on series or frame, like the idea that given a series or frame there should be an easy way to remove missing values; we have to do this kind of thing all the time. So, with that model in mind, I can start to implement those things and test them. And the performance metric that is relevant to me is my ratio to Pandas, so that's what I know, like I know for this operation oh I'm, you know, .6 faster than Pandas, or for this operation I'm ten times slower than Pandas, and I do it at this very granular level for one to one comparisons.
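
A ratio-style benchmark of this kind can be sketched with `timeit`. The two functions below are hypothetical stand-ins (a Numpy sum as the "reference" and a pure-Python sum as the "candidate"); they are not StaticFrame's or Pandas's actual operations, and no particular ratio should be expected.

```python
import timeit

import numpy as np

values = np.random.default_rng(0).random(10_000)

def reference_sum():
    return values.sum()          # stand-in for the reference library

def candidate_sum():
    return sum(values.tolist())  # stand-in for the new implementation

def best_time(func, repeat=5, number=20):
    """Best of several timing runs, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number))

ratio = best_time(candidate_sum) / best_time(reference_sum)
print(f"candidate takes {ratio:.1f}x the reference time")
```

Tracking one ratio per operation keeps the comparison concrete without pretending there is a single aggregate speed number.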

22:43 Michael Kennedy: That's a really interesting metric to think about. But I guess it makes sense, because you're like, I would like to have this other model, this other data model, this other programming model that's data frame like and has the safety and immutability thing, but is used like Pandas. As long as I don't wreck the performance too much, and if there's a benefit, then you know, hurray, like, we're good.

23:02 Christopher Ariza: Yep, that's exactly right.

23:03 Michael Kennedy: Yeah, yeah, that's a cool way to think about it. So let's talk about how StaticFrame deviates from Pandas. So the overall ideas, this immutable data, grow to the right, sort of, story, but there's a lot of details here. Do you want to maybe talk us through them?

23:19 Christopher Ariza: The biggest insight is, well I mean one of the biggest changes really is that the underlying Numpy arrays are made immutable. This was one of the key observations that led me to start developing this and realize I didn't have to write this thing in C or C++ myself, in that I found there's a flag: each Numpy array has a flags attribute, and on that flags attribute are a number of properties, one of them is writeable. And it's a boolean, and you can flip it, and in doing so, if you try to assign values into the Numpy array, Numpy gives you an exception. And Numpy arrays of course are already fixed in size and shape; Numpy arrays out of the box are mutable in terms of the values contained within that size and shape, but when I found this, I was amazed, I was like oh my god, this is what I've been working for! So, with that insight, I began writing the core piece of the library, which is the internal component called TypeBlocks, which manages the heterogeneous typed arrays and exposes a unified interface to external clients. So that first piece really is making all the internal arrays immutable, and what I describe as fully managing the array, that is, if you create a StaticFrame object with a Numpy array, if that array happens to be immutable, I can take it, and I can use it, and I don't have to make a copy. But if StaticFrame is given a mutable array, I make a copy, and I make that copy immutable. And from there on we're safe.
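
The copy-only-if-mutable policy described here can be sketched with plain Numpy. The helper name `immutable_array` is hypothetical and this is not StaticFrame's internal code; it only shows the `flags.writeable` mechanism and the ownership rule.

```python
import numpy as np

def immutable_array(array):
    """Return a read-only array, copying only if the input is writeable."""
    if not array.flags.writeable:
        return array              # already immutable: reuse without copying
    frozen = array.copy()         # take ownership via a private copy
    frozen.flags.writeable = False
    return frozen

source = np.array([1.0, 2.0, 3.0])
frozen = immutable_array(source)

source[0] = 99.0                  # the caller's array stays mutable
print(frozen[0])                  # 1.0, the frozen copy is unaffected

try:
    frozen[0] = 0.0
except ValueError as error:
    print("blocked:", error)

print(immutable_array(frozen) is frozen)  # True: no second copy is made
```

Copying on the way in is what makes the guarantee hold: flipping the flag on the caller's own array would still leave other references, such as a writeable base array, able to change the data.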

24:48 Michael Kennedy: Yeah, that's really cool. So obviously if you're given immutable data, problem solved right? But if you're not then you want to take ownership of that data, take it inside of your library and say yeah you gave me this, I've read it, now we have a safe version of it. It's really cool that you're able to just leverage that built in feature of Numpy. Because that meant that for that whole layer down there, you could just build on what Numpy is doing, and not have to go, we're starting from scratch with nuts and bolts, right?

25:17 Christopher Ariza: Yeah, exactly, and I'm still quite curious why it's there because it's not really advertised anywhere in the Numpy docs, I don't see information as to, suggestions of using this or what not. I found little bits of discussion here and there where I've seen other evidence of people using it. It is certainly documented in the flags, as parts of the flags for an array, but I'm actually eager to find out more information of how it got there and how the Numpy developers imagined it would be used.

25:45 Michael Kennedy: Yeah maybe if someone's listening and they know they could put a comment on the show.

25:49 Christopher Ariza: Yeah that'd be great.

25:50 Michael Kennedy: On the show page that'd be cool, we'll all learn from that. Yeah, great, so, with it being based on Numpy, was a lot able to more or less stay the same as before? Like, did it make moving from Pandas a lot easier?

26:05 Christopher Ariza: Yeah that's right, because of course in Pandas, at least in its present state, all data is stored in Numpy arrays, so basic expectations about how that data would work are the same. Our goal though with StaticFrame was to be closer to Numpy. And what that means is that every time that we do a calculation, like produce a standard deviation or a mean or something else, we use Numpy operations. I feel like Pandas is a bit ambivalent about this, and they probably have reasons, probably performance, for doing this, but sometimes if you call the std method on a series you're not actually executing Numpy, or you're executing Numpy in an unexpected way. I have a lot of respect for Numpy's stability over the years and over their versions, and I trust Numpy in terms of their approaches to doing these calculations, their defaults et cetera, and I don't want to make those decisions. So I'm happy to rely on Numpy entirely for those sorts of calculations. And then the Numpy type system is something also that Pandas has sort of struggled against or is ambivalent about, and actually increasingly seems to want to get away from. Rather than try to create my own type system or augment Numpy's types, I took the efficient approach for the resources of the project, which is just, okay, we'll just use Numpy types. One very clear way this shows up is that if you create a series out of, say, three character FX currency codes, you will get a series of fixed-width unicode, three unicode characters, which is what Numpy does by default, whereas Pandas will convert that into an object type. So, I just let Numpy use its types pretty much as it would naturally do and avoid getting involved in that.
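
Both points, Numpy's native string type and Numpy's calculation defaults, can be seen directly in Numpy. This sketch uses made-up values; for contrast, Pandas's `Series.std` defaults to `ddof=1`, so the same data gives a different number there unless the argument is set explicitly.

```python
import numpy as np

codes = np.array(["USD", "EUR", "JPY"])
# A fixed-width Unicode dtype, three characters per element
# (shown as '<U3' on little-endian machines).
print(codes.dtype)

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(np.std(values))          # 2.0: Numpy's default is ddof=0 (population)
print(np.std(values, ddof=1))  # sample standard deviation, slightly larger
```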

27:55 Michael Kennedy: Yeah, yeah, of course you carry over all the the validation and testing and make sure that all the calculations were done as accurately as possible, and that's quite a matter as well.

28:04 Christopher Ariza: Yeah that's right.

28:05 Michael Kennedy: So, yeah so, some other things that look like differences for StaticFrame, relative to Pandas: one is around unique indices?

28:13 Christopher Ariza: Yeah, so this is, you know there's a couple of things that we, as a team, would constantly be frustrated with in terms of Pandas, and the one really obvious one is this ambivalence about whether an index should be unique or not. When I think of an index, and maybe most people think of an index, they think of it as a mapping, similar to a Python dictionary where keys have to be unique. And very often that's how people use indices. But in Pandas, indices don't have to be unique, and we would constantly be surprised when we found that a column in a table was set as the index and, without us realizing it, those values in that column were not unique and we ended up with a non unique index, and if you try to select a row from a non unique index using a loc call where you expect to get a series representing a row, now suddenly you get a data frame representing two rows. And that's very confusing and surprising. Pandas has an option to enforce uniqueness when you create an index from a column, with an amusingly named parameter called verify_integrity. And verify_integrity is by default set to false on Pandas's set_index operation. I understand the desire to be accommodating, which I think is the motive here, but I do not want to be accommodating, I want to say that an index is a unique collection, and if you try to create an index out of a non unique collection, you'll get an error.
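
The surprise described here is easy to reproduce in Pandas. The tickers and prices below are invented; the behavior shown (a duplicate label changing the return type of `loc`, and `verify_integrity=True` raising an error) is standard Pandas behavior.

```python
import pandas as pd

frame = pd.DataFrame(
    {"ticker": ["AAA", "BBB", "AAA"], "price": [10.0, 20.0, 11.0]}
)

indexed = frame.set_index("ticker")       # duplicates accepted silently
print(type(indexed.loc["BBB"]).__name__)  # Series: one matching row
print(type(indexed.loc["AAA"]).__name__)  # DataFrame: two matching rows

try:
    frame.set_index("ticker", verify_integrity=True)
except ValueError as error:
    print("rejected:", error)
```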

29:42 Michael Kennedy: Right you should get zero or one thing, it's not zero, one or other

29:45 Christopher Ariza: Exactly.

29:46 Michael Kennedy: Numbers. Interesting, another one is around dot access, so basically dunder get item mapping over to like, pulling items out by index.

29:55 Christopher Ariza: Yeah and I think this was motivated by the ancestry of R. So in the R language, I'm not exactly certain, but I believe that R's data frame library exposed columns through dot attribute like lookup. And I suspect that in early versions of Pandas, there was a big pull to try to move R people over to Python and having that similar syntax I assume was desirable. But of course, there's other attributes other than columns on the data frame object, and so, inevitably, there's some sort of naming collision that's going to come up with getting columns from dot attribute. So with StaticFrame we simply say the only way to get a column is by using the get item syntax, and there's no dot access. And that's sort of the general theme of trying to have there be one and only one way to do things and, in terms of getting the columns, we say okay use the get item syntax.
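
The naming collision is easy to trigger in Pandas. This is a short sketch with hypothetical column names: one column is deliberately called "mean", which is also the name of a data frame method, so dot access and get item access return different things.

```python
import pandas as pd

# Hypothetical frame: "mean" collides with DataFrame.mean, "price" does not.
df = pd.DataFrame({"mean": [1.0, 2.0], "price": [3.0, 4.0]})

price_by_attr = df.price  # no collision: dot access returns the column
mean_by_attr = df.mean    # collision: this is the bound method, not the column
mean_by_item = df["mean"] # get item always returns the column

print(callable(mean_by_attr), type(mean_by_item).__name__)
```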

30:47 Michael Kennedy: Yeah, that's, I think that makes sense you know, you want to be, like one of the overriding themes of StaticFrame is sort of safety, predictability right? Not like, oh we asked for the load and that's not the load on the system column, that's the load data method or some weird thing that happens right when you interact with it that way.

31:05 Christopher Ariza: Yeah exactly and of course, it's a huge benefit to have Pandas and you know there's actually a family of data frame like interfaces around these days. Not only is there the Pandas data frame but think, some other libraries, I think of xarray, xarray has kind of Pandas like thing, and there's a few other libraries out there so it's such a huge benefit to be able to look at the hard work of all of these contributors over the years, be in the luxurious position of picking and choosing and so you know, it's a great debt we have to those other packages, and Pandas in particular, to be able to look at those libraries and see okay I see why they made all those choices but we can consolidate all of that into one thing, and remove a bunch of ambiguity, remove a bunch of opportunities for error.

31:51 Michael Kennedy: Yeah, does the growth of Python and the data science/data exploration space and the popularity of Pandas make this an easier sell like your company? Do you feel like you don't have to cheer lead and like make the case for Python so much when you go and talk to people or the management or whatever say yeah we're building it this way?

32:08 Christopher Ariza: Yeah well in terms of Python in general, the growth and popularity of Python, as I'm sure all of your listeners know, has been extraordinary in the last five, ten years, and the role of Python in data science is probably largely due to Pandas. And I would even go further to say specifically Pandas read CSV, which is just extraordinarily fast, blows Numpy's, blows everybody else out of the water, and is such an awesome thing that I think it's the gateway into Python for data science. Your question specifically about using Python within our firm, it's been a gradual move, my team was the first to use Python but more and more, nearly every other area of the firm that's doing something with software engineering is using Python and of course everybody starts with Pandas because that's what you see.

32:57 Michael Kennedy: 'Cause it's a load CSV?

32:58 Christopher Ariza: Yeah yeah. And the idea with StaticFrame was like, well you know that's great for data exploration, but if you're going to build something that you want to last and you want to reduce opportunities for error, take what you know from that library and you know try it out with this thing and see what you can do.

33:14 Michael Kennedy: That's cool, you said it was basically becoming increasingly popular, what technology was it displacing, like what else were you using, to the extent you can say?

33:21 Christopher Ariza: Sure yeah I mean within our firm we were using SAS, we were using R, and those were the primary two languages which are still quite common in finance firms and the like. And to a certain extent, people still use those but you can see the effort in the Python community both Scipy, Pandas, Numpy, many others, moving in the last five ten years to provide all of that functionality that R had, or almost all of it, and many other platforms so it's quite easy transition, well not easy, but it's a directed transition from these other languages.

33:54 Michael Kennedy: Yeah, that's definitely, it's not like going from that to C or something crazy, yeah for sure. So another difference has to do around iterating StaticFrame right? And Pandas, when you iterate you get the values, and here you, it's more dictionary like right?

34:08 Christopher Ariza: Oh okay yes, there's two elements to sort of the iteration thing. The first has to do with the StaticFrame series. So, the frame and the series in both Pandas and StaticFrame are dictionary like containers. Both StaticFrame and Pandas define a keys method, define an items method, that work in a way that we know well from Python dictionaries. With StaticFrame, the difference though, has to do with the series. When you iterate a Pandas series, it iterates over the values. Which makes some sense if you think of it as a wrapper around a Numpy array. But if you call dot items, on a Pandas series, you're going to get pairs of the index, the key, and the value. So again, this effort to try to be consistent, if you iterate a StaticFrame series, you're going to iterate over the keys. Just like you would with the Python dictionary. So you actually get the index values, and if you want to get the values, you have to use the dot values attribute.
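
The inconsistency on the Pandas side can be seen next to a plain dictionary. This is a minimal sketch with illustrative data; it shows only the Pandas and dictionary behavior, not StaticFrame's own API.

```python
import pandas as pd

data = {"a": 10, "b": 20}
series = pd.Series(data)

# A dictionary iterates over its keys.
dict_iteration = list(data)

# A Pandas series iterates over its values instead.
series_iteration = list(series)

# Yet keys() and items() on the series behave like a dictionary.
series_keys = list(series.keys())
series_items = list(series.items())

print(dict_iteration, series_iteration, series_keys, series_items)
```

As described above, iterating a StaticFrame series is meant to match the dictionary case, yielding the index labels, with the values reached through the values attribute.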

35:07 Michael Kennedy: Right.

35:08 Christopher Ariza: That's one difference in terms of iteration. The others is, well although this dictionary like interface, you know we try to be really consistent there, the other places is recognizing that Pandas has a number of different approaches to iterating over columns, or rows in a frame, and function application on those iterations. So Pandas has an apply function and it has various iteration functions, like iter rows or iter tuples. And I saw an opportunity to unify all of those. So, the series and the frame, all have different families of iterators, and all of those iterators return objects that themselves have function application methods on them. So the same tool you use for iterating, exposes an opportunity to do function application. And that descends from the old library that we use where you know function application across the table was a really common move. And so, making that sort of a first class element in the library was really important to us.
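
The idea of an iterator that also exposes function application can be sketched in plain Python. The class below, RowIterator, and its apply method are hypothetical stand-ins written for illustration; they are not StaticFrame's actual classes or signatures.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


class RowIterator:
    """Hypothetical stand-in: one object for both iteration and application."""

    def __init__(self, rows: Iterable[Iterable[Any]]) -> None:
        # Store rows as tuples so the container cannot be mutated later.
        self._rows = tuple(tuple(row) for row in rows)

    def __iter__(self) -> Iterator[Tuple[Any, ...]]:
        # Plain iteration yields one row at a time.
        return iter(self._rows)

    def apply(self, func: Callable[[Tuple[Any, ...]], Any]) -> Tuple[Any, ...]:
        # Function application reuses the same iteration logic.
        return tuple(func(row) for row in self)


rows = RowIterator([(1, 2), (3, 4), (5, 6)])
iterated = list(rows)
row_sums = rows.apply(sum)
print(iterated, row_sums)
```

The design point is that a caller learns one entry point per kind of iteration and then chooses between looping and applying a function on the same object.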

36:04 Michael Kennedy: Yeah, it sounds great. Also, you talked about the sorting? The default sorting is stable in StaticFrame? What's the story there?

36:14 Christopher Ariza: This is very simple because fortunately Numpy did all the work here, Numpy's sort method provides a number of options for which sorting algorithm to use. And again, in the spirit of safety and repeatability and stability, the default sort method for StaticFrame is set to Numpy's merge sort, which is indeed stable. The default for Pandas sort is, you can switch it to be merge sort, but by default, I forget exactly what it is, but it is not a stable sort.

36:45 Michael Kennedy: A quick sort or something like that.

36:46 Christopher Ariza: Yeah, it's, I believe, yeah the default is quick sort. Now why they chose quick sort, I don't know if there were any reasoning behind it, maybe quick sort is faster in certain cases, but merge sort is reasonably fast, and if I can make a choice to ensure that the sort is stable to the order entering the sort, that seems like a benefit to me.
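
What "stable" means here can be shown with Numpy's own kind argument. This is a small sketch with made-up keys that contain ties; it shows only Numpy's behavior when merge sort is requested, not StaticFrame's internals.

```python
import numpy as np

# Keys with ties: the value 1 appears at positions 1, 3 and 4,
# and the value 2 appears at positions 0 and 2.
keys = np.array([2, 1, 2, 1, 1])

# A stable sort keeps tied elements in their original relative order,
# so the positions of the 1s come out as 1, 3, 4 and the 2s as 0, 2.
stable_order = np.argsort(keys, kind="mergesort")

print(stable_order.tolist())
```

With the default kind, Numpy's documentation does not guarantee the relative order of tied elements, which is the repeatability concern raised above.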

37:04 Michael Kennedy: Yeah, comes back to this predictability, safety, overriding theme right?

37:09 Christopher Ariza: Exactly.

37:10 Michael Kennedy: Yeah, so, I guess maybe, another area is how it's tied to the Numpy defaults for calculations? And things like that?

37:20 Christopher Ariza: Yeah, so that comes back to the spirit of being close to Numpy and I have an example of this where you know, you take the standard deviation of three values, without any arguments, with a Pandas series, and you get a different value than if you do the same thing with a Numpy array. If you use Numpy's std function, you get a different value, and that's very confusing, and it has to do with the ddof, the Delta Degrees Of Freedom argument to the standard deviation. Now people that have played with standard deviation are well aware of this parameter, but some people may not be and that's quite confusing. And I just, I don't see a need for that heterogeneity, I'm fine to stick with Numpy.
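
The three-value example works out as follows. This is a minimal sketch with values chosen for illustration; it shows that Numpy's std defaults to ddof=0 (population standard deviation) while a Pandas series defaults to ddof=1 (sample standard deviation).

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0]

# Numpy default, ddof=0: sqrt(((-1)**2 + 0**2 + 1**2) / 3) = sqrt(2/3)
numpy_std = np.std(values)

# Pandas default, ddof=1: sqrt(((-1)**2 + 0**2 + 1**2) / 2) = 1.0
pandas_std = pd.Series(values).std()

# Passing ddof explicitly makes the two agree.
numpy_sample_std = np.std(values, ddof=1)

print(numpy_std, pandas_std, numpy_sample_std)
```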

38:04 Michael Kennedy: Yeah that makes a lot of sense. This portion of Talk Python To Me is brought to you by Stellares, the AI powered talent agent for top tech talent. Hate your job or feeling just kind of meh about it? Stellares will help you find a new job you'll actually be excited to go to. Stellares knows that a job is much more than just how it sounds in a job description. So they built their AI powered talent agent to help you find that ideal job. Stellares does all the work and screening for you, scouting out the best companies and roles, and introducing you to opportunities outside your network that you wouldn't have otherwise found. Combining deep AI with human support, Stellares pares things down to a maximum of five opportunities that tightly match your goals, like compensation, work life balance, working on products you're passionate about, and team chemistry. They then facilitate warm intros, and there's never any pressure, just opportunities to explore what's out there. To get started, and find a job that's just right for you, visit talkpython.fm/stellares. That's talkpython.fm/ S T E L L A R E S, or just click the link in the show notes in your podcast player. Ah let's see, another one is discrete functions rather than branching parameters, so is that like breaking stuff apart so there's functions that are simpler to understand rather than taking a bunch of parameters?

39:27 Christopher Ariza: Yeah, we've tried to systematically design an interface that has functions that have orthogonal parameters. So I think, with all these write functions, that should be our goal, that is the relevance of one parameter to a function shouldn't depend on another parameter, that's quite confusing, and can lead to mistakes. What you get instead is more functions, but the functions are more specific, and I believe that leads to more clear code and it also aids in refactoring actually. One example of that that I think is nice is the set index method, so on Pandas there's a set index method that if you give it one column as the argument, it will set that one column as an index. If you give that argument a list of column names, it will give you a hierarchical index. And all you did was change your input, and now you have a very different structure coming out of this.

40:25 Michael Kennedy: Right, not even necessarily keyword arguments but you've just changed the type that you're passing right.

40:29 Christopher Ariza: Yes yes yes, so there's many places in Pandas where there is a sensitive dependency to the type of an argument that results in a different output which is very problematic. So, in StaticFrame we have two methods, we have set index and we have another method called set index hierarchy, and when you set index hierarchy, there you're expected to give a number of columns and you can't give it a single column and vice versa so we've split the functionality into two different functions, and now it's completely clear to the reader what was intended. And if later on, you need to do some refactoring, and you need to find all of the places where you created a hierarchical index, well you just search for the function name, you don't have to search for the function and then probe the type of that argument to know whether or not you're getting a hierarchical index.
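
The type sensitivity on the Pandas side looks like this in practice. This is a short sketch with hypothetical column names; it shows the same set_index call returning a flat index for a single column name and a hierarchical index for a list of names.

```python
import pandas as pd

# Illustrative frame with two candidate index columns.
df = pd.DataFrame(
    {"region": ["east", "east", "west"], "year": [2018, 2019, 2018], "value": [1, 2, 3]}
)

# A single column name produces a flat index.
flat = df.set_index("region").index

# A list of column names produces a hierarchical (multi) index.
hierarchical = df.set_index(["region", "year"]).index

print(type(flat).__name__, type(hierarchical).__name__, hierarchical.nlevels)
```

Splitting this into two separately named methods, as described above for StaticFrame, means the call site alone tells a reader which structure comes back.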

41:19 Michael Kennedy: Yeah, that's a tremendous difference. And you know, you go to your fancy IDE, you right click you say find usages it'll say there are six, they are here.

41:27 Christopher Ariza: Yeah exactly, exactly.

41:29 Michael Kennedy: That's way better then, they're here but only sometimes. Like that's, that's a little sketchy for sure.

41:35 Christopher Ariza: That's right.

41:36 Michael Kennedy: Alright so it sounds like there's, there's a lot of maybe familiarity if you're coming from Pandas, but there's enough difference that this is really something on its own and special, and there's good reasons to use it.

41:49 Christopher Ariza: One of the key things from Pandas is the, well I mean it took Pandas a while to figure this out too, is that there's three types of selection when we're selecting data. There is the root get item selection, which in Pandas overwhelmingly is used for column selection but in some rare cases can be used for row selection, that's something we changed but I'll get back to that. So there's the get item, there's the dot loc selection, which can take one argument for a row selection, two arguments for row and column selection, and the iloc selection, which uses integers instead of the labels of the index. So that family of those three selectors really gives you everything you need. Now Pandas at various times had other types of selectors, there's this ix method, and there's a few other variants, but they seem to be getting rid of those. Recognizing that there's these three types of selection really is one of the fundamental things to bridge the gap for people coming from Pandas to StaticFrame, those are relatively the same. One of the key differences we made, in line with consistency in having only one way to do things, is the root get item selection interface is only a column selector, it is never a row selector, which is a shortcut you can do in Pandas but again, it's undesirable, it's not clear for readability, and it's difficult for refactoring.
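
The three selectors, and the row shortcut on get item, can be compared side by side in Pandas. This is a small sketch with made-up labels and values; it shows Pandas behavior only.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

column = df["x"]              # root get item: column selection
by_label = df.loc["b", "y"]   # loc: row label, then column label
by_position = df.iloc[1, 1]   # iloc: integer positions

# The shortcut mentioned above: a slice passed to get item selects rows.
rows_via_getitem = df[0:2]

print(column.tolist(), by_label, by_position, list(rows_via_getitem.index))
```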

43:06 Michael Kennedy: Yeah, interesting, okay cool.

43:09 Christopher Ariza: There's three types of selection, the root get item, the loc and the iloc, and then we expose them in sub interfaces if you will. So, a relevant question is if I have an immutable data frame, how do I do assignment? Well you don't, but Pandas, and also Numpy, have these really powerful ways of doing assignment. I can do an assignment with Pandas, I can do an assignment in a loc call, in an iloc, and I can assign to an entire column, I can assign to an entire row, I can assign to a mixture of columns and rows by using the same syntax I use for selection. That's an awesome feature. I wanted to maintain that same expressive interface but you can't do in place mutations, so how do you do it? Well on StaticFrame there's a dot assign attribute, and that dot assign attribute exposes a root get item, a loc and an iloc, so under that assign attribute, you can do all of the same type of assignment moves you used to do, only you get back a new frame, and you're not mutating the old frame in place.
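
The "get back a new frame" behavior can be approximated in Pandas with a copy. The helper below, assign_loc, is a hypothetical function written for illustration; it is not StaticFrame's assign interface and says nothing about how StaticFrame implements it.

```python
import pandas as pd


def assign_loc(frame: pd.DataFrame, row, col, value) -> pd.DataFrame:
    """Hypothetical helper: loc-style assignment that returns a new frame."""
    result = frame.copy()          # deep copy, so the input is left untouched
    result.loc[row, col] = value   # mutate only the copy
    return result


original = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
updated = assign_loc(original, "a", "x", 99)

print(original.loc["a", "x"], updated.loc["a", "x"])
```

A Pandas frame offers no guarantee that some other code will not mutate it later, which is the gap an immutable container is meant to close.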

44:15 Michael Kennedy: That's a great feature, I love it. So, let's talk about testing for a little bit, I saw that you have unit tests, performance tests, things like that, which is great. One of the things that really stood out to me when I was looking at it, was that you actually were using Hypothesis, which is an interesting library. I had Austin Bingham on the show long ago, talking about Hypothesis, it's probably been three years, but you want to just tell us like roughly, really high level, what that is and why you decided to use it?

44:48 Christopher Ariza: I saw at last, I think it was last year's PyCon, a presentation on, it wasn't, maybe it wasn't specifically Hypothesis but it was related to that.

44:56 Michael Kennedy: A property based testing in general, something like that?

44:58 Christopher Ariza: Yeah. And I just was so impressed, I was like oh man all that time I spent trying to find corner cases and trying to make my unit tests have sufficient coverage, can be automated for me, by using a tool where you control and you shape the random generation of values, to meet the expectations of finding these extreme corner cases. I took that away and was like wow, I really want to do more of that. One of my colleagues here at Research Affiliates who does some work in Haskell, set off trying to use this a little bit more in depth, and this whole idea of property testing in fact comes out of Haskell, I forget the name of the library that originated it but the whole library was published as one page in the paper that introduced the concept, it's really amusing, the implementation of the original, sort of property based testing tool, is just one page of Haskell code. But through his example, my colleague's examples, I started to look at it like man, this is exactly what I need for StaticFrame, because you know, you're trying to build a general purpose library, there's no way I'm going to be able to anticipate the things that people are going to want to put into a series or a frame. There's no way that I can anticipate all the possible values someone is going to try to put in an index. So, with property based testing, with using Hypothesis, you open a door to just defining the properties that you expect to have, namely that if you create an index with 20 entities, the resulting index is going to have 20 values. Well that's true unless you've duplicated any values, it's not true if you've duplicated values, or it's not true if something else went wrong in reading those values. So I think of Hypothesis in the context of StaticFrame as a way of simulating my user. There's a user, it's thousands of users who are throwing everything into these containers, and Hypothesis really nicely gives you a way to model that, and really changes the way you think about testing. Again my same colleague, you know, was like, I enjoy testing, I enjoy writing tests so much more when using this because it just forces you to think about it in a different way and it's very refreshing compared to the task of writing unit tests.
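
The index-length property can be sketched without any third-party library. To be clear about the swap, this is a hand-rolled stand-in that uses Python's random module in place of Hypothesis, and UniqueIndex is a hypothetical toy class rather than StaticFrame's index. Hypothesis itself would replace the random generation with its strategies and would also shrink failing inputs.

```python
import random


class UniqueIndex:
    """Hypothetical toy index that rejects duplicate labels."""

    def __init__(self, labels):
        self._labels = tuple(labels)
        if len(set(self._labels)) != len(self._labels):
            raise ValueError("labels must be unique")

    def __len__(self):
        return len(self._labels)


def check_length_property(trials=200, seed=0):
    """Property: n unique labels give an index of length n; duplicates raise."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Small value range on purpose, so duplicates show up often.
        labels = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        try:
            index = UniqueIndex(labels)
        except ValueError:
            # Rejection is only acceptable when a duplicate really exists.
            assert len(set(labels)) < len(labels)
        else:
            assert len(index) == len(labels)
    return trials


print(check_length_property())
```

The test states what must hold for any input rather than listing specific cases, which is the change of mindset described above.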

47:12 Michael Kennedy: Yeah it's cool, it's almost like writing a metatest. Right, instead of when you're like here are the seven cases, here's the one with the values in the middle, here's the edge of the array I'm trying to test the one that's out of the bound so I should like, you just go, this is the general type of stuff that goes in, these are the general types of things I want to verify, go make that happen, and vary a bunch of stuff for me, right

47:33 Christopher Ariza: Yeah, yeah, that's right.

47:34 Michael Kennedy: It's pretty cool, so I was really thrilled to see that you had put that in there for some of the testing stuff it's cool, people can check it out in the Github repo.

47:41 Christopher Ariza: Yeah I have a lot more to do there but yeah again it's like, what's really startling about it is you really have to be in a different mindset. So you have to give yourself the time to get into the mindset, there's much more I need to do with that, but it's a refreshing and pleasurable place to be in. So yeah I highly recommend it.

47:58 Michael Kennedy: Yeah, I bet, seems super cool. It definitely seems like you can't just bring your main way of thinking about testing like, I'm going to test this one case and see if it works, you've got to like, sort of step back a level.

48:09 Christopher Ariza: Yeah that's exactly right.

48:10 Michael Kennedy: Yeah, nice. Another thing I wanted to ask you about, that I didn't before when you talked about Python, and finance and just, we're coming up on 2020, the death clock for Python 2 is ticking, pythonclock.org I think it is. It's ticking down, the time is getting short on it. First of all, does StaticFrame support Python 3?

48:31 Christopher Ariza: Oh it's built entirely in Python 3, we're at 3.5 now, no support for 2, so that was a huge benefit of my predecessor here at Research Affiliates, he set out building our code base in Python 3, back in 2012, or even 2011 which, some people might have said was, you know, kind of a questionable choice but at that point we had Numpy and we had Pandas soon after that so, given that foundation of Python 3, we've been using Python 3 entirely and never looked back.

49:01 Michael Kennedy: Yeah that's super. And then, what do you see the, that transition looking like in the finance space? Larger, not necessarily just for your firm but other folks you interact with as well.

49:11 Christopher Ariza: In terms of moving to Python 3?

49:14 Michael Kennedy: Yeah like do people just have their head in the ground and go we're just not doing it, are they going like oh my gosh here it comes, this is going to be like Y2K again. What's the finance vibe around that?

49:24 Christopher Ariza: I can't speak broadly, I do know that there's a very large bank that employs a very large number of Python developers who use a lot of extensive systems built entirely in Python 2, and I don't know if they're even on 2.7 or 2.5.

49:39 Michael Kennedy: Yeah I think the bank that you were talking about I think I know, I don't even think they're on 2.7.

49:42 Christopher Ariza: Yeah I think they're stuck on 2.5 as well. But it's going to be very hard, I would expect. Maybe they've built their frameworks in such a way that maybe they're okay. One of the things I've heard about this very large bank is that their Python tools to some extent are enforcing immutability, and for the same motivations that we have, they may have put constraints on the language in a way to help reduce risk that they can keep for a little while. But certainly, it's going to require a transition at some time and that's going to be hard.

50:13 Michael Kennedy: Yeah I agree. Guess two thoughts, one, do you feel like, maybe that is a failure of leadership, engineering leadership to say we put our stuff in the corner you guys and we have to, I know it's not building features or driving the investment engine but we have to keep moving forward if we get stuck, not just on 2.7 but on 2.5 like that, and all these libraries like, they can't use anything Numpy is doing. Or Pandas, or make the future right as they're dropping, right, you know Pandas has already announced that they're dropping Python 2 support.

50:45 Christopher Ariza: Right I saw that. Yeah it's definitely a challenge, and it's technical debt right? My own team, we're on 3.5 and we're in the process of jumping to 3.7. Even that, for us as a relatively small team with a decent but modest size code base, it takes work, it takes time, and just as you say, it doesn't deliver immediate features, it doesn't deliver obvious benefits, it is a technical debt, and it's very difficult to prioritize that work appropriately, and also to communicate the value to upper management and others that are considering what you developers are doing. I mean the important thing is that it's called debt for a reason. You have to pay it or your survivors will pay it. There is no debt forgiveness in technical debt. Other than abandonment, I mean you can abandon the code and start over.

51:37 Michael Kennedy: There's no chance to fail sort of really.

51:39 Christopher Ariza: Yeah yeah, so, it's definitely something to pay attention to, I mean, even with Pandas' versions, we've struggled to keep up with Pandas updates. We're still presently using Pandas 17, we are transitioning to Pandas 23 or 24, I think we're going to 24 now, I think 25 just came out. But even before we were on 17, we suffered and spent quite a bit of time on accommodating the changes to the API and changes down stream of Pandas changes. So it's painful but you just have to do it.

52:11 Michael Kennedy: Yeah, in their defense right, that's a lot of money. And if you're writing, rewriting the code significantly that's touching money versus just, you know driving the website or whatever like, I can understand the hesitation don't want to mess with that but, but at some point, maybe it's not in 2020, maybe it's 2025, some point, it's going to be a problem. People are going to go I don't want to work there. You mean, really, that version from that long ago, with that few library support, no thank you right? Like It's going to be a problem, it's going to be like COBOL.

52:40 Christopher Ariza: Yeah yeah I made a joke about COBOL the other day with some of my colleagues and I was quickly corrected that there is apparently, there still is quite a bit of COBOL in production

52:49 Michael Kennedy: Yes.

52:51 Christopher Ariza: So I was like, I thought it was, you know like a dinosaur, but I guess there's still a lot of COBOL in production. But you can get away with it for so long, but at a certain point, yeah you know you're exactly right, it's a huge detriment to recruiting. We're a small firm located in Newport Beach, not exactly a tech hub, although Irvine's trying a little bit, but for as long as we've been recruiting for this team, less so now, but a few years ago I would say to people yeah we're working on Python 3 and they would say oh really, you're working on Python 3? I'm stuck in 2.7 or 2.5. I'm so excited, that would be awesome. So you know, a few years ago, that we were entirely in Python 3 was explicitly a highly desirable feature for prospective candidates to our team.

53:36 Michael Kennedy: I'm sure.

53:37 Christopher Ariza: A little bit less so now, but it's something that we always say up front.

53:41 Michael Kennedy: Yeah, well it's definitely a good. I think only less so, only because other people have started to make that path, you know, go down that path right.

53:50 Christopher Ariza: Yeah that's right, I mean when I was in, I went to a PyCon I believe it was in 2013, and I believe it was at Guido's key note, maybe it was somebody else, but the question was asked to the general assembly you know, when there's all thousands of, however many thousands of people are in that room and they ask a show of hands of how many people were using Python 3 in production. Me and my colleague raise our hands and look around and there's just, I mean it was far less than ten percent. But I think they did that exercise again at a recent PyCon and it was, looks like it was more than half, you know.

54:23 Michael Kennedy: Oh yeah.

54:24 Christopher Ariza: The community is definitely moving and, you know, it's good to see.

54:28 Michael Kennedy: It's great to see, it's great to see. Alright well I think we're getting short on time, so we're going to have to leave it there. People should definitely check out StaticFrame, if Pandas is something that you're doing, maybe this will apply. I guess maybe one final question I could ask for you Chris is how does somebody know that they have a problem that StaticFrame will solve better than Pandas is solving? I mean often the advice will be like hey, you use Pandas right, load CSVs all that kind of stuff, but like when would you say, actually you should consider this because it'll solve your problem better?

54:59 Christopher Ariza: I would say there's a couple signs, one may be that you keep on making mistakes. You make mistakes because you reach for the wrong interface, or you get a surprising result because there's a type sensitivity to an argument. Or you make a mistake because you accidentally mutated data you didn't intend to mutate, or you got a multi index when you expected a unique index. You know those are the kinds of things that are the telltale signs that maybe the kind of work you're doing, you know, requires a different package with it.

55:29 Michael Kennedy: More structure.

55:29 Christopher Ariza: Set of constraints. Yeah that's right.

55:31 Michael Kennedy: That's a great description thanks. Alright now before you get out of here, the final two questions. If you're going to write some Python code or work on StaticFrame, what editor do you use?

55:39 Christopher Ariza: I recently moved over to VS Code. As many people may have, I had some apprehension about Microsoft products for some time, and now there's a Microsoft product that I use every day and really enjoy. Prior to that I used a few different editors, but I've been really happy with VS Code. In large part, I don't really ask a lot from my IDE, I really want it to get out of my way and I don't debug in the IDE, I don't lint in the IDE, I prefer to do those things from the command line. I just like my IDE to be something close to like a zen mode that gets everything out of the way, and I'm very aesthetically inclined so, I'm very sensitive to my colors and whatnot, so with VS Code I was able to quickly, with a very low transition cost, get it to be visually, aesthetically, sort of ergonomically comfortable for me. And subsequent updates haven't made it worse, it's been good so, I've been very happy with VS Code.

56:33 Michael Kennedy: Of course yeah, they're doing great stuff for that so I definitely hear that a lot. Alright and then notable PyPI package? I'll go ahead and throw StaticFrame out there for you, people can pip install that right?

56:44 Christopher Ariza: Yep, yep, it's there ready to go.

56:45 Michael Kennedy: Other ones that you're like oh, I heard about this the other day, maybe don't know about it, but it's really cool to solve this problem uniquely or whatever? Any come to mind?

56:52 Christopher Ariza: I should plug the project I worked on before I started at Research Affiliates which is Music21. Music21 is a Python package that I co-created and founded, and did sort of the initial three years of work on, at MIT with a former colleague of mine there. Which is a really fun tool for examining what we call symbolic music, so music represented as XML or music represented as MIDI files. Music21 allows you to take in these musical representations and play with them as an object model and ask questions about them. For example given all of Mozart's string quartets, how often does he use a modified pitch on the third beat, something like that.

57:34 Michael Kennedy: Awesome.

57:35 Christopher Ariza: So it's a really fun toolkit if you know anything about music and you want to start experimenting with generating or analyzing musical notation.

57:42 Michael Kennedy: Okay, that's a great recommendation, that's very cool. Alright final call to action, people want to get started on StaticFrame, what do they do?

57:49 Christopher Ariza: I did the essential thing recently, I made a quick start guide. I started to write API documentation and that was kind of tough and it's not a pleasurable read and not a good introduction. I fairly recently wrote a little quick start guide, you can find it on Github, in the read me, you can find it in the documentation, which is a little tutorial using data available from a JSON end point, that will walk you through some of the key features and main differences from Pandas. And hopefully will be enough to get people excited about the package.

58:18 Michael Kennedy: Yeah, very cool, and you also gave a presentation at PyCon which was recorded, I'll link to that so people can check that out. Final question, are you looking for open source contributors people to jump on this project or is it kind of baked, what's the status there?

58:30 Christopher Ariza: Oh absolutely, so while this tool is being used internally within my firm, and its use will grow within our firm, we are absolutely looking for contributors and users and testers to give us some feedback. I've been fortunate in the development of this, in that I've had my team to constantly give me feedback and tell me I'm being too nice, as they like to do, to make our interfaces discrete and precise, and get a lot of feedback and support from my team so I owe a huge debt to my team and the context of our work here to support that, but we need more users, we need more testers, we need more feedback. So at a basic level, people using the tool and giving us some feedback, they may not be ready to move it into their production systems and I certainly understand that but some good dabbling, starting to play with it would be really helpful for us and getting some feedback, and of course if, I'm pretty happy with the code itself. I would encourage those to look at the code and see opportunities to add things and make things better, that would be fantastic as well.

59:29 Michael Kennedy: Yeah, super. Alright well, thanks for giving us the whole story and history of StaticFrame, it looks like a really cool project.

59:35 Christopher Ariza: Great, thank you for your time, happy to be on the show.

59:37 Michael Kennedy: Yep, happy to have you, bye.

59:38 Christopher Ariza: Bye bye.

59:39 Michael Kennedy: This has been another episode of Talk Python To Me, our guest on this episode was Christopher Ariza, and it's been brought to you by Linode and Stellares. Linode is your go to hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode. That's L I N O D E. Find the right job for you at Stellares, the AI powered talent agent for the top tech talent. Visit talkpython.fm/stellares to get started. That's talkpython.fm/ S T E L L A R E S, Stellares. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course, or if you're looking for something more advanced, check out our new Async course, that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python, we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the Direct RSS feed at /rss on talkpython.fm. This is your host Michael Kennedy, thanks so much for listening, I really appreciate it. Now get out there and write some Python code.
