#269: HoloViz - a suite of tools for Python visualization Transcript

Recorded on Monday, Jun 15, 2020.

00:00 The toolchain for modern data science can be intimidating. How do you choose between all the data visualization libraries out there? How about creating interactive web apps from those analyses? On this episode, we dive into a project that attempts to bring that whole story together. HoloViz is a coordinated effort to make browser-based data visualization in Python easier to use, easier to learn, and more powerful. We have Philipp Rudiger from HoloViz here to guide us through it. This is Talk Python to Me, episode 269, recorded June 15, 2020.

00:44 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is sponsored by Brilliant.org and Datadog. Please check out what they're offering during their segments. It really helps support the show. Philipp, welcome to Talk Python to Me. Thanks so much for having me. I'm excited to have you here. Now we're going to talk about a bunch of cool libraries that are all brought together around this HoloViz overall meta project, if you will, to make... Yeah, exactly, this umbrella to make working with data, and exploring and visualizing it in Python, a little bit easier. Nice. I think that's a great project. And it looks like it's getting a lot of traction. And I'm happy to be here to chat about it.

01:34 Talk about various libraries in it.

01:36 Yeah, absolutely. Before we get to that, though, let's start with your story. How did you get into programming and Python?

01:41 So I got started with programming pretty late. Apart from the usual thing, like a Geocities website, where you have some extra knowledge, I really didn't get started with actual programming until I joined an electronic engineering course in undergrad. So I moved from Germany to the UK to study electronic engineering and music technology, thinking I was, I suppose, better at music than I really was. But I took a liking to programming, with some programming in C and Verilog for pretty low-level stuff. And then towards the end of that undergrad degree, I developed a simulator of bipedal locomotion in C++, which was far more complex than I had envisioned. But it was really exciting to me to actually get into a big project of my own. And from there, I then joined the master's course and started programming in Python for data analysis. And we had this simulator called Topographica, which did kind of cool.

02:38 You talked about working on this bipedal locomotion simulator in C++. Forget the language, that kind of stuff is trickier than it seems like it should be, right?

02:47 Oh, absolutely. Because back then I had no idea about neural networks. I just kind of jumped in, heard about neural networks, heard about genetic programming. So I built this huge network with way too many parameters and used genetic programming to make it work. It didn't really work. The simulator worked. So my little bipedal humanoids did flop around a little bit, but it never actually generated real locomotion.

03:12 Yeah, I guess there's something to training these models correctly and getting them you know, set up right in the first place. It's not just magic that you can throw at a problem, right?

03:21 Yes. But then I decided this thing wasn't complicated enough. So I actually tried to solve the brain. So I joined a graduate program in neuroinformatics. Yeah, hoping to actually learn about how these things actually work.

03:35 Oh, that's really cool. So is that like, trying to model the way brains and synapses work using neural networks?

03:41 Yes. So that gets pretty close to what you'd consider a convolutional neural network nowadays. Well, they were around back then, but they weren't as popular. Yeah. But then we also had recurrent connections. The idea was to model the human visual system, or just generally the mammalian visual system. And so starting with very little,

03:59 what kind of problems were you trying to solve? What did you try to ask it to do, to see if it was a success for you?

04:05 Really, the idea was that it was self-organizing, right? So that you didn't have to pre-program a bunch of stuff into the network; it would just organize itself, like many of the convolutional neural networks nowadays do. But we were trying to keep it closer to the actual biology. So we had different cell types that were interacting. And those models were tremendously complex. And it was just super hard to analyze them. I started having these huge model outputs, like 60- to 100-page outputs of a model run, PDFs with just images in them. And that's actually when I started developing these visualizations. Yeah, a colleague of mine and I were like, yeah, this isn't feasible. We can't analyze these things properly just looking through PDFs. I wake up, like, PDFs and

04:52 and then I started we started writing this Yeah,

04:54 it's like trying to watch the green in The Matrix, right? Like, no, I can't see it this way. We've got to look at it better.

05:00 Yes, precisely. So the idea was that we start building something, because you have these huge parameter spaces, right? It's like excitatory strength, inhibitory strength, and the model would evolve over time. And so we had these very complex parameter spaces we were trying to explore. And so we built this tool called HoloViews to start digging into that, and then realizing, what effect does this parameter actually have on the evolution of the model? So you could drag a slider and see: if I change the strength of this, the model does that, and this is how it evolves with time. And that was a real breakthrough for actually starting to analyze it. But it also meant that I eventually started spending more time building this visualization tool than I was spending on my actual project. And I found that's

05:43 one of the challenges

05:44 I found, but in the end, I found that more rewarding.

05:48 Yeah, that's the real danger, right is like, me, I started getting into programming doing research on complex dynamical systems and, and math and, and whatnot. And I after a while, I realized, you know, the part of this project that really makes me happy is when I'm not doing the math. That was a sign that I should probably be doing something else. But it's, it is really fun to build these things. I also do think it's, you know, it's a challenge of research projects and this academic stuff in general. It's, it's hard to get credit for that, right? Like, you're not they're not going to go, oh, man, that's a killer contribution to data science that you made. Here's your PhD. They're like, Where's the network? Where's this paper? Right?

06:29 Where's the publication? Yeah, exactly. Where's the publication? Exactly. Yeah. And actually, we did publish a paper on HoloViews. And that turned out to be the only paper I published in my PhD. I mean, yeah, basically, my models didn't work until the very end. Two weeks before handing in my thesis, the models were producing working results. I want to get around to

06:51 Man, that's down to the wire.

06:52 It was really down to the wire. Yeah.

06:57 Yeah, well, it's better late than never in that case. So that's how you got into programming and eventually to Python. I mean, obviously, Python is a natural place to go if you're doing neural networks. What time frame was this in? Like, what year was this?

07:13 So I joined this, it wasn't really a grad program. It was the Doctoral Training Centre in Edinburgh. That actually doesn't exist anymore. It was part of the informatics department, but they had a close collaboration with the maritime department. And I joined in 2010. The first year was a master's program. And it took me way longer than it should have to finish my PhD program. But I think in 2015, I finally

07:41 handed in the thesis and did my defense. Yeah, cool.

07:43 I'm just wondering, you know, you started in 2010, with some of this stuff. If you started now, how much easier? Do you think it would be? Or would it be basically the same to work on is the visualization problems, the neural network problem? It just seems like that has come so far in the last five years?

07:59 Oh, absolutely. So back then, I remember, obviously you had to interface with C code a lot. And so we had to use something called SciPy Weave, which most people probably don't know about anymore. It was this really awkward interface for C extensions. And nowadays, we've got things like Numba. We could write the kernels that we're running in pure Python and just compile them to something fairly optimized, right? Absolutely. Same with the visualization tools. There are so many interactive visualization tools at this point. Back then, for example, HoloViews was built using Matplotlib. It still supports that, but the actual rendering nowadays is built on Bokeh and Plotly. It's gotten a lot more interactive. That's really cool. There's been such a huge evolution. Yeah.

08:47 Nice. All right, well, what are you up to these days? What do you do day to day,

08:51 So it's really nice. I have kind of the freedom to switch between lots of things. I joined Continuum Analytics, which is now Anaconda, Inc. Actually, before I handed in my thesis, I was running out of funding, and then joined Anaconda as kind of a day job while writing my thesis. And so I joined to do consulting, solving machine learning problems for various government clients, corporations, and so on. But from the very beginning, we had this idea that we build open source tools that solve people's problems, and then use them in our consulting. That model worked really well for us, because the entire HoloViz suite of tools was built by spending quite a bit of open source time on billable time. For example, Panel was built with funding from the US Army Corps of Engineers. So they started to pick out what works, and we built this new dashboarding framework. And so I had freedom over basically six months to build this new tool.

09:54 Yeah, Panel's really cool. We'll talk about that for sure.

09:56 So yeah, most of my time is consulting work. But as much as possible, we try to contribute the stuff we work on during that time back to open source.

10:05 Yeah. So you mostly mostly do remote work? I would guess being?

10:09 Yes. So actually, I was in Edinburgh, and Anaconda is in Austin. Yeah, I was in Edinburgh for years. And then last year, I moved here, back to Berlin, which is where I grew up. And actually, we kind of just opened an office here. And I thought it would be nice to actually spend, like, two to three days a week just going to the office, seeing people, and having a more regular routine. Yeah, I sometimes work until 3am and then don't get up before noon. But then the pandemic happened and it's back to me.

10:37 Yeah, going to be around people,

10:40 not everyone.

10:42 Yeah, for sure. It's a bit of a bummer. I mean, for folks like us who can work remotely and just carry on mostly with what we're doing, it's a bit of a bummer. For a lot of people, it's a tragedy, right? It's a huge, huge problem. But especially I

10:57 don't know how people do.

10:59 I know, it seems really, really scary. Hopefully, we get through that soon. And we can go back to an office and who knows what people's desire to get back to work together will be like, some of these remote ideas I think are going to stick and some people are kind of like, cool. So glad that's over.

11:14 Yeah, I think so, particularly. I mean, it's really not a good test. People are talking like this has ushered in the revolution of remote work, but it's a forced scenario, right? People don't have childcare. They're stuck at home. Yeah. So I don't know.

11:28 Off the top of our compiling,

11:31 I think you touched on the real challenge. It's one thing to say, well, let's all try to be remote for a while. It's another to say, let's work with your small children around you all the time. Like that is the real struggle, I think, as a parent to find the time and the focus. Right. So I think it's an unfair test. But if it's working under these scenarios, this is like the worst case scenario. So obviously,

11:54 it's also working. Right.

11:56 Exactly, exactly. It's interesting.

11:58 Also, it's kind of nice. I actually quite like I got a client meeting. stolen.

12:04 Yeah, yeah, it does humanize people a little bit. I think, you know, don't go too far. But you watch the news, or you watch comedy shows that are still going, and it's just like, yeah, everyone's at their couch, or their kitchen table, or just their little home office. And yeah, it's funny. So let's talk about the history of this project a little bit. So you started with HoloViews, so

12:31 Yes, HoloViews, yes. And there's more confusing history. But we'll get into

12:41 This portion of Talk Python to Me is brought to you by Brilliant.org. Brilliant has digestible courses in topics from the basics of scientific thinking all the way up to high-end science, like quantum computing. And while quantum computing may sound complicated, Brilliant makes complex learning uncomplicated and fun. It's super easy to get started, and they've got so many science and math courses to choose from. I recently used Brilliant to get into rocket science for an upcoming episode, and it was a blast. The interactive courses are presented in a clean and accessible way, and you could go from knowing nothing about a topic to having a deep understanding. Put your spare time to good use and hugely improve your critical thinking skills. Go to talkpython.fm/brilliant and sign up for free. The first 200 people that use that link get 20% off the premium subscription. That's talkpython.fm/brilliant, or just click the link in the show notes. How did you go from trying to create better visualizations to this larger project? I guess, you know, by way of introduction, maybe tell people how you got there and then give us a high-level view of what it is.

13:46 Yeah, absolutely. So we started with HoloViews, which was actually built on this project called Param. So Param is kind of like, you now have data classes in Python, parameter classes that are typed and so on. And there are projects like traitlets. So Param was kind of the foundation, and everything was built on top of that foundation. It just adds general semantics: this thing is not just a tuple, but a range of two numbers that actually represents a range. That was kind of the initial thing, which had been around before I even got into Python. And then we built HoloViews on top of that. And then one of our first projects at Continuum, back in those days, was for the UK Met Office, to extend HoloViews with geographic support, which brought about GeoViews, an extension for HoloViews. And then, right,
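The "not just a tuple, but a range of two numbers" idea can be sketched in plain Python. This is an illustration only, not Param's actual API: the `Range` descriptor, its `bounds` argument, and the `Model` class are hypothetical names chosen for this sketch, and Param itself offers considerably more than this.

```python
class Range:
    """Hypothetical descriptor: a (low, high) pair of numbers, checked on assignment."""

    def __init__(self, default, bounds=None):
        self.bounds = bounds                    # optional (min, max) the range must stay within
        self.default = self._validate(default)  # the default is validated too

    def __set_name__(self, owner, name):
        self._slot = "_" + name                 # per-instance storage attribute

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self._slot, self.default)

    def __set__(self, obj, value):
        setattr(obj, self._slot, self._validate(value))

    def _validate(self, value):
        try:
            low, high = value
        except (TypeError, ValueError):
            raise ValueError("a range needs exactly two numbers") from None
        if low > high:
            raise ValueError(f"range is reversed: {low} > {high}")
        if self.bounds is not None:
            lo, hi = self.bounds
            if low < lo or high > hi:
                raise ValueError(f"range {(low, high)} is outside bounds {self.bounds}")
        return (low, high)


class Model:
    # Declared once on the class; every assignment is validated.
    strength = Range(default=(0.2, 0.8), bounds=(0.0, 1.0))
```

Assigning `Model().strength = (0.5, 0.1)` raises a `ValueError` here, which is the kind of semantic check a bare tuple cannot give you.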

14:41 obviously focused on, you know, geographical data and maps, data and whatnot.

14:45 Yes, yeah. How about your projections for us?

14:50 But then, I mean, what we saw over and over again as part of our consulting projects was that people had these analyses, and a lot of them were in notebooks, and then people would just share these notebooks. But really, someone who doesn't know about code is kind of scared or put off by all the code in the notebook. So we needed another way to share it. And that's kind of how we started building our dashboarding tools. Right?

15:12 So notebooks are pretty nice to show people but at least in Jupyter, as far as I know, there's not a great way to say please load this with every bit of code collapsed,

15:21 right? I mean, there's there's templates, but maybe they're kind of obscure. Not everyone is familiar with them. Just generally, like, if you just want to have everything nicely presented as a nice layout that you put together. There wasn't really.

15:35 Yeah, right. Okay. But all of that changed

15:38 a little bit. And then we just needed a name for all the stuff, and we decided on PyViz. PyViz seemed like a good name. It wasn't taken. And it was a great name. But we had a little bit of pushback: PyViz sounds like just "Python visualization", right? And it's kind of presumptuous to use this name and think we could keep it. Like, you can't claim it all. Right. And I think

16:02 that Okay, we've got, yeah,

16:03 yeah, it was a totally fair criticism. And we talked to various community members, and we were like, okay, PyViz becomes this general thing, and we're going to find a new name. Which has been confusing, obviously. I think this was a year and a half ago, and we had kind of run with the PyViz name for a year and a half as well. And so oftentimes, when you see a blog post out there that's still about PyViz, this is not the general resource that it's now meant to be.

16:27 Yeah, I was looking at some videos on YouTube, some of the presentations you gave and stuff. And I saw sometimes it was called HoloViz and sometimes PyViz. And I was like, I wonder what the relationship is here. And I see historically where that comes from. Yeah,

16:39 yeah. So I think overall, it was a good idea to have this general resource. And absolutely, people were happy to have this listing of all the different visualization and dashboarding libraries on there. And we'd like to have more tutorial material to point to, and stuff like that. So that becomes the general resource, and HoloViz is now our effort to have a coordinated set of tools that work well together, to have browser-based presentation and dashboarding and just make that easier. Yeah,

17:08 very cool. You're basically making a bunch of choices for people, like, here's a way that you can plot stuff, here's a way that you can stream data or process large amounts of data that don't fit in RAM, or whatever. Yeah, but you're not forcing them down that path, right? One of the things that was cool is, okay, if you need more control, or you want to do something different, you can just do this yourself. If you want to do less work and accept the defaults, or the default way of working, right, you can use something built in.

17:34 That's exactly right. Yeah. So the way we try to communicate that is, it's about shortcuts, not dead ends. The defaults should be good, and you should just be able to get something on screen quickly. With HoloViews in particular, you just wrap your data and it visualizes itself. But then you shouldn't be stuck there. There's plenty of libraries where you get your quick plot, but it's really hard to customize from there. And that's something we had some trouble with as well. So HoloViews is pretty opinionated, actually. It doesn't fit the regular model people are used to, the imperative plotting model, where you get your figure, get your axes, and modify the axes a little bit. It's about just wrapping your data and having it visualize itself, and then you can tweak the options on it. And that does not always work for everyone, which is something we had to learn. We kind of decided now that we'd rather meet people where they are, right? There's already such a big ecosystem of visualization tools, and at this point a big ecosystem of dashboarding tools. And rather than tell people this is the way you have to do it, you should just be able to plug in what you have and go from there. And that's the philosophy behind hvPlot, which is kind of a wrapper around HoloViews. If you use pandas, you'll know that pandas has a .plot API, which just takes your data, and you tell it this goes on the x axis, this goes on the y axis, or color by this variable, and then it gives you the plot. And we wanted to take that and say, well, this works well for pandas, we want it to work for the entire PyData ecosystem, right? hvPlot is meant to work not just with pandas, but with Dask, xarray, NetworkX, GeoPandas, and, the most recent addition, GPU dataframes with cuDF.

19:18 Yeah, I hadn't heard of cuDF. That sounds really cool. But Dask is very powerful, and I had Matthew Rocklin on the show to talk about that. And it's like pandas. Yeah, it's like pandas, but across multiple machines, if necessary, and that sort of thing, with the same API, if

19:36 necessary, or across multiple cores locally. Yeah.

19:39 Yeah. Yeah, it's super,

19:42 It's super nice, particularly because we have this Datashader project, which takes your data. You can describe it as a fancy 2D histogram, right? You get some heat map out, but it does this really fast. It's built on Numba and Dask, so you can generate images from data points really quickly. It doesn't just support point data. It supports polygons and lines, it supports regridding of rasters, and quadmeshes and trimeshes. And basically, it just takes your data, renders it with Numba, and puts something on the screen really, really quickly. And the idea there is just to have fast and accurate rendering of large datasets. And when we're talking large, we mean millions or billions of points.
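The "fancy 2D histogram" description can be made concrete with a small pure-Python sketch. This is only an illustration of the binning idea, not Datashader's implementation or API (the real library compiles its aggregation with Numba and can parallelize with Dask); the function name `aggregate_points` and its arguments are assumptions made for this example.

```python
def aggregate_points(points, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a width x height grid.

    The result has a fixed size no matter how many points go in, which is
    why an aggregate like this is cheap to send to a browser.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Skip points outside the viewport (upper edges are exclusive).
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue
        # Map data coordinates to integer pixel coordinates.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid
```

Zooming in amounts to calling the function again with a narrower `x_range` and `y_range`, so the image is recomputed from the raw data at the new resolution instead of stretching old pixels.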

20:23 Yeah, that's cool. I think it was on your project site where you've got a picture of, was it the US, with a point for every single person and where they were, right? Super quickly? Yeah, yeah. 300 million data points, in a second or less,

20:36 interactively, zoom in and out. I think, particularly now that we have GPU support. NVIDIA has this new initiative called RAPIDS, where they're rebuilding the PyData ecosystem on top of GPUs. And it's crazy. It is crazy. You could always use GPUs, but you used to have to write these CUDA kernels yourself, and it was really obscure. But now it's just your GPU, and it just works. So they had an initial prototype for Datashader. And a collaborator of ours, Jon Mease, who is chief scientist at Plotly, took that and extended Datashader to support GPUs natively. So now it takes maybe 10 to 20 milliseconds to aggregate these 200 million data points. Incredible. So cool. And in theory, it also scales across multiple GPUs.

21:31 it's never ceases to amaze me how powerful GPUs are. Every time I think, wow, I can do that. That's amazing. That didn't like nobody could do more.

21:42 Exactly. And yeah, for us, it's kind of coming full circle, right? The G in GPU stands for graphics. Yeah. But if you actually wanted to use it for graphical stuff, or visualization stuff, there was basically nothing. You'd have to write your own kernels, and it'd be super painful.

22:00 all the hard work.

22:02 That's super cool. Very cool. All right. So we've got HoloViews. And then related to that, the library GeoViews. You talked about hvPlot, yes. Datashader, this is what we were just talking about, quickly rendering, like, the 300 million points on a map. We talked about Param as the basis for the data-class-like functionality. We also have Colorcet.

22:22 Yes, Colorcet. We all know that colormaps can be crucial. I don't know if everyone knows this. But I think the most famous example is the jet or rainbow colormap. That's been vilified, for good reason. Basically, it distorts the data space terribly, and you can draw all kinds of false conclusions, just because it distorts things in color space. And so there's actually scary stuff about how, potentially, doctors may have drawn false conclusions just by looking at the jet colormap and misinterpreting the data, which is easy to do. And so we created a package called Colorcet, which basically just has a set of perceptually uniform colormaps. It actually takes work. I should read up on this. But yeah, basically, we took a set of colormaps that someone published a paper about. So please look at the website to look up the names. I feel bad for not properly crediting the person.

23:19 It's really handy to have that put together and well thought through. And yeah, choosing colors, one, that look good, and two, that are meaningful, is not so easy.

23:27 Yes, it's not easy. But I mean, thankfully, the community has become really aware of this. And I think the turning point was when [inaudible]. And

23:42 I think it's one point it actually was.

23:43 The final star of the show here is Panel, which you talked about being that six-to-nine-month project that you got to work on, the new dashboarding tool.

23:51 Yeah, so in the Python ecosystem, for a long time... R had Shiny, and Shiny is great. It's super cool. It makes it easy to share your analysis. And Python didn't have this, right? Early on there was this Jupyter Dashboards project, where you could take a Jupyter notebook and arrange the cells a little bit and get a layout, and we used that for a little while. But then it was abandoned. But this was a problem we kept coming back to: people want to share their analyses as an actual dashboard, or just a little app. Yeah. And so we decided to build Panel. Just before then, actually, Plotly came out with this project called Dash, which is also a really nice library to build dashboards in Python. It requires a little bit more knowledge of CSS and JavaScript in certain cases. And we wanted something where people could just drop in their existing analysis. You could wrap it in a function and then annotate that, like, this function depends on these things, and so when those things change, it updates: a reactive model. And we just wanted you to be able to drop in your existing analysis. You've got some notebook, you want to share it, and we wanted to reduce the friction. What we'd see over and over again in different organizations was the fact that they had a bunch of data scientists, and then they had a visualization team, and the data scientists had to hand it over to the visualization team, who didn't necessarily work in Python. They could have had some custom JavaScript framework. Yeah, there's just friction in making that transition from Python analysis to shareable dashboard. That process, right, and that's how
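The reactive model described here, where a function is annotated with what it depends on and re-runs when those things change, can be sketched in a few lines of plain Python. This is a toy illustration, not Panel's code: `Slider`, `View`, and this `depends` decorator are hypothetical stand-ins written for this example (Panel's real decorator is `pn.depends`, and its widgets and layouts do far more).

```python
class Slider:
    """Hypothetical widget: holds a value and notifies watchers when it changes."""

    def __init__(self, name, value):
        self.name = name
        self._value = value
        self._watchers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        for callback in list(self._watchers):
            callback()

    def watch(self, callback):
        self._watchers.append(callback)


class View:
    """Keeps `output` in sync with a function of the current widget values."""

    def __init__(self, func, widgets):
        self.func = func
        self.widgets = widgets
        for widget in widgets:
            widget.watch(self.update)
        self.update()  # compute the initial output

    def update(self):
        self.output = self.func(*(w.value for w in self.widgets))


def depends(*widgets):
    """Annotate a function with the widgets it depends on."""
    def decorator(func):
        return View(func, widgets)
    return decorator


strength = Slider("strength", 0.5)

@depends(strength)
def summary(s):
    # Stand-in for an existing analysis function that returns a plot or table.
    return f"strength = {s:.2f}"
```

Setting `strength.value = 0.8` re-runs `summary` automatically, so the analysis function itself never needs to know about widgets or callbacks.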

25:27 So that's how Panel works. Yeah, you can basically lay out different parts of a visualization. It's like organizing a notebook without necessarily showing the code, and you get some little sliders and other widgets that you can interact with. And to me, it feels like just a really nice way to quickly take what you're already doing in an exploratory mode and put it up in an interactive way for users, without learning Flask, APIs, or JavaScript. This portion of Talk Python to Me is brought to you by Datadog. Are you having trouble visualizing bottlenecks and latency in your apps, and you're not sure where the issue is coming from or how to solve it? With Datadog's end-to-end monitoring platform, you can use their customizable built-in dashboards to collect metrics and visualize app performance in real time. Datadog automatically correlates logs and traces at the level of individual requests, allowing you to quickly troubleshoot your Python application. Plus, their service map automatically plots the flow of requests across your application architecture, so you understand dependencies and can proactively monitor the performance of your apps. Be the hero that got your app back on track at your company. Get started today with a free trial at talkpython.fm/datadog.

26:43 Precisely. The idea is that, yeah, you just have your analyses, you drop them into this thing, and you can put them in a bunch of rows and columns laid out on your screen. And then you put one little command at the end of the layout you built, called servable. And then you can run panel serve on the notebook, and it just pops up with your dashboard. Nice. How do you host it? So actually, it's just built on Bokeh. It's just a Tornado server. So you can host it on any cloud server provider. We are trying to build out the documentation to make that a really simple process, or even thinking about having a command to say, I'm going to deploy this to AWS, or Google Cloud, or whatever, right? Oh, yeah, that's hilarious. We're definitely working on that. But in the end, it's just, what is that, like a container? Oh, absolutely. Yes. For our examples, we build on this tool called anaconda-project, which wraps a conda environment with some commands, and then deploys it. And what we're hoping for is, I think there's a PR, maybe it's already merged, where you just give it this project file, which is just a YAML file with the environments and the commands that it runs, and then it builds a Docker container for you. So I think that's a really nice way to go. Yeah, contain everything, your entire environment and the commands you need to run. And then

28:00 yeah, maybe even go to like some kind of hosted Kubernetes cluster service. Let's go take this. Run it there. Make sure it keeps running upgraded for me if I need it.

28:08 Yes. But we're certainly looking forward. So if anyone's interested in helping us out. We're always looking for contributors at otherwise, yeah, we're also working on.

28:19 Yeah, it's a bit of an orthogonal problem and skill set to solving visualization, right? Like, it's one thing that I can do the JavaScript and cool visualizations. It's another, like, what, now I do DevOps too?

28:29 Exactly right. Yeah. Oh, that's something Yeah.

28:32 I'm hopping right into a little bit. Yeah,

28:35 Yeah, absolutely. So on your website, it gets into the Getting Started section. So at HoloViz, holoviz.org, you've got a document that sort of talks about, given these different scenarios, more of a picture, I guess: answer a couple of questions, and we'll help you choose the subset of tools and flow them together, right? Want to maybe talk us through some of these scenarios? So it says, are you working with tabular data, or working with other types of arrays, and network, n-dimensional, or streaming data? And then there's sort of this flow of, okay, here's how you piece together these tools to come up with something. Excellent.

29:14 Yes. So we call this a mermaid diagram. And if you look at holoviz.org, you'll find it there. Actually, I don't know why it's called that. But yeah, the general idea is that it takes you from the type of dataset. So let's say you've got some tabular data. And now you decide, what library should I choose to load this data? Below a certain threshold, pandas is totally fine. And if you want to go to geographic data, or actually geometries, geographic geometries, use GeoPandas.

29:44 Right. And just so people know, your cutoff here, you say, do you have more than 50,000 rows? I mean, obviously, it varies a little bit with the computer you have.

29:52 Yeah, this is pretty arbitrary. Yeah, exactly. But

29:55 it gives you kind of a sense. It's not millions of rows, or billions, or something like that, right? It's not that high of a number to say, okay, well, maybe you want to consider something other than pandas for working on some of this. But yeah, okay. Is it a huge amount of data, by some definition of huge? And if not, is it geospatial,

30:14 then you might use GeoPandas. Otherwise, you may use Dask, for example. As I said, Dask is a great tool. We've already mentioned it: by the time you've got millions or billions of rows, you just can't load it into memory, you just don't have the space. And then the whole point behind our ecosystem, particularly hvPlot, is you shouldn't have to change any code. Whether you choose pandas or Dask, or now the cuDF library, you shouldn't have to change any code. You should just be able to dump your data into this framework, and then call .hvplot on it, and then you get your plot out. That's kind of the philosophy here. The same applies if you've got some n-dimensional arrays. We generally recommend, for example, that you go with xarray. So xarray is really underrated as a library. It should be more popular. I don't know, maybe not that many people have n-dimensional arrays, but it's kind of pandas for n-dimensional data.

31:12 Yeah, for beyond tabular,

31:14 Tabular, exactly. So you've got satellite imagery over time, right? Or microscope data, also over time, like a z-stack over time, four-dimensional, five-dimensional, whatever. For that, it's just really nice to use. So for that kind of data, you might use that instead. Or you might keep it simple and just use NumPy or Dask arrays, which is a lower level. But in the end, the idea is that, yeah, in any case, you just drop it into hvPlot with the call, and then you get these HoloViews objects out of them. And these HoloViews objects will already display themselves. As I said, you already have an interactive plot. But then you might have the issue, like, yeah, this was a lot of data; you used a Dask DataFrame or a Dask array for a reason, right? So you may not want to dump it straight into your browser. Dumping a gigabyte into your browser is a surefire way to crash it.

32:04 Even if you have the speed to download it quickly, you know, that much JavaScript is going to make it hurt.

32:08 Yeah, it'll hurt, yeah. Your browser will fall over, your notebook will fall over. So then you just have the option in hvPlot to say, datashade this instead. And so that means that you've got a server-side application to kind of aggregate this data as you're zooming around. And what you get out is a nice interactive Bokeh plot. Or if you want to do a bit more customization, there is also support for Plotly output. And then once you're there, you can save it, you can share your notebook, or you can dump it into Panel.

32:43 Right, right, turn it into a dashboard. Okay, yeah, very cool. I'll link to this flowchart for people so they can think about it when they're checking this out. And I think it also gives you a sense that, even outside of HoloViz, there's useful stuff. You're like, do I have streaming data? Maybe check out streamz. Or n-dimensional? Like you said, check out xarray, and Dask. Just this idea of, how do I think about the right underlying library, rather than trying to jam everything into pandas or NumPy or something.

33:11 Right? Yes, yeah, a lot of people I still see using pandas for this, and they should try to switch to xarray. It's a great library. And actually, pandas used to have this n-dimensional data structure called a Panel, and they've actually deprecated it, saying, yes, please, please just use xarray.

33:27 I see. Okay. Interesting. I didn't realize that. When you talked about visualizing millions and millions of data points quickly in the browser, you said, okay, then use Datashader, and I don't know that we necessarily dove into it enough to say exactly what it does. So basically, we'll see if I get it right, how it works: it can look at millions, hundreds of millions of data points and say, well, the size of the graph is really this, and if you look at these 10,000 at this scale, that's kind of going to be the same. Is that how it works? Or does it downsample somehow? How does it actually make meaningful pictures out of that?

34:04 That's the cool thing about it. It actually is that fast, right? It always looks at all your data. Okay, but if you're zoomed into something, obviously it won't look at what's out of your viewport.

34:14 Like it has the clipping outside the rectangle.

34:16 But if you're zoomed out, it actually does iterate over your entire billion data points and aggregates. And

34:21 that happens on the server, right?

34:24 exactly, exactly. The server and then all it has to do is send an image of the aggregated data. Or if you have 10 million points or a million points, when 1000 1000 pixel image is much, much smaller than the actual.

34:39 Yeah, absolutely. Yeah. That's basically the same size, no matter how much data you have. Exactly. Yes, compression helps a little bit, right, but not so much. Yeah.
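The fixed-size idea from this exchange can be sketched with plain NumPy. This is only an illustration of binning points into a pixel grid, not Datashader's own code; the function name `aggregate_points` and the grid size are made up for the example.

```python
import numpy as np

def aggregate_points(x, y, x_range, y_range, width=400, height=300):
    """Count how many points fall into each pixel of a height x width grid.

    Points outside x_range / y_range are ignored, which mirrors the
    "outside the viewport" clipping mentioned in the conversation.
    """
    counts, _, _ = np.histogram2d(
        y, x,                          # rows follow y, columns follow x
        bins=(height, width),
        range=(y_range, x_range),
    )
    return counts                      # shape is always (height, width)

rng = np.random.default_rng(0)
small = rng.normal(size=(2, 1_000))        # one thousand points
large = rng.normal(size=(2, 1_000_000))    # one million points
x_range, y_range = (-4.0, 4.0), (-4.0, 4.0)

agg_small = aggregate_points(small[0], small[1], x_range, y_range)
agg_large = aggregate_points(large[0], large[1], x_range, y_range)

# Both aggregates have the same shape, so the payload sent to the browser
# does not grow with the number of input points.
print(agg_small.shape, agg_large.shape)
```

Whatever the input size, what would be shipped to the browser is one `height x width` array (or an image rendered from it).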

34:49 So yeah, you get a fixed-size image out. And that works with most visual elements. We've kind of been expanding the visual elements. When it was first created, I think, it aggregated point data and line data. But now it's expanded to cover polygons, trimeshes, quadmeshes, and just downsampling images as you're zooming. If you've got a huge, like, one-gigapixel image, you can dump it into Datashader, and it downsamples it as you zoom around.

35:16 Yeah, that's great. And looking at it, it says that it has scalability with Dask or cuDF. How do you configure it to make it choose one or the other on the server?

35:27 If you have a Dask DataFrame, it'll just take that. And basically, the way a Dask DataFrame works is, think of it as a bunch of chunks of underlying pandas DataFrames. And these chunks might be on your machine, or they might be distributed across a whole cluster of machines. And so it just keeps the computation local, meaning that the aggregation for each of those chunks happens on the particular node in your cluster. And then once it's done the aggregation, it just has to send the fixed-size image back to the main node to aggregate. And so that way, you can distribute the computation, but still have a global aggregate.
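A small NumPy sketch of why this chunked approach works: per-chunk counts on the same pixel grid can simply be added together. The chunks here are ordinary in-memory arrays standing in for partitions on different machines, and `aggregate_chunk` is a hypothetical helper, not part of Dask or Datashader.

```python
import numpy as np

def aggregate_chunk(chunk, x_range, y_range, width, height):
    """Aggregate one chunk of (x, y) rows into a fixed height x width grid."""
    counts, _, _ = np.histogram2d(
        chunk[:, 1], chunk[:, 0],
        bins=(height, width),
        range=(y_range, x_range),
    )
    return counts

rng = np.random.default_rng(1)
points = rng.uniform(0.0, 1.0, size=(200_000, 2))
x_range = y_range = (0.0, 1.0)
width, height = 200, 100

# Stand-in for partitions living on different workers.
chunks = np.array_split(points, 8)

# Each "worker" reduces its own chunk to a small fixed-size grid...
partials = [aggregate_chunk(c, x_range, y_range, width, height) for c in chunks]

# ...and the "main node" only has to add those small grids together.
combined = np.sum(partials, axis=0)

# Same answer as aggregating everything in one place.
whole = aggregate_chunk(points, x_range, y_range, width, height)
print(np.array_equal(combined, whole))
```

Because counting is additive, only the small per-chunk grids need to travel over the network, never the raw rows.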

36:04 Okay, so what you provide to Datashader basically tells it how to process it. You give it a pandas DataFrame, you give it a Dask DataFrame, you give it a cuDF DataFrame, and it just knows how to work with all of them. But that implies how it's computed. Yes.

36:18 So those three in particular.

36:21 Got it. Okay. Now I see.

36:22 So we've actually got, on datashader.org, if you look at, I think, one of the user guides or getting started guides, a nice handy little table that just shows you, for this data type, like for point data, these data backends are supported. For points, for example, Dask; for an image, it might be xarray.

36:47 Yeah, exactly. Cool. Well, it seems like a really nice way to fit all these things together, and just such a great API. Maybe we could talk about some of the projects or communities that are using HoloViz.

37:00 Oh, absolutely. One ecosystem that's really taken up these tools, particularly Datashader, is the Pangeo initiative, which is basically an initiative by various geoscience folks to build a big data platform to analyze data in the cloud, right? So you used to have all these different data silos, where you have the data in the cloud, and you have to download it onto your local machine and explore it. And they've been building basically the platform, so you can easily deploy your own JupyterHub, and then keep your data in the cloud, but analyze it there also. And so these people, it might be climatologists or oceanographers, have these huge datasets, right? Huge meshes of data, and they need to visualize them. But there weren't really tools to do that. And thanks to all these open source tools, you can load the data from the cloud, aggregate it using Datashader, and kind of zoom and pan interactively around without having downloaded the data. And that's a great use of it; it makes me super happy to see.

38:11 Yeah, solving some real problems. That's awesome. Also Intake, what's that?

38:15 Intake. Oh, that's a really interesting project. So again, actually, this is also leveraged by the Pangeo project. Say you have a bunch of data sources, and you have data catalogs, and you want to keep track of all your data, and you don't want to have custom scripts to load this kind of data and that kind of data. Intake lets you write one data catalog YAML file to specify: I've got the CSV files here, there's 10,000 of them, and load them somehow. And then I've got a NetCDF file there, and I've got a bunch of Parquet files here. You can kind of encapsulate that all in this nice little catalog. And so in the notebook, you literally just point it to your catalog and say, load this, and thanks to the specification in that YAML file, it'll just load it. And that has integration with our tools in a number of ways. So first of all, you can even put your hvPlot spec into your data catalog. So you can say, I have these default plots that I always look at for this data, right? So you can say, in this data source's plots, plot points using the longitude and latitude columns in this CSV file, and then it will automatically generate your plot.

39:32 So you pre-declare your plots in this. I see.

39:35 Okay, that's super cool. Another

39:37 thing it has is a little GUI built on top of Panel, which kind of lets you explore the data. You've already got your data, and now you can display a graphical interface and click around and say, I want to plot this.

39:50 color but I will specify

39:51 how to import it and whatnot, and then visualize it. That's great. And then what's cuxfilter?

39:58 So I only recently found out about this. So NVIDIA, as I said, has this RAPIDS initiative, and they've kind of played around with visualization in various forms. And interestingly, they built this cuxfilter library on top of Panel and Bokeh, to build cross-filtering applications. Cross-filtering is also called linked brushing: you select something on one plot, and you see that reflected in the other plots. And that's built on Panel and basically lets you build a dashboard with that kind of cross-filtering.

40:35 And then some space stuff as well.

40:37 These are actually projects from consulting clients we work with. So one of them is the LSST telescope. I think it's recently actually got a real name, which is the Vera Rubin telescope, the largest optical telescope in the world. And they basically approached me saying they want to do some QA on the data that comes out of this

41:00 huge monster, with terabytes a day.

41:02 It can be challenging, because they have 50 petabytes of image data that they're going to be collecting and stuff. And yeah,

41:11 Yeah, exactly. They have to do the same analysis every day, so they need tools that can handle this stuff. And so I handed over this project to Quansight. So that's Travis Oliphant's firm; he founded Anaconda and then started another consulting firm. And we handed this project over to them. And they've had a bit of help from me, but not much. They've built this really nice dashboard to do the QA stuff for them. Yeah, actually will handle 500 petabytes, but maybe even more quickly.

41:43 Yeah, but still, that's pretty impressive. That's awesome. And it's cool to see all of these projects using the libraries you guys put together, they're probably giving it some pretty serious testing.

41:52 Yes, testing at scale, and they ultimately find all the little performance issues. And it's great to see; that's part of what makes it an exciting job, right? Just building tools is awesome. But really, if you're completely divorced from the actual users of those libraries, it's really hard to tell: are you doing the right thing, or just wasting your time? So it's super nice to go back and forth between actual consulting, where you always see people's problems, and then coming back to the tools.

42:20 Absolutely, yeah, you need this blend to keep it real. But if you're too focused on just solving problems for consulting, you don't get that spare time to develop new stuff as much. Yeah, I want to close the conversation out with two quick things. One, I really like to explore things like this, like visualization and analysis libraries and whatnot, by just looking at some examples, because usually you can look at some nice pictures and get a quick sense. So you guys have a bunch of different tutorials or simple little examples, expositions, showing them off. Are there maybe a couple of favorites you want to point people to, and tell them what they do?

42:56 Oh, absolutely. Yeah, I don't know how best to do this. So most of our websites, particularly for HoloViews and GeoViews and Panel, have a little gallery. But then we also have this website, examples.pyviz.org, where really, we want contributions. Again, with PyViz, we want it to be this general thing. So if people are interested in contributing, we welcome all examples.

43:20 Yeah, just grab a couple that you think are like really cool and show off stuff people would appreciate.

43:24 Yeah, so let's go with the one that you talked about earlier, the census example. So if you go to examples.pyviz.org, or just search for census, it'll be right on the main page, right? There's like a gallery. And then at the top right there is the census example, which kind of explores: I've got this census dataset, how do I actually display it and start exploring it? So it starts off by loading the dataset, in this case, in fact, it's just loaded with Dask. But we've got this little wrapper around Dask which does spatial indexing. So spatial indexing means that it has built an index of the space,

44:02 like an R-tree,

44:04 it's super fast to say, Show me the things that are near here or right here, because I said,

44:09 Yes. So if you're zooming in, as I talked about earlier, by default Datashader has to scan through the entire dataset each time, even if you're zoomed in to just a little spot. With spatial indexing, it can say, okay, this stuff is definitely not in my view, so I don't even consider it. And so it becomes faster, right? If you look at that notebook, you kind of start by loading the dataset and datashading it. And we start with just simple linear aggregation. And what you'll immediately notice is that it's just black. If you think about the population in the US, there's a few hotspots, but New York is super dense. All the cities are dense. And so all you see with linear color mapping is New York and then a few blurry cities. A little bit of LA, a little bit of Chicago,
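The "don't even consider it" step can be sketched with a very simple index: one bounding box per chunk of spatially sorted points. This is an assumption-laden toy, far simpler than an R-tree and not how the wrapper mentioned above is implemented; all names (`chunk_bounds`, `overlaps`, `count_in_view`) are hypothetical.

```python
import numpy as np

def chunk_bounds(chunk):
    """Bounding box (x_min, x_max, y_min, y_max) of one chunk of (x, y) rows."""
    return chunk[:, 0].min(), chunk[:, 0].max(), chunk[:, 1].min(), chunk[:, 1].max()

def overlaps(bounds, x_range, y_range):
    """True if the chunk's bounding box intersects the viewport."""
    x_min, x_max, y_min, y_max = bounds
    return (x_min <= x_range[1] and x_max >= x_range[0]
            and y_min <= y_range[1] and y_max >= y_range[0])

def count_in_view(chunks, index, x_range, y_range, use_index=True):
    """Count points inside the viewport; also report how many chunks were scanned."""
    total, scanned = 0, 0
    for chunk, bounds in zip(chunks, index):
        if use_index and not overlaps(bounds, x_range, y_range):
            continue                     # definitely not in view, skip entirely
        scanned += 1
        x, y = chunk[:, 0], chunk[:, 1]
        inside = ((x >= x_range[0]) & (x <= x_range[1])
                  & (y >= y_range[0]) & (y <= y_range[1]))
        total += int(inside.sum())
    return total, scanned

rng = np.random.default_rng(2)
points = rng.uniform(0.0, 1.0, size=(100_000, 2))
points = points[np.argsort(points[:, 0])]       # sort by x so chunks are spatially compact
chunks = np.array_split(points, 10)
index = [chunk_bounds(c) for c in chunks]       # built once, reused for every zoom

view_x, view_y = (0.42, 0.47), (0.10, 0.90)     # a zoomed-in viewport
with_index = count_in_view(chunks, index, view_x, view_y, use_index=True)
full_scan = count_in_view(chunks, index, view_x, view_y, use_index=False)
print(with_index, full_scan)
```

Both calls return the same point count, but the indexed call touches only the chunks whose bounding boxes overlap the viewport.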

44:53 that's about

44:55 Yes. And so the nice thing about Datashader is that it kind of takes that away from you. You can do linear, you can do log, or you can kind of adjust the color map yourself, and that's kind of difficult. But by default, Datashader actually does something called eq_hist, which is histogram equalization, which means that it adjusts the histogram of the color map; it kind of equalizes the color map in such a way that actually

45:22 Hard to picture over audio only.

45:24 Yes. But basically, it reveals the shape of your data, if not the exact values. So you shouldn't use it for reading out the exact values or something, but you should get an overall idea of the distribution. It's a really nice mechanism. And that's part of what makes a Datashader image stand out; sometimes on Twitter, I'll see an image and it just pops. And eq_hist makes sure that you see the overall shape of your data values. And this example goes through that and kind of explains what this actually does. So in the census data, you can now see the shape of each city, you can see a lot of mountainous area, and the west of the US is kind of empty. And yeah, it really reveals the population distribution. And then it also demonstrates how to manipulate your color map to show the hotspots. Because you're aggregating to a fixed-size image, you can say, values above this value, above this density, you color red, and so you really get the statistics. Yeah, very cool. And then maybe I won't go into detail on this one anymore.
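To make the contrast between linear color mapping and histogram equalization concrete, here is a rank-based sketch in NumPy. It illustrates the general technique only and is not Datashader's eq_hist implementation; the function name `equalize` and the synthetic skewed counts are assumptions for the example.

```python
import numpy as np

def equalize(agg):
    """Rank-based histogram equalization of a grid of counts.

    Each non-empty pixel is mapped to the fraction of non-empty pixels whose
    count is less than or equal to its own, so brightness reflects rank
    rather than raw magnitude. Empty pixels stay at 0.
    """
    out = np.zeros(agg.shape, dtype=float)
    mask = agg > 0
    values = agg[mask]
    ranks = np.searchsorted(np.sort(values), values, side="right")
    out[mask] = ranks / values.size
    return out

# Synthetic, heavily skewed "population" counts: a few very dense pixels.
rng = np.random.default_rng(3)
agg = np.floor(rng.lognormal(mean=1.0, sigma=2.0, size=(100, 100)))

linear = agg / agg.max()          # linear color mapping
equalized = equalize(agg)         # histogram-equalized color mapping

mask = agg > 0
# Share of non-empty pixels that end up brighter than mid-scale.
bright_linear = (linear[mask] > 0.5).mean()
bright_equalized = (equalized[mask] > 0.5).mean()
print(bright_linear, bright_equalized)
```

With linear scaling almost every pixel is crushed toward black by the few hotspots, while the equalized version spreads pixels across the brightness range, at the cost of no longer being able to read exact values off the colors.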

46:36 Sure, sure. People should check it out; I'll link to it. And each one of these steps builds up with just like a line or two of code. It's not super complicated.

46:44 Exactly right. And then we kind of explore some depressing facts. So this dataset has basically the race of all the people. And you can really see the segregation in different cities. It's horrible to think about. Yeah, that's not good. Yeah, it really is a fact, if you look at that. Yeah, I see. And then finally, it rounds out by showing you how to, because our tools are meant to work well together, the final example demonstrates how to take this and use HoloViews to generate an interactive plot. On the website, where it isn't running on a server, you're not going to be able to zoom very much; it kind of gets very pixelated. But if you're running it in a notebook yourself, or you deploy it on a server with Panel, for example, then you can zoom around and kind of zoom into individual people.

47:31 Yeah. Wow, that's wild. So there's a whole bunch of examples over at examples.pyviz.org. And each one of the examples is tagged with the various libraries that people might want to explore. So people can go there and dig into them and check them out. And this is just one of them.

47:46 Yeah. And we're trying to keep those up to date. And if you build a cool thing, please contribute it. Yeah, awesome.

47:50 All right, I guess maybe you could just touch really quick on awesome-panel.org, and then tell us what's next with the whole project. And we'll be out of time, then.

47:58 I've been super happy to see this. So community building is hard, and I think many open source authors realize this, and we've had to learn it. But there's been quite a lot of interest in Panel, and Marc Skov Madsen, who works in Denmark, built this website called awesome-panel.org, which really shows off what you can build with Panel. So our examples kind of try to focus on the simpler stuff, but his really show you what you can do. And it's really impressive what he's done with Panel. awesome-panel.org has lots of resources for how to best leverage things. And a lot of that stuff, ideally, would migrate back to our website. But yeah, he's built this complex, multi-page site, which takes you through a lot of different examples.

48:46 Yeah. Cool. It's kind of meta, right? Like, there's panels involved in it as well.

48:50 Oh, yeah. The website is built entirely in Panel, and it's also about Panel. And yeah, I've been trying to take that further as well. So recently, I did a talk for AnacondaCon, our conference here, and I built a presentation tool on top of Panel and gave the entire presentation with it, to demo Panel.

49:10 Yeah, that's very cool. All right. So what's next with the whole project? Where are you guys going?

49:15 So one thing that we've been working on a lot recently, in terms of HoloViews, is linked selections. So in the spirit of shortcuts, you can now generate your HoloViews plots, and if they're all using the same data, you can just say, link selections, apply it to these various components, and it will automatically set up all the linking between the plots, so that when you select on one, all the other ones update. I'm really excited about that. Super, super complex. So it lets you dive

49:44 into the data. Yeah, cool.

49:46 And particularly with the GPU support now. With GPU support in Datashader, you can explore tens of millions or billions of data points easily with linked selections. Really cool. That's one thing I'm excited about. Okay. Another thing, in terms of Panel, that I'm excited about is that the next release is going to have default templates. And so what that means is, you've always had the ability to just put stuff together: you say this thing goes in a row with this thing, like a bunch of widgets go in a row or a column, and then you put a plot here. But if you want to have more control, you kind of had to write your own HTML and CSS to lay things out. You had the ability to use templates. But the next release is going to

50:29 You might not want to exercise it.

50:30 Yeah. So that's what we're trying to keep people from, right? We don't want people to have to write that kind of thing. So what we've done now is added some default templates, where you say, I want this to go in the sidebar, I want this to go in the header, and this to go in the main area. And it looks like a polished website. It's not just a bunch of things on a web page. So I'm also really excited about that, because it's something that's been missing for a while. We've had a lot of cool little demos, but it was hard to build a whole nice polished-looking dashboard.

51:01 more control.

51:04 And then the last thing I'm really excited about in the next release of Panel is integrating with other ecosystems. So if you're familiar with Jupyter, you'll know about ipywidgets, and lots of libraries have now been built on top of Jupyter widgets. There's things like ipyvolume, to explore 3D volumes; there's just a whole bunch of them, right? And it's kind of been a shame, because we don't want to have diverging ecosystems, right? So in this next release, you're going to be able to just put your ipywidget into your Panel app. You just drop it in there, and even on the deployed Tornado server, the Bokeh server, it'll just work and load the widget, with the communication set up correctly. And so now we don't have these two diverging ecosystems anymore. You can just use ipywidgets in Panel, or you can go the other way and say, well, we've got this deployment system, Voilà, which serves Jupyter notebooks as well, and so you can now put your Panel into Voilà. It just makes sure that the ecosystems don't diverge, and you can use the tools that you want. Yeah,

52:15 that's nice. Because you don't want to have to have a separate set of widgets for your visualizations, while people are also building them for Jupyter. Of course they would build them for there, right? So you might as well just bring these together.

52:27 Exactly. Well, it's been a long effort. So I'm

52:32 super glad to be able to share.

52:34 Yeah. All right. Well, very cool. Great project. Great examples. And it looks like you've got a lot of momentum going forward as well. So thanks for bringing all of your experience and talking about what you guys have built there.

52:46 Thanks so much for having me. It's been great.

52:49 Yeah, you bet. Before we get out of here, though, you got to answer the final two questions. If you're going to write some Python code, what editor do you use?

52:54 Oh, I'm still on Emacs. My professor, who's now my boss, got me into it, and I'm still there. Although I do dabble in other editors.

53:02 Yeah, very cool. And probably some Jupyter as well, at some point. Yes. Yeah.

53:06 I always have like, four different tabs open. Excellent. So

53:11 Yeah, cool. And then a notable PyPI package? You've got a bunch here already. It's worth saying you can just pip install holoviz.

53:18 Oh, there's a bunch. But yeah, I think I've already, yeah, we can plug them. So holoviz, which gives you the tutorial. But I really want to take this opportunity to plug some of the underlying libraries. xarray is awesome. I've already mentioned this. And Dask has been awesome.

53:33 I agree. All right. Very cool. So people are excited about HoloViz. It sounds like it really might solve their problem, or help them build some dashboards. What's the call to action? What can they do?

53:43 Come visit us at holoviz.org. There's a tutorial that will take you through the initial steps of using our projects and building a dashboard. So go there, check that out. Also, check out examples.pyviz.org to see what you can do, once you master this stuff, to build more complex examples. And then message me on Twitter, or message our individual projects, like HoloViews, on Twitter. And if you've got any longer form questions, join us on our Discourse, which is discourse.

54:14 All right, well, it looks like a great project, and I think people will build cool things with it. So thanks for sharing it with us. Yep, bye. This has been another episode of Talk Python To Me. Our guest in this episode has been Philipp Rudiger. And it's been brought to you by Brilliant.org and Datadog. Brilliant.org encourages you to level up your analytical skills and knowledge. Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every day. Datadog gives you visibility into the whole system running your code. Visit talkpython.fm/datadog and see what you've been missing. Plus, there's a free T-shirt with your free trial. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show: open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Get out there and write some Python code.
