Episode #269 - HoloViz - a suite of tools for Python visualization


The toolchain for modern data science can be intimidating. How do you choose between all the data visualization libraries out there? How about creating interactive web apps from those analyses? On this episode, we dive into a project that attempts to bring the whole story together: HoloViz.

HoloViz is a coordinated effort to make browser-based data visualization in Python easier to use, easier to learn, and more powerful. And we have Philipp Rudiger from HoloViz here to guide us through it.

Episode Deep Dive

Guest introduction and background

Philipp Rudiger is a longtime Python developer and data scientist who originally studied electronic engineering and neural informatics. He discovered the need for better visualization tools while doing complex simulator work in C++ and Python. Later, he joined Anaconda (formerly Continuum Analytics) and helped develop the ecosystem of HoloViz libraries (including HoloViews, Panel, and more). He works on consulting projects that often lead to improvements in these open-source tools, ensuring they're both theoretically sound and practical for real-world data visualization challenges.

What to Know If You're New to Python

If you're newer to Python, it's helpful to be familiar with basic data science workflows before diving into HoloViz. Below are a few tips:

Make sure you have a solid grounding in Python fundamentals, including reading data from files, working with loops, and basic functions.
Learn how to use notebooks or JupyterLab as this episode discusses using Jupyter for data analysis and interactive plots.
Have some grasp of Pandas, at least for simple data operations, because it’s central to many examples with hvPlot and HoloViews.
Know a bit about Bokeh or other plotting libraries, since tools like Panel and HoloViews build on Bokeh’s interactive plotting capabilities.

Key points and takeaways

HoloViz as a Cohesive Python Visualization Suite
This episode centers on HoloViz, an umbrella project aiming to unify the Python visualization story. Rather than a single library, HoloViz coordinates several, including HoloViews, hvPlot, Panel, and Datashader, to help you work with data from small to very large scales. The guiding principle is “shortcuts, not dead ends”: you get results quickly without giving up customization. With HoloViz, you can turn your data exploration into shareable interactive dashboards or web apps.
- Tools / Links
  - HoloViz (holoviz.org)
  - Bokeh (bokeh.org)
HoloViews: Automatic Plotting by Wrapping Your Data
HoloViews emerged from a frustration with writing endless plotting code to visualize neural network data. It takes a declarative approach: you wrap your data in HoloViews objects, and the library decides how best to visualize it. This keeps your analysis code clean and consistent, while you still have full control to fine-tune details if needed. Many HoloViews plots render through Bokeh, which provides interactive pan and zoom.
- Tools / Links
  - HoloViews (holoviews.org)
hvPlot: A Familiar .plot API for Many Data Types
hvPlot builds on top of HoloViews to offer a quick one-liner plotting API similar to DataFrame.plot(). It extends that idea to other data structures such as Dask, Xarray, and GPU DataFrames with cuDF, so you don’t have to learn new plotting commands for each library. This uniform API helps you scale from a few thousand rows of data to billions, often with little or no change to your plotting code.
Datashader: Handling Millions or Billions of Data Points
Datashader tackles the challenge of plotting huge datasets (millions or billions of points) that would otherwise overwhelm the browser or slow it down drastically. By aggregating on the server and sending only the resulting image to the client, Datashader preserves the overall shape and density of the data without requiring large data transfers. This allows for interactive zooming and panning, even when datasets exceed local memory.
- Tools / Links
  - Datashader (datashader.org)
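To make the aggregation idea concrete, here is a minimal pure-Python sketch of binning points into a fixed-size grid of counts, which is the kind of result that gets rendered as an image and sent to the browser. It only illustrates the concept and is not Datashader's implementation or API; the function name `aggregate_points` and its parameters are hypothetical.

```python
import random


def aggregate_points(points, x_range, y_range, width, height):
    """Bin (x, y) points into a height x width grid of counts (hypothetical helper)."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # point lies outside the current viewport
        # Scale the coordinate to a pixel index; clamp the upper edge into the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


# 100,000 random points reduced to a 40 x 20 grid of counts.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
grid = aggregate_points(points, (-3, 3), (-3, 3), width=40, height=20)
print(len(grid), len(grid[0]))
```

However many points go in, the result is always `width * height` numbers, which is why the browser never needs to receive the raw data. Zooming simply means re-running the aggregation with a narrower `x_range` and `y_range`.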
Panel: Quick Dashboards and Interactive Apps
Panel provides a streamlined way to share your data analysis as an interactive dashboard or web application. You can lay out your plots, data tables, sliders, and other controls directly in Python code (or a notebook), then serve it on a Tornado-based server with minimal effort. For many interactive data needs, this lets you avoid learning a JavaScript framework or building a separate Flask app.
- Tools / Links
  - Panel (panel.holoviz.org)
Real-World Usage from Climatology to Astronomy
HoloViz has gained wide adoption in scientific communities, such as the Pangeo initiative for climate and ocean research and the LSST telescope project for massive astronomy datasets. By combining Xarray, Dask, and HoloViz, researchers handle terabytes or even petabytes of data distributed in the cloud. Building interactive dashboards means they can quickly explore or validate scientific results without requiring specialized JavaScript teams.
- Tools / Links
  - Pangeo (pangeo.io)
  - Xarray (xarray.dev)
Shortcuts Without Lock-In: Easily Switch Backend or Scale
A key theme is minimal code change when scaling up or switching backends. For instance, you might start with a Pandas DataFrame but later move to a Dask DataFrame once your dataset grows. hvPlot detects whether you are passing in Dask or Pandas data, so the same plotting call keeps working as you scale.
- Tools / Links
  - Pandas (pandas.pydata.org)
The Importance of Param and Colorcet
Under the hood, Param provides typed class parameters that help define interactive components for HoloViz apps. Colorcet addresses the need for scientifically valid and perceptually uniform color maps, avoiding pitfalls like the infamous “jet” colormap that can mislead the viewer. Both libraries are part of HoloViz’s comprehensive approach to data exploration.
- Tools / Links
  - Param (param.holoviz.org)
  - Colorcet (colorcet.holoviz.org)
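The idea of typed, validated class parameters can be sketched with a plain Python descriptor. This is a standard-library illustration of the concept, not Param's implementation; Param's own classes (such as `param.Number` with a `bounds` argument) do considerably more. The `Number` and `Simulation` classes below are hypothetical names made up for this example.

```python
class Number:
    """Descriptor for a numeric attribute with optional inclusive bounds (illustrative only)."""

    def __init__(self, default=0.0, bounds=None):
        self.default = default
        self.bounds = bounds

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Fall back to the declared default until a value is set on the instance.
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{self.name} must be a number, got {type(value).__name__}")
        if self.bounds is not None:
            low, high = self.bounds
            if not (low <= value <= high):
                raise ValueError(f"{self.name}={value} is outside bounds {self.bounds}")
        obj.__dict__[self.name] = value


class Simulation:
    # Declared once on the class; every assignment is validated.
    excitatory_strength = Number(default=1.0, bounds=(0.0, 10.0))
    inhibitory_strength = Number(default=0.5, bounds=(0.0, 10.0))


sim = Simulation()
sim.excitatory_strength = 2.5

try:
    sim.inhibitory_strength = 42  # outside the declared bounds
except ValueError as err:
    print(err)
```

Because each parameter carries its type and bounds, a dashboard tool has enough information to generate a matching widget (for example, a slider spanning the bounds) without extra configuration.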
Example: Plotting U.S. Census Data
The conversation highlighted an example with 300 million data points from the U.S. Census, showing how Datashader dynamically aggregates the entire dataset in a fraction of a second. By adjusting how counts are mapped to colors (for example, with histogram equalization, "EQHist") or selecting certain demographics, you see population distributions or urban hotspots. The entire plot remains interactive for zooming and panning without needing specialized front-end code.
- Tools / Links
  - Examples Gallery (examples.pyviz.org)
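Histogram equalization is what keeps sparsely populated areas visible next to dense cities. The sketch below shows the general technique on a small grid of counts using only the standard library: each nonzero count is mapped to a color level by its rank among the nonzero counts rather than by its raw value. It is an assumption-laden illustration, not Datashader's code; the function name `eq_hist` and the `levels` parameter are chosen here for the example.

```python
from bisect import bisect_right


def eq_hist(grid, levels=256):
    """Map counts to color levels 0..levels-1 by rank; empty cells stay at 0."""
    values = sorted(v for row in grid for v in row if v > 0)
    if not values:
        return [[0] * len(row) for row in grid]
    n = len(values)

    def level(v):
        if v <= 0:
            return 0
        # Empirical CDF: fraction of nonzero cells whose count is <= v.
        cdf = bisect_right(values, v) / n
        return max(1, round(cdf * (levels - 1)))

    return [[level(v) for v in row] for row in grid]


# One very dense cell (1000) next to several sparse ones.
counts = [[0, 1, 1, 2], [5, 1000, 0, 3]]
shaded = eq_hist(counts)
print(shaded)
```

With a linear scale, the cell holding 5 points would be almost indistinguishable from an empty one next to the cell holding 1000. Rank-based mapping spreads the sparse cells across the available color range instead.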
Future Directions: Templates, Cross-Filtering, and ipywidgets
Panel is evolving to include templating systems for polished, multi-section dashboards without requiring manual HTML or CSS. Further enhancements include “link selections,” which automatically connect filters between multiple plots, plus deeper integration with ipywidgets for Jupyter. These improvements reduce friction and help unify the Python data ecosystem even more.
- Tools / Links
  - awesome-panel.org (community site)

Interesting quotes and stories

"Shortcuts, not dead ends: people should be able to drop their data in and quickly get a plot, but not be stuck if they need more customization." (Philipp Rudiger)

"I’d wake up and flip through these PDFs, 60, 100 pages long, just to see if my simulations were doing anything meaningful. That’s what got me into building visualization tools in the first place." (Philipp Rudiger)

Key definitions and terms

HoloViz: A coordinated effort to streamline Python visualization through libraries like HoloViews, hvPlot, Panel, and more.
HoloViews: A declarative library letting you wrap data in Python objects that define how it should be visualized.
Panel: A tool to build interactive dashboards and web apps in pure Python, serving them via a Tornado server.
Datashader: A library for server-side aggregation, allowing interactive visualization of huge datasets by sending images instead of raw data.
Param: A parameter library that provides typed and validated class attributes; used to define interactive widgets or app parameters.
hvPlot: A plotting API, similar to pandas .plot(), extending beyond Pandas to Dask, Xarray, cuDF, and more.

Learning resources

Below are courses from Talk Python Training you can explore to deepen your knowledge:

Python for Absolute Beginners: Perfect if you want a thorough introduction to Python before diving into data visualization and dashboards.
Python Data Visualization: A direct fit for those seeking a comprehensive look at plotting libraries and data exploration tools.
Fundamentals of Dask or Getting started with Dask: Useful if you plan to scale up your Python workflows, especially for large datasets.

Overall takeaway

HoloViz brings a much-needed unification to Python’s data visualization story by building multiple libraries that are easy to adopt yet powerful enough for serious scientific and analytics workflows. Whether you’re working with a few thousand rows in Pandas or billions of points across distributed clusters, you can scale seamlessly, create interactive dashboards, and do it all within a familiar Pythonic environment. These tools lower the barriers to advanced visualization and let developers remain productive while sharing sophisticated results with non-technical audiences in a clear, interactive format.

Episode Transcript


00:00 The toolchain for modern data science can be intimidating.

00:02 How do you choose between all the data visualization libraries out there?

00:06 How about creating interactive web apps from those analyses?

00:09 On this episode, we dive into a project that attempts to bring that whole story together,

00:14 HoloViz.

00:15 HoloViz is a coordinated effort to make browser-based data visualization in Python easier to use,

00:20 easier to learn, and more powerful.

00:22 We have Philipp Rudiger from HoloViz here to guide us through it.

00:25 This is Talk Python To Me, episode 269, recorded June 15, 2020.

00:30 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:48 ecosystem, and the personalities.

00:50 This is your host, Michael Kennedy.

00:52 Follow me on Twitter, where I'm @mkennedy.

00:54 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter

00:59 via @talkpython.

01:00 This episode is sponsored by Brilliant.org and Datadog.

01:04 Please check out what they're offering during their segments.

01:06 It really helps support the show.

01:07 Philipp, welcome to Talk Python To Me.

01:10 Thanks so much for having me.

01:11 I'm excited to have you here.

01:12 Now, we're going to talk about a bunch of cool libraries that are all brought together

01:16 around this HoloViz overall meta project, if you will, to kind of make...

01:22 Yeah, exactly.

01:22 This umbrella to make working with data and exploring and visualizing it in Python a little

01:27 bit easier.

01:27 And I think that's a great project, and it looks like it's getting a lot of traction, and I'm

01:32 happy to be here to chat about it.

01:33 I'm excited to talk about the various libraries in it.

01:36 Yeah, absolutely.

01:37 Before we get to that, though, let's start with your story.

01:40 How did you get into programming in Python?

01:41 So I got started with programming pretty late.

01:44 So apart from the usual thing of the 90s, you had a GeoCities website, you hacked some HTML,

01:50 CSS.

01:50 I really didn't get started with actual programming until I joined an electronic engineering course

01:55 in my undergrad.

01:56 So I moved from Germany to the UK to study electronic engineering and music technology, thinking I

02:04 was better at music than I really was.

02:06 But I took a liking to kind of programming.

02:07 We did some programming in C and Verilog, the pretty low-level stuff.

02:11 And then towards the end of that project, kind of that undergrad degree, I developed a simulator

02:17 of like bipedal locomotion in C++, which was far more complex than I'd envisioned it.

02:23 But it was really exciting to me to kind of actually get into like a big project of my own.

02:28 And from there, I then joined a master's course and started programming in Python and doing

02:33 data analysis.

02:34 And we had this simulator called Topographica, which did kind of...

02:37 That's cool.

02:38 You talked about working on this bipedal locomotion simulator in C++.

02:42 Like, forget the language.

02:43 That kind of stuff is trickier than it seems like it should be, right?

02:47 Oh, absolutely.

02:48 Because back then I had no idea about neural networks.

02:50 It was like I just kind of jumped in, heard about neural networks, heard about like genetic

02:54 programming.

02:54 So I built a huge network with way too many parameters and assumed like genetic programming

02:59 would make it work.

03:00 It didn't.

03:01 It didn't really work.

03:02 The simulator worked.

03:02 So it did flop around and like my little bipedal humanoid would flop around a little bit, but

03:08 it never actually managed to generate actually real bipedal motion.

03:12 Yeah.

03:12 I guess there is something to training these models correctly and getting them set up right

03:17 in the first place.

03:18 It's not just magic that you can throw at a problem, right?

03:21 Yes.

03:22 But then I decided this thing wasn't complicated enough, so I'd actually try and solve the brain.

03:26 So that's, I then joined a master's and PhD graduate program, neural informatics.

03:32 And yeah, hoping to actually learn about how these things actually work.

03:36 Oh, that's really cool.

03:36 So was that like trying to model the way brains and synapses work using neural networks?

03:42 Yes.

03:42 So it was pretty close to what you'd consider like a convolutional neural network nowadays.

03:47 Well, they were around back then.

03:48 They weren't as popular.

03:49 Yeah.

03:50 And then, but then we also have like recurrent connections and it was, the idea was to model

03:53 the human visual system, basically.

03:55 Or just generally the mammalian visual system.

03:57 And so starting to vary a little bit.

03:59 What kind of problems were you trying to do?

04:01 What were you trying to ask it to do and see if it, what was success for you?

04:05 It was really, the idea was that it was self-organizing, right?

04:08 So that you didn't have to reprogram a bunch of like known stuff into the network or just

04:13 kind of organize like many of the convolutional neural networks nowadays do.

04:18 But we were trying to kind of keep it closer to the actual biology.

04:22 So we had different cell types that were interacting and those models were tremendously complex and

04:28 it was just super hard to analyze them.

04:29 Like I started having these huge model outputs.

04:32 Like I have like 60, 100 page outputs of like a model run of PDFs with just images in it.

04:38 And that's actually where I started developing these visualization tools.

04:41 So me and a colleague of mine were like, yeah, this isn't feasible.

04:44 We can't analyze these things properly.

04:47 Just looking, like flipping through PDFs, I'd wake up and like flip through these PDFs

04:51 and almost cut myself.

04:52 And then I'd start, we started writing this.

04:54 Yeah.

04:55 It's like trying to watch the green in the matrix, right?

04:57 Like just trying like, no, I can't see it this way.

04:59 This is, we got to look at it better.

05:00 Yes, precisely.

05:01 So the idea was that we'd start building something like you have these huge parameter spaces,

05:06 right?

05:06 It's like excitatory strength, inhibitory strength, the model would evolve over time.

05:10 And so we had these very complex parameter spaces that we were trying to explore.

05:15 And so we built this tool called HoloViews to kind of start digging into that and kind of

05:20 visualizing like what effect does this parameter actually have on the evolution of the model?

05:24 So you could drag a slider and see the excitatory strength of this to the model, the inhibitory

05:28 strength of that.

05:29 And so then this is how it evolves over time.

05:31 And that was really breakthrough to actually start analyzing.

05:34 But it also meant that I eventually started spending more time building this visualization

05:39 tool than I was actually spending on my actual project.

05:43 And I found that's one of the challenges.

05:44 I found that in the end, I found that more rewarding than actually working on.

05:48 Yeah.

05:48 That's the real danger, right?

05:49 Is like, I mean, I started getting into programming, doing research on complex dynamical systems and

05:55 math and whatnot.

05:57 And I, after a while, I realized, you know, the part of this project that really makes me

06:02 happy is when I'm not doing the math.

06:04 That was a sign that I should probably be doing something else.

06:07 But it's, it is really fun to build these things.

06:10 I also do think it's, you know, it's a challenge of research projects and this academic stuff in

06:16 general is it's hard to get credit for that, right?

06:20 Like you're not, they're not going to go, man, that's a killer contribution to data science

06:25 that you made.

06:25 Here's your PhD.

06:26 They're like, where's the network?

06:28 Where's the paper, right?

06:29 Right.

06:29 Where's the publication?

06:30 Yeah, exactly.

06:31 Where's the publication?

06:32 Exactly.

06:32 Yeah.

06:33 And actually we did publish a paper on HoloViews and that turned out to be the

06:38 only paper I published in my PhD.

06:39 The rest, I mean, yeah, basically my models didn't work until the very end, like two weeks

06:45 before I handed in my thesis, the models started working and I actually had results, but I never

06:50 got around to actually publishing.

06:51 Man, that's down to the wire.

06:52 Yeah, it was really down to the wire.

06:54 And at least, yeah, I'm kind of glad that's over with.

06:57 Yeah.

06:58 Well, it's, it's better late than never in that case.

07:00 So that's how you got into programming and found your way over to Python.

07:04 I mean, obviously it's a natural place to go.

07:07 Python is if you're doing neural networks.

07:09 What time frame was this in?

07:12 Like what year was this?

07:13 So I joined this.

07:15 It was a really good program.

07:16 So it was the doctoral training center in Edinburgh that actually doesn't exist anymore.

07:20 So it was part of the informatics department, but they had close collaboration with the neuroscience

07:26 department.

07:27 And I joined in 2010 and the first year was a master's program.

07:30 And then it was kind of, it took me way longer than it should to finish my program.

07:36 But I think in 2015, I handed in the thesis and did my defense.

07:42 Yeah.

07:43 Yeah.

07:43 Cool.

07:43 I'm just wondering, you know, you started in 2010 with, with some of this stuff.

07:48 If you started now, how much easier do you think it would be, or would it be basically the same to work on?

07:53 The visualization problems, the neural network problems?

07:56 It just seems like that has come so far in the last five years.

07:59 Oh, absolutely.

08:00 So back then I remember like we had, obviously we had to interface with C code a lot.

08:05 And so we had to, we used something called SciPy Weave, which many people probably don't know about anymore.

08:11 Yeah.

08:11 But it was this really awkward interface for C extensions.

08:14 And nowadays that would be, I mean, you've got things like Numba, you could write the kernels that were running these neural networks.

08:20 And like in pure Python, you know, just to compile it to something fairly optimized.

08:24 Right.

08:25 Absolutely.

08:25 Same with the visualization tools.

08:26 There's like so many interactive visualization tools at this point.

08:29 So back then, for example, HoloViews was just built on top of Matplotlib, that's what it output.

08:34 But the actual rendering nowadays, we build on Bokeh and Plotly.

08:40 And so we've gotten a lot more insights just interactively exploring this data.

08:44 That's really cool.

08:45 It's been such a huge evolution of tools.

08:47 Yeah.

08:47 Nice.

08:48 All right.

08:48 Well, what are you up to these days?

08:50 What do you do day to day?

08:51 So it's really nice.

08:52 I have kind of the freedom to switch between, well, not total freedom, but we kind of, when

08:57 I joined Continuum Analytics, which is now Anaconda in 2015, actually, before I handed in my thesis,

09:03 I was running out of funding and then joined Anaconda kind of as a day job between writing

09:08 my thesis.

09:09 All right.

09:09 And so I joined to do consulting.

09:11 Basically, I'd solve machine learning and visualization problems for various government

09:16 clients, corporations, and so on.

09:18 But from the very beginning, we kind of had this idea of we'd build open source tools that

09:23 would solve people's problems and then use them in our consulting.

09:27 That kind of model worked really well for us.

09:29 The entire HoloViz suite of tools was built kind of by spending quite a bit of

09:35 open source time, unpaid or unbillable time on the open source side.

09:39 But also, kind of, for example, Panel was built with funding from the U.S. Army Corps of Engineers.

09:45 So they just started to take a gamble with us and built this new dashboarding framework.

09:49 And so I had the freedom for over basically six, nine months to build this new tool.

09:54 Yeah, Panel is really cool.

09:55 We'll talk about that for sure.

09:56 So yeah, I go between most of my time is consulting work.

10:00 But as much as possible, we try to contribute the stuff we work on during that time back to

10:05 the open source.

10:05 Yeah.

10:06 And so you mostly do remote work, I would guess?

10:08 Yes.

10:09 So actually, I was in Edinburgh.

10:11 Anaconda in Austin?

10:13 Yeah, I was in Edinburgh for years.

10:15 And then last year, I moved here back to Berlin, which is where I grew up.

10:17 And actually, we had an office.

10:19 So Anaconda just opened an office here.

10:21 And I thought it would be nice to actually spend two to three days a week just actually

10:25 going to the office, seeing people have a more regular routine.

10:29 Yeah, yeah.

10:30 Just work until 3 a.m.

10:31 And then getting up before noon.

10:34 And then COVID happened and we're back to being fully remote.

10:37 Yeah, yeah.

10:38 Well, going to be around people.

10:40 Yeah, not everyone's used to it.

10:41 Yeah, for sure.

10:43 It's a bit of a bummer.

10:44 I mean, for folks like us who can work remotely and just carry on mostly doing what we're doing,

10:50 it's a bit of a bummer.

10:50 For a lot of people, it's a tragedy, right?

10:52 It's a huge, huge problem.

10:53 Especially for kids.

10:56 I don't know how people do it.

10:58 I know.

10:59 It seems really, really scary.

11:02 Hopefully we get through that soon and we can go back to an office.

11:05 Who knows what people's desire to get back to work together will be like.

11:09 Some of these remote ideas, I think, are going to stick.

11:12 And some people are kind of like, oh, so glad that's over.

11:14 Yeah, I think so.

11:15 Particularly, I mean, it's really not a good test for people talking like this is going

11:18 to usher in the revolution.

11:19 But it's a forced scenario, right?

11:22 People don't have childcare.

11:24 They're stuck at home.

11:25 Yeah.

11:25 So I don't know if it'll just put people off of remote work entirely.

11:31 Well, I think you touched on the real challenge.

11:34 It's one thing to say, well, let's all try to be remote for a while.

11:37 It's another to say, let's work with your small children around you all the time.

11:41 Like, that is the real struggle, I think, as a parent to find the time and the focus, right?

11:48 So I think it's an unfair test.

11:49 But if it's working under these scenarios, this is like the worst case scenario.

11:53 So obviously...

11:54 And it still seems to be mostly working, right?

11:55 Exactly.

11:57 Exactly.

11:57 It's interesting.

11:58 Also, it's kind of nice.

11:59 I actually kind of like, I don't know, you have a client meeting and get all stolen.

12:02 Yeah.

12:05 Yeah.

12:05 It does humanize people a little bit.

12:07 I think, you know, it's...

12:09 I don't want to go too far in it, but like, you know, watch the news or you watch like comedy

12:13 shows that are still going.

12:15 And it's just like, yeah, everyone's at their couch or at their kitchen table or just their

12:20 little home office.

12:20 And yeah, it's funny.

12:22 So let's talk about the history of this project a little bit.

12:27 So you started with HoloViz.

12:29 HoloViews.

12:30 HoloViews.

12:32 Yes.

12:32 Thank you.

12:32 So Holo...

12:33 Yeah, it's kind of confusing.

12:34 And there's...

12:35 Yeah, yeah.

12:36 I know.

12:36 There's even more confusing history around it, which I'm sure we'll get into.

12:41 This portion of Talk Python To Me is brought to you by Brilliant.org.

12:44 Brilliant has digestible courses in topics from the basics of scientific thinking all the way

12:49 up to high-end science like quantum computing.

12:52 And while quantum computing may sound complicated, Brilliant makes complex learning uncomplicated

12:57 and fun.

12:58 It's super easy to get started, and they've got so many science and math courses to choose

13:01 from.

13:02 I recently used Brilliant to get into rocket science for an upcoming episode, and it was a

13:06 blast.

13:07 The interactive courses are presented in a clean and accessible way, and you could go

13:11 from knowing nothing about a topic to having a deep understanding.

13:14 Put your spare time to good use and hugely improve your critical thinking skills.

13:19 Go to talkpython.fm/brilliant and sign up for free.

13:22 The first 200 people that use that link get 20% off the premium subscription.

13:27 That's talkpython.fm/brilliant.

13:30 Or just click the link in the show notes.

13:34 How did you go from trying to create better visualizations to this larger project?

13:39 I guess, as a way of introduction, maybe tell people how you got there, and then give us

13:44 a high-level view of what it is.

13:46 Yeah, absolutely.

13:47 So we started with HoloViews, which was actually built on this project called Param.

13:52 So Param is kind of like, we now have data classes in Python, kind of parameters on classes,

13:59 that are typed and so on, and there's projects like traitlets.

14:03 So Param was kind of the foundation of everything.

14:05 Built on top of that to kind of have type validation, just general semantic validation as well.

14:12 This thing is not just a tuple, but it's a range of two numbers.

14:16 It actually represents a range.

14:18 That was kind of the initial thing, which had been around before I even got into Python.

14:23 And then we built HoloViews on top of that.

14:25 And then one of our first projects at Continuum, back in the Continuum days, was for the UK

14:32 Met Office to build, kind of extend HoloViews to have geographic support, which brought about

14:38 GeoViews, which is kind of just an extension for HoloViews.

14:40 And then...

14:41 Right.

14:41 Obviously focused on, you know, geographical data and map data and whatnot.

14:45 Yes.

14:46 Yeah.

14:46 Handling the projections for you and stuff like that.

14:48 That's just quite nice.

14:50 But then, I mean, what we saw over and over again as part of our consulting project was

14:54 people were happy to have these analyses and a lot of them were in notebooks.

14:58 And then people would share these notebooks.

15:00 But really, someone who doesn't know about code is kind of scared or put off by all the

15:06 code in this notebook.

15:07 And they want another way to share it.

15:09 And that's kind of how we started building dashboarding tools.

15:12 Right.

15:12 So notebooks are pretty nice to show people.

15:15 But at least in Jupyter, as far as I know, there's not a great way to say, please load

15:19 this with every bit of code collapsed.

15:21 Right.

15:22 So, I mean, there's templates, but maybe they're kind of obscure.

15:24 Not everyone is familiar with them.

15:26 And just generally, like, if you just want to have everything nicely presented as a nice

15:31 layout that you put together.

15:32 There wasn't really tools for that.

15:34 Yeah.

15:35 Right.

15:35 Okay.

15:36 But all of that has changed.

15:37 I'll get into that a little bit.

15:39 And then we just needed a name for all of this stuff, which we decided on Pyviz.

15:44 Pyviz seemed like a good name.

15:45 It wasn't taken.

15:45 And then it was a great name.

15:48 But we had a little bit of pushback from the community.

15:50 Pyviz sounds like just Python visualization, right?

15:53 It's kind of presumptuous to think that it's a name we could keep.

15:57 Like, you can't claim it all.

15:59 Right.

16:00 And I think that was...

16:01 We've got Bokeh.

16:02 We've got, yeah.

16:03 Yeah.

16:03 It was a totally fair criticism.

16:05 And we kind of talked to various community members and were like, okay, Pyviz becomes this

16:09 general thing and we're going to find a new name, which has been confusing.

16:12 Like, obviously, we haven't...

16:14 I think this was a year and a half ago.

16:16 And we kind of run with the Pyviz name for a year and a half as well.

16:19 And so oftentimes when you see a blog post out there now, it's still the Pyviz is us,

16:25 not the general resource that it's meant to be.

16:27 Yeah.

16:28 I was looking at some videos on YouTube about some of the presentations you gave and stuff.

16:32 And I saw sometimes it was called HoloViz and sometimes it was Pyviz.

16:35 And I'm like, oh, I wonder what the relationship is here.

16:37 And I see historically where that comes from.

16:39 Yeah.

16:39 So I think overall, it was a good idea to kind of have Pyviz.org become this general resource.

16:44 And we absolutely, we're happy to kind of have this listing of all the different visualization

16:49 dashboarding libraries on there.

16:50 And we'd like to have more tutorial material to point to and stuff like that.

16:54 So just it becomes a general resource.

16:56 And HoloViz is now our effort to kind of have a coordinated set of tools that

17:01 work well together to have browser-based visualization, dashboarding, and just make that easier and make

17:07 it all fit together.

17:08 Yeah.

17:08 Very cool.

17:09 You're basically making a bunch of choices for people.

17:12 Like, here's a way that you can plot stuff.

17:14 Here's a way that you can stream data or process large amounts of data with what doesn't fit in

17:19 RAM or whatever.

17:19 Yeah.

17:20 But you're not forcing them down that path, right?

17:22 One of the things that was cool is like, okay, if you need more control or you want to do

17:26 something different, you can just do this yourself.

17:27 If you want to do less work and accept the defaults or the default way of working, right,

17:32 you can use something built in.

17:34 That's exactly right.

17:35 Yeah.

17:35 So the idea that there is, the way we try to communicate that is it's about shortcuts,

17:40 not dead ends, right?

17:40 To make it easy, the default should be good.

17:42 You should just be able to get something on screen quickly, which is kind of a philosophy

17:46 that HoloViews in particular follows.

17:48 You just wrap your data and it visualizes itself.

17:51 But then you shouldn't be stuck there.

17:53 It shouldn't be.

17:53 There's plenty of libraries where you just put, get your quick plot, but then it's really

17:57 hard to customize from there.

17:58 And that's something we had to learn as well, I think.

18:01 So HoloViews is pretty opinionated, actually.

18:03 And it doesn't fit the regular model that people are thinking of, right?

18:07 The imperative plotting model, where you say, you get your figure, get your axes, you kind

18:12 of modify each little bit of the axes.

18:13 It's about just wrapping your data and have it visualize itself.

18:16 And then you can tweak the options on it.

18:19 And that model doesn't work for everyone, which is something we had to learn.

18:23 We've kind of decided now that we'd rather meet people where they are, right?

18:26 People already, there's already such a big ecosystem of visualization tools.

18:31 And at this point, even a big ecosystem of dashboarding tools.

18:35 And rather than tell people, like, this is the way you have to do it, you should just be

18:39 able to plug in what you have and go from there.

18:42 And that's the idea behind hvPlot, which is kind of a wrapper around HoloViews.

18:47 If you use Pandas, you'll know that Pandas has a .plot API, which just takes your

18:54 Pandas data and you tell it a little bit of, like, this goes on the x-axis, this goes on

18:58 the y-axis.

18:58 I want to color by this variable.

19:01 And then it gives you the plot.

19:02 And we wanted to take that and kind of say, well, this works well for Pandas.

19:05 We want it to work for the entire PyData ecosystem.

19:08 Right.

19:08 hvPlot is meant to work not just with Pandas, but with Dask, xarray, NetworkX, GeoPandas.

19:14 And the most recent addition there is cuDF.

19:17 So GPU data.

19:19 Yeah.

19:19 I haven't heard of cuDF.

19:20 That sounds really cool.

19:21 But, like, Dask is very powerful.

19:23 And I had Matthew Rocklin on the show to talk about that.

19:26 And it's like data.

19:28 That's what's important.

19:29 It's like Pandas.

19:29 Yeah.

19:30 It's like Pandas, but across multiple machines if necessary.

19:33 Sort of thing that has the same.

19:36 If necessary.

19:36 Or across multiple cores locally.

19:38 Yeah.

19:38 Lots of you.

19:40 Yeah.

19:40 Scaling up and out.

19:41 Yeah.

19:41 It's super cool.

19:42 And that's super nice, particularly because we have this Datashader project.

19:45 It basically takes your data and you can describe it as like a fancy 2D histogram, right?

19:50 You basically get a heat map out.

19:53 But it does this really fast.

19:54 It's built on Numba and Dask.

19:56 So you can generate basically images from data points really quickly.

20:02 It doesn't just support point data.

20:04 It supports polygons.

20:05 It supports lines.

20:05 It supports regridding of rasters and quad meshes and tri-meshes.

20:09 And basically it just takes your data, renders it with Numba and Dask.

20:12 Just put something on screen really, really quickly.

20:15 And the idea there is just to have fast and accurate rendering of large data sets.

20:20 And when we're talking large, that means millions or billions of data points.
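
The "fancy 2D histogram" description can be illustrated with a small pure-Python sketch. This is not Datashader's implementation (which, as described above, is built on Numba and Dask); the function name `aggregate_points`, the ranges, and the grid size are assumptions chosen for illustration.

```python
import random


def aggregate_points(points, x_range, y_range, width, height):
    """Count the points that fall into each cell of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Ignore anything outside the requested ranges.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Map the coordinate to a cell index; min() guards against rounding.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
grid = aggregate_points(points, (-4, 4), (-4, 4), width=40, height=30)

# The output size depends only on the grid, not on the number of points.
print(len(grid), "rows x", len(grid[0]), "columns")
print("points counted:", sum(map(sum, grid)))
```

The grid of counts is what would then be color mapped into an image; however many points go in, the result has the same fixed shape.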

20:23 Yeah.

20:23 That's cool.

20:24 I think it was on your project where you've got a picture of, was it the US?

20:28 Yes.

20:29 Plotting every single person and where they were.

20:31 Right.

20:31 300 million data points.

20:32 Super quickly?

20:33 Yeah.

20:34 300 million data points a second or less.

20:36 And interactively zoom in and out.

20:38 I think particularly now that we have the GPU support, which basically NVIDIA has this new

20:43 initiative called RAPIDS, where they're rebuilding the PyData ecosystem on top of GPUs.

20:50 That's crazy.

20:51 It is crazy, but you couldn't always use.

20:53 Crazy awesome, yeah.

20:53 Yeah.

20:54 Like, yeah, exactly.

20:55 You used to have to write these CUDA kernels yourself, and it was really obscure.

20:59 But now it's, you just use your GPU and it just works.

21:03 So they had an initial prototype for Datashader, and a collaborator of ours called Jon Mease,

21:10 who normally is chief scientist at Plotly, took that and extended Datashader to support

21:17 GPUs natively.

21:17 So now it takes maybe 10, 20 milliseconds to aggregate these 300 million

21:23 data points.

21:23 Wow.

21:24 It's just incredible.

21:25 So cool.

21:25 And in theory, it's also scaled across multiple GPUs.

21:28 I haven't been able to try that.

21:30 Yeah.

21:30 I don't have access to that.

21:31 It never ceases to amaze me how powerful GPUs are.

21:35 Right.

21:36 Yes.

21:36 Every time I think, wow, it can do that, that's amazing.

21:39 And then it's like, nope, it could do more.

21:41 Exactly.

21:42 And yeah, it's kind of, for us, it's coming full circle, right?

21:47 The G in GPU stands for graphical.

21:49 Yeah.

21:50 If you actually wanted to use it for graphical stuff or visualization stuff, there is basically,

21:54 well, you have to write your own kernels.

21:56 It would be super painful, but now it's possible.

22:00 All the hard work that you reference for actual people.

22:02 That's super cool.

22:03 Very, very cool.

22:04 All right.

22:04 So we've got HoloViews and then related to that, the library, GeoViews.

22:09 You talked about hvPlot.

22:11 Yes.

22:11 Talked about Datashader.

22:12 This is what we were just talking about, quickly rendering like 200 million points on a map.

22:16 Talked about Param as the basis for like the data class like functionality.

22:20 We also have Colorcet.

22:22 Yes.

22:22 Colorcet is just, we all know that color maps can be crucial.

22:27 I don't know if everyone knows this, but I think the most famous example is the jet or rainbow color map.

22:32 That's been so derided for a good reason.

22:35 Basically, it distorts the data space terribly and you can draw all kinds of false conclusions just because in color space it can really distort things.

22:44 So there's actually scary stuff about potentially like doctors may have drawn false conclusions just by looking at the color maps.

22:51 That's the risk in misinterpreting jet, which is easy to do.

22:55 So we created this package called Colorcet, which basically just has a set of perceptually uniform color maps.

23:02 It actually takes work, I should have read up on this, but basically we took a set of color maps that someone had published a paper about and kind of wrapped those for the Python ecosystem.

23:12 And that became Colorcet.

23:13 So please look at the website to look up the names.

23:16 I feel bad for not properly crediting the person here.

23:19 It's really handy to have that put together and well thought through and yeah.

23:23 Choosing colors, one, that look good and two, that are meaningful is not so easy.

23:27 Yes, it's not easy.

23:28 But I mean, thankfully the community has become really aware of this and there's a few packages like this now.

23:33 So it's just good to see that happen.

23:35 Particularly with like, I think the turning point was when the default color map in Matplotlib was changed to viridis.

23:41 I think at one point it actually was jet, wasn't it?

23:43 The final star of the show here is Panel, which you talked about being that six to nine month project where you got to work on new dashboarding tools.

23:51 Yeah.

23:52 So the Python ecosystem for a long time, R had Shiny and Shiny was great.

23:57 It's super cool.

23:58 It makes it easy to kind of share your analysis in R.

24:01 And Python didn't have this, right?

24:02 It had, there was, early on there was like this Jupyter Dashboards project where you could take a Jupyter notebook and kind of arrange the cells a little bit and get a layout.

24:11 And we used that for a little while, but then it was abandoned.

24:13 But this was a problem that we kept coming back to.

24:16 People wanted to share their analyses without, as an actual dashboard or just a little app.

24:21 And so we decided to build Panel.

24:25 Just before then actually, Plotly came out with this project called Dash, which is also a really nice, nice library to build dashboards in Python.

24:33 It requires a little bit more knowledge of CSS and JavaScript in certain cases.

24:37 And we wanted something where people could just drop in their analysis, their existing analysis.

24:42 You could drop it in.

24:43 You could wrap in a function and then annotate that, like, this function depends on these things.

24:48 And so when those things change, it updates kind of a reactive model.
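
The reactive model described here (wrap the analysis in a function, declare what it depends on, and have it re-run when those things change) can be sketched in plain Python. This is only an illustration of the pattern, not Panel's API: `Widget` and `depends` are hypothetical names invented for this example.

```python
class Widget:
    """A value holder that notifies its watchers whenever the value changes."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        for callback in self._watchers:
            callback()

    def watch(self, callback):
        self._watchers.append(callback)


def depends(*widgets):
    """Decorator: re-run the function whenever any listed widget changes."""

    def decorator(func):
        class Reactive:
            def __init__(self):
                self.output = None
                for widget in widgets:
                    widget.watch(self.update)
                self.update()  # compute the initial output

            def update(self):
                self.output = func(*(w.value for w in widgets))

        return Reactive()

    return decorator


frequency = Widget(1)
amplitude = Widget(2)


@depends(frequency, amplitude)
def summary(freq, amp):
    # Stand-in for an existing analysis or plotting function.
    return f"wave: frequency={freq}, amplitude={amp}"


print(summary.output)
frequency.value = 5  # changing an input re-runs the function automatically
print(summary.output)
```

The analysis function itself stays an ordinary function; only the declaration of what it depends on is added around it, which is what keeps the friction low.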

24:52 And we just wanted a library where you drop in your existing analysis.

24:55 You've got some notebook.

24:56 You want to share it and reduce the friction of that, right?

24:59 That's what we kept seeing over and over again in different organizations was the fact that they had a bunch of data scientists.

25:04 And then they had a visualization team.

25:06 And the data scientists would produce some analysis.

25:08 And then they have to hand it over to the visualization team.

25:11 It didn't necessarily work in Python, right?

25:13 It could be.

25:13 We had some custom JavaScript framework or, yeah, there's just friction in making that transition from, let's say, Python analysis.

25:21 To shareable dashboard.

25:23 That process needed to become easier.

25:26 Right.

25:26 And that's how Panel emerged.

25:27 So Panel is a way where, yeah, you can basically lay out different parts of visualization.

25:33 And it's like organizing a notebook with not necessarily showing the code.

25:37 And you get some little sliders and other widgets that you can interact with that drive it.

25:41 And to me, it feels like just a really nice way to quickly take what you were already doing in exploratory mode and put it up in a user interactive way.

25:51 Without learning Flask, APIs, JavaScript.

26:43 Precisely.

26:43 So, yeah, the idea is that, yeah, you just have your analyses, you drop them into this thing, you put it in a bunch of rows and columns, lay it out on your screen, and then you put one little command at the end of this thing.

26:53 On the layout you've built, you call .servable().

26:56 And then you can run panel serve on the notebook, and it just pops up with your dashboard.

27:00 Nice.

27:01 How do you host it?

27:02 So, actually, it's just built on Bokeh, so it's just a Tornado server.

27:05 So you can just host it on any of the cloud server providers.

27:08 We are trying to kind of build out our documentation to make that a really simple process.

27:13 Or even thinking about having a command to say, I'm going to deploy this to AWS or to Google Cloud or whatever.

27:20 Right, right.

27:21 But, yeah, that's still areas we're actually working on.

27:23 But in the end, it's just...

27:24 What about like a container?

27:25 Oh, absolutely.

27:26 Oh, yes.

27:26 Yeah.

27:27 That's...

27:28 Yeah.

27:28 For our examples, we have...

27:30 We build on this tool called Anaconda Project.

27:33 It wraps a conda environment with some commands, and then it deploys it.

27:38 And what we're hoping for is that...

27:41 I think there's a PR, or maybe it's already merged, where you basically just give it this project file,

27:45 which is just a YAML file with the environment and these commands that it runs.

27:49 And then it builds a Docker container for you.

27:51 So I think that that's a really nice route to go.

27:53 Just to kind of contain everything, your entire environment, and the commands you need to run, and then you just get it.

27:59 Yeah.

28:00 Maybe even go to like some kind of hosted Kubernetes cluster service.

28:04 Just go take this, run it there, make sure it keeps running, upgrade it for me if I need it.

28:08 Yes.

28:09 But we're certainly looking for...

28:10 So if anyone's interested in helping us out on that front, we're always looking for contributors.

28:14 Yeah.

28:15 Sure.

28:15 But otherwise, yeah, we're also working on improving documentation.

28:18 Yeah.

28:19 It's a bit of an orthogonal problem and skill set to solving visualization, right?

28:23 Like, it's one thing to do the JavaScript and the

28:25 cool visualization; it's another to like, well, now I do DevOps too.

28:28 Exactly right, yeah.

28:30 Although it's something, I mean, it's a good skill to learn, so I'm happy to dive into it a little bit.

28:34 Yeah.

28:35 Yeah, absolutely.

28:36 So on your website, I think it's in the getting started section, so at holoviz, H-O-L-O-V-I-Z.org,

28:44 you've got a document that sort of talks about, given these different scenarios, more of a picture, I guess,

28:50 answer a couple of questions about it, and we'll help you choose the subset of tools to kind of flow this together, right?

28:58 Right.

28:58 You want to maybe talk us through some of these scenarios?

29:01 So it says, are you working with tabular data, working with like other types of arrays, network,

29:06 n-dimensional, or streaming data?

29:08 And then there's like sort of this flow of like, all right, here's how you piece together these tools

29:13 to come up with something excellent.

29:14 Yes.

29:15 So we call this the mermaid diagram.

29:16 And yeah, if you look at holoviz.org, you'll find it there.

29:19 I actually don't know why it's called the mermaid diagram, but there we go.

29:22 But yeah, the general idea is that it takes you from the type of data, as we said.

29:27 So let's say you've got some tabular data, and now you need to decide, like, what library should I choose to load this data?

29:33 So you might, below a certain threshold, pandas is totally fine.

29:37 And if you want to go to geographic data, use geopandas, or actually geometries.

29:42 Geographic geometries, use geopandas.

29:44 Right.

29:45 And just so people know, yeah, your cutoff here, you say, do you have more than 50,000 rows?

29:49 I mean, obviously, it varies a little bit on the computer you have.

29:52 Yeah, that's pretty arbitrary.

29:54 Yeah, exactly.

29:55 But it gives you a kind of a sense.

29:57 It's not millions of rows or billions or something like that, right?

30:02 It's not that high of a number to say, okay, well, maybe you want to consider something other than pandas for working on some of this.

30:08 But yeah, okay.

30:09 Is it a huge amount of data?

30:10 By some definition of huge.

30:12 And if not, is it geospatial?

30:14 Precisely.

30:14 And then you might use geopandas.

30:16 Otherwise, you may use Dask, for example, instead.

30:19 So Dask is a really great tool.

30:21 We've already mentioned it a few times.

30:23 You've got millions or billions of rows, and you just want to, you can't load it all into memory.

30:29 You just don't have the space.

30:30 And then you want to use Dask.

30:31 And then the whole point behind our ecosystem, particularly hvPlot, is you shouldn't have to change any code.

30:38 Whether you choose pandas or Dask or now the cuDF library, you shouldn't have to change any code.

30:45 You should just be able to dump your data into this framework and then call .hvplot on it, and then you get your plot out.

30:52 So that's kind of the philosophy here.

30:54 The same applies to, like, you've got some n-dimensional arrays.

30:58 We generally recommend, for example, that you go with xarray.

31:00 So xarray is really underrated as a library.

31:03 It should be more popular.

31:05 I don't know, maybe not that many people have, like, n-dimensional arrays, but it's kind of pandas for n-dimensional arrays.

31:12 Yeah, for beyond tabular.

31:14 Beyond tabular, exactly, yeah.

31:15 So you've got, I don't know, you've got satellite imagery over time, right, or satellite.

31:20 Right.

31:20 Or microscope data also over time, like, or Z-Stack over time, like, four-dimensional, five-dimensional, whatever.

31:26 Okay.

31:27 For that, it's just really nice to explore it that way.

31:29 So for that kind of data, you might use that instead.

31:32 Or you might keep it simple and just use NumPy or Dask arrays, which is kind of lower level.

31:37 But in the end, the idea is that, yeah, in any case, you just drop it into hvPlot with the .hvplot call, and then you get this HoloViews object out of it.

31:46 And this HoloViews object will already display itself, so you already have an interactive plot.

31:50 But then you might have the issue, like, yeah, this was a lot of data.

31:54 We used a Dask DataFrame for a reason.

31:56 Right.

31:57 Or a Dask array.

31:57 So you may not want to dump it straight into your browser.

32:00 Dumping a gigabyte into your browser is a surefire way to crash it.

32:03 Even with the speed to download it quickly, you know, that much JavaScript is going to make it hurt.

32:08 Yeah.

32:09 It's really going to make it hurt, yeah.

32:11 So then you just have the option in hvPlot to say, datashade this instead.

32:17 And so that means that you've got server-side aggregation to kind of aggregate this data as you're zooming around.

32:22 And what you get out is a nice interactive Bokeh plot using hvPlot.

32:27 Or if you kind of want to do a bit more customization, hvPlot actually doesn't directly support Plotly or Matplotlib output, but with a little bit of work, you can get there, too.

32:36 And then once you're there, you can save it, you can share your notebook, or you can dump it into Panel and deploy it.

32:42 Right, right.

32:43 Turn it into a dashboard.

32:44 Okay.

32:45 Yeah, very cool.

32:45 I'll link to this flowchart for people over so they can think about it when they're checking this out.

32:51 And I think it also gives you a sense of, even outside of HoloViz, there's useful stuff.

32:57 You're like, do I have streaming data?

32:58 Maybe check out Streamz or N-dimensional, like you said, check out xarray and Dask.

33:04 And just this idea of, like, how do I think about the right underlying library rather than trying to jam everything into Pandas or NumPy or something.

33:11 Right, yes.

33:11 Yeah, a lot of people, I still see, like, 2D in Pandas and really, yeah, try to switch to xarray.

33:17 It's a great library.

33:18 And actually, Pandas used to have this N-dimensional data structure called a Panel, and they've actually deprecated that saying, yes, please just use xarray.

33:26 That's what it's made for.

33:27 I see.

33:28 Okay.

33:28 Interesting.

33:29 I didn't realize that.

33:30 When you talked about visualizing millions and millions of data points quickly in the browser, you said, okay, we're going to use Datashader.

33:37 And I don't know that we necessarily dove into it enough to say exactly what it does.

33:41 So basically, let me see if I get it right how it works.

33:44 It can look at millions, hundreds of millions of data points and say, well, the size of the graph is really this.

33:51 And if you look at these 10,000 at this scale, that's kind of going to be the same.

33:56 Is that how it works?

33:57 Or is it, like, does it downsample somehow?

33:59 Or how does it actually make meaningful pictures or a process of that?

34:04 That's the cool thing about it.

34:05 So it actually is that fast, right?

34:07 It actually, it always looks at all your data.

34:09 Okay.

34:09 But if you're zoomed into something, obviously it won't.

34:11 If it's out of your viewport.

34:13 Yeah, yeah, yeah.

34:13 If it zooms in, like, it has the clipping outside the rectangle.

34:16 But if you're zoomed out, it actually does iterate over your entire billion data points and aggregates them within 50, 100 milliseconds.

34:22 And that happens on the server, right?

34:24 Exactly.

34:24 It doesn't happen in the browser, right?

34:25 Exactly.

34:25 That happens on the server.

34:26 And then all it has to do is send the image of the aggregated data, which, whether you have 10 million points or a billion points,

34:33 then a thousand by thousand pixel image is going to be much, much smaller than the actual whole data set.
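
A rough back-of-the-envelope calculation makes the size difference concrete. The figures are assumptions for illustration only: an uncompressed image at 4 bytes per pixel, and raw data stored as two 8-byte floating-point columns (x and y) per point.

```python
def image_bytes(width, height, bytes_per_pixel=4):
    """Size of an uncompressed image (assumed 4 bytes per pixel)."""
    return width * height * bytes_per_pixel


def points_bytes(n_points, columns=2, bytes_per_value=8):
    """Size of the raw coordinates (assumed two 8-byte floats per point)."""
    return n_points * columns * bytes_per_value


image = image_bytes(1000, 1000)
for n in (10_000_000, 1_000_000_000):
    raw = points_bytes(n)
    print(f"{n:>13,} points: raw data {raw / 1e6:>9,.0f} MB, image {image / 1e6:,.0f} MB")
```

Under these assumptions the image stays at 4 MB while the raw coordinates grow from 160 MB to 16 GB, which is why only the aggregated image is sent to the browser.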

34:39 Yeah, yeah.

34:39 Absolutely.

34:40 Yeah.

34:40 That's basically the same size, no matter how much data you have.

34:45 Exactly.

34:46 Yes.

34:46 Compression a little bit, but not so much.

34:48 Yeah.

34:48 Exactly.

34:49 So, yeah, you get a fixed size image out of it.

34:50 And that works with most visual elements.

34:54 We've kind of been expanding to visual elements.

34:56 When Datashader was first created, I think it aggregated point data and line data.

35:00 But now it kind of expanded to cover polygons, tri-meshes, quad-meshes, just downsampling images as you're zooming.

35:07 So if you've got a huge, like, one gigapixel image, you can dump it into Datashader.

35:12 It'll downsample it dynamically.

35:14 So you can zoom and pan around and explore.

35:16 Yeah, that's great.

35:17 And looking at it, it says that it has scalability with Dask or cuDF.

35:22 Mm-hmm.

35:22 cuDF.

35:22 How do you configure it to make it choose one or the other on the server?

35:26 If you have a Dask data frame, it'll just take that.

35:29 It'll take that.

35:29 And basically, the way a Dask data frame works is basically you should think of it as a bunch of chunks of underlying Pandas data frames, right?

35:36 And these chunks might be distributed across, like, it might be on your machine, or it might be across, like, a whole cluster of machines.

35:43 And so it just keeps the computation local.

35:45 It means that the aggregation for each of those chunks happens on that particular node in your cluster.

35:51 And then once it's done the aggregation, it just has to send the fixed-size image back to the main node to aggregate that.

35:58 And so that way, you can distribute the computation, but still have it available for visualization.
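
The chunked aggregation described here can be sketched without Dask: each chunk is reduced to a fixed-size grid on its own, and only those small grids are combined. The helper names `aggregate_chunk` and `merge_grids` are hypothetical, and the four list slices merely stand in for partitions living on different workers.

```python
import random


def aggregate_chunk(chunk, width, height):
    """Bin one chunk of (x, y) points in the unit square into a fixed-size grid."""
    grid = [[0] * width for _ in range(height)]
    for x, y in chunk:
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        grid[row][col] += 1
    return grid


def merge_grids(grids):
    """Sum per-chunk grids cell by cell; every grid has the same shape."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*grids)]


random.seed(1)
points = [(random.random(), random.random()) for _ in range(20_000)]

# Pretend each slice lives on a different worker in the cluster.
chunks = [points[i::4] for i in range(4)]
partials = [aggregate_chunk(chunk, width=16, height=16) for chunk in chunks]

# Only the small per-chunk grids travel back to be combined.
combined = merge_grids(partials)
print("total points in combined grid:", sum(map(sum, combined)))
```

Because counting is additive, summing the per-chunk grids gives the same result as a single pass over all of the data.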

36:04 Okay, so what you provide to Datashader basically tells it how to process it.

36:09 Do you give it a Pandas data frame?

36:10 Do you give it a Dask data frame?

36:11 Do you give it a cuDF data frame?

36:13 And it just goes, and it just knows how to work with all of them, but that implies how it's computed.

36:18 Yes, so those three in particular are, yeah.

36:21 Got it, okay, now I see.

36:22 So we actually, on datashader.org, if you look at, I think, one of the user guides or getting started guides,

36:28 there's a nice handy little table to show you, like, for this data type, like, for point data, these data backends are supported.

36:36 And for points, for example, it might be Pandas, Dask, and cuDF.

36:39 But for an image, it might be xarray only, because it doesn't really make sense to shove your image into Pandas.

36:47 Yeah, exactly.

36:48 Cool.

36:49 Well, it seems like a really nice way to fit all these things together, and just such a great API.

36:55 Maybe we could talk about some of the projects or communities that are using all of this.

37:00 Oh, absolutely.

37:00 One ecosystem that's really taken up these tools, particularly hvPlot and Datashader, is the Pangeo initiative,

37:08 which is basically, it's an initiative by various geoscience folks to build, like, this big data platform to analyze data in the cloud, right?

37:17 So you used to have all these different data silos where, like, you have this data in the cloud, you have to download it onto your local machine and explore it.

37:25 And they've been building, basically, a platform so you can easily deploy your own JupyterHub and then keep your data in the cloud, but analyze it also in the cloud, right?

37:35 And so for these people, so there might be climatologists, there's a lot of climatologists or oceanographers.

37:41 Can I say that word?

37:42 Oceanographers.

37:43 Oceanographers, that's right.

37:44 And they have these huge data sets, right?

37:46 They have huge meshes of data, and they need to visualize them.

37:50 But there weren't really tools to do that.

37:52 And thanks to all these open-source tools like Dask, you can load the data from the cloud, aggregate it using Datashader, and kind of zoom and pan interactively around without having to kind of download the data.

38:03 So Pangeo is a really great use of it and makes me super happy to see climatologists actually use these tools.

38:11 Yeah, solving some real problems.

38:13 That's awesome.

38:13 Also, Intake.

38:14 What's that?

38:15 Intake.

38:16 Oh, that's a really interesting project.

38:17 So again, actually, this is also leveraged by the Pangeo community.

38:22 So Intake is a project.

38:23 If you've ever had to, like, you have a bunch of data sources, and you have data catalogs, and if you want to keep track of all your data, you want to not have custom scripts to load this kind of data and that kind of data, Intake lets you write one data catalog, which is really just a YAML file to specify.

38:39 I've got the CSV files here, and there's 10,000 of them, and I'll load them somehow.

38:45 And then I've got this NetCDF file there, and I've got a bunch of Parquet files here.

38:49 You can kind of encapsulate it all in this nice little data catalog, load it, and then explore it.

38:56 And so in the notebook, literally, you just point it to your catalog and say, load this, and thanks to the specification in that YAML file, it'll just load it.

39:04 And that has integration with our tools in a number of ways.

39:09 So first of all, you can even put your hvPlot spec into your data catalog.

39:14 So you can say, I have this default plot that I always look at for this data, right?

39:19 So you can say, in this data source, plot the points at the longitudes and latitudes in this CSV file, and then it'll automatically generate your plot.

39:30 You've kind of pre-declared your plots in this catalog file.

39:34 I see.

39:35 Okay.

39:35 Yeah, that's super cool.

39:36 Another thing it has, it has a little GUI built on top of Panel, which kind of lets you explore the data, right?

39:42 You've already got your data, and now you can just, by a graphical interface, kind of click around and say, I want to plot this column against this column and color it by that.

39:51 Cool.

39:51 So you specify how to import it and whatnot, and then visualize it.

39:55 That's great.

39:55 Yeah.

39:55 And then what's cuxfilter?

39:57 So I only recently found out about this.

40:00 So NVIDIA, as I said, has this RAPIDS initiative, and they've been playing around with visualization

40:06 in various forms.

40:07 And interestingly, they built this cuxfilter library, which kind of builds on top of Panel and Bokeh to build cross-filtering applications.

40:17 Kind of, okay, just cross-filtering.

40:19 It's also referred to as linked brushing.

40:21 So you select something on one plot, and kind of you see that reflected on other plots.

40:25 And that's built on Panel, and basically lets you build dashboards, like cross-filtering stuff,

40:32 really easily using the GPU.
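
The select-on-one-plot, update-the-others behaviour can be sketched with plain Python data. This is only the underlying idea, not the NVIDIA library's API: the records, the `brush` and `histogram` helpers, and the column names are all made up for illustration.

```python
from collections import Counter

# Hypothetical trip records shared by two "plots": one by hour, one by city.
records = [
    {"city": "NYC", "hour": 8},
    {"city": "NYC", "hour": 9},
    {"city": "NYC", "hour": 18},
    {"city": "Chicago", "hour": 8},
    {"city": "Chicago", "hour": 17},
    {"city": "LA", "hour": 9},
]


def histogram(rows, key):
    """The data behind one plot: a count of rows per value of `key`."""
    return Counter(row[key] for row in rows)


def brush(rows, key, low, high):
    """Rows whose `key` value falls inside the interval selected on a plot."""
    return [row for row in rows if low <= row[key] <= high]


# Selecting hours 8 to 9 on the "hour" plot filters the shared records...
selected = brush(records, "hour", 8, 9)

# ...and every other plot is recomputed from that same selection.
print("all trips by city:     ", dict(histogram(records, "city")))
print("selected trips by city:", dict(histogram(selected, "city")))
```

Every view is derived from one shared, filtered set of records, so a selection made in one place is reflected everywhere else.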

40:33 Super cool.

40:35 And then some space stuff as well.

40:37 These are actually projects from consulting clients we've worked with.

40:41 So one of them is the LSST telescope.

40:44 I think it's recently actually got a real name, which is the Vera Rubin telescope.

40:47 It's one of the largest optical telescopes in the world.

40:50 And they basically approached me saying they wanted to do some QA on the data that comes out of this telescope.

40:57 Right?

40:57 This telescope.

40:59 And there's huge amounts of data.

41:01 There's terabytes a day.

41:02 It can be challenging because they have 50 petabytes of images and data they're going to be collecting and stuff.

41:09 And yeah, it's challenging to go through.

41:11 Yes.

41:11 Particularly, yeah.

41:12 If you have to do the same analysis every day, then you can have a tool that can handle this stuff.

41:17 And so I handed over this project to Quansight.

41:20 So our famous Travis Oliphant, who is our former CEO, went off and started this other consulting firm called Quansight.

41:28 And I handed over this project to them.

41:30 And with kind of a bit of help from me, but not very much,

41:33 they've built this really nice dashboard to kind of do the QA stuff for them.

41:37 And yeah, easily it will handle 500 petabytes.

41:40 But maybe it'll need some more tweaking to quite get there.

41:43 Yeah.

41:43 But still, that's pretty impressive.

41:45 That's awesome.

41:45 And it's cool to see all of these projects using the libraries you guys put together.

41:49 They're probably giving it some pretty serious testing.

41:52 Oh, absolutely.

41:53 Testing and standard data science.

41:55 Yeah.

41:55 And they'll absolutely find all the little performance issues.

41:58 And yeah, it's great.

42:00 That's actually the best part of my job.

42:02 I mean, just building open-source tools is awesome.

42:04 But really, if you're completely divorced from the actual users of those libraries, it's really hard to tell.

42:10 Are you doing the right thing or are you just wasting your time?

42:13 So it's super nice to go back and forth between actual consulting where you see people's problems and then going back to the tool and kind of improving it.

42:20 Absolutely.

42:20 Yeah, you need this blend to keep it real.

42:22 But if you're too focused on just solving problems for consulting, you don't get that spare time to develop new stuff as much.

42:30 Yeah.

42:31 I want to close the conversation out with two quick things.

42:33 One, I really like to explore things like this, like visualization and analysis type of libraries and whatnot by just looking at some examples because usually you can look at some nice pictures and get a quick sense.

42:45 So you guys have a bunch of different tutorials or simple little examples, expositions, showing them off.

42:52 Maybe is there like a couple of favorites you just want to point people out and tell them what it does?

42:56 Oh, absolutely.

42:56 Yeah.

42:57 I don't know how best to do this.

42:58 So what we can do is, so most of our websites, particularly for like HoloViews and GeoViews and Panel, we have a little gallery.

43:05 So maybe we can look at some of those.

43:07 But then we also have this website, examples.pyviz.org, which really we want the community to contribute to.

43:14 Again, PyViz is meant to be this general thing.

43:17 So if people are interested in contributing their own examples.

43:19 Yeah.

43:20 Just grab a couple that you think are like really cool and show off stuff people would appreciate.

43:24 Yeah.

43:24 So let's go with the one that you talked about earlier, the census example.

43:28 So if you go to examples.pyviz.org.

43:30 Just search for census.

43:31 It'll be right on the main page, right?

43:33 There's like a little gallery.

43:35 And then if you click on that, there's at the top right, there's the census example, which kind of explores how do I use.

43:41 Yeah.

43:42 I've got this census dataset.

43:43 How do I actually display it?

43:45 How do I start exploring it?

43:46 So it starts off by kind of loading the dataset.

43:48 We just load it in this case with a library called, well, and in fact, it's just loaded with Dask.

43:54 But we've got this little wrapper around Dask, which does spatial indexing.

43:57 So spatial indexing means that it has built an index of the space.

44:01 The technical term is an R-tree.

44:04 It's super fast to say, show me the things that are near here or are right here.

44:08 Precisely.

44:09 Yes.

44:09 So if you're zooming in, as I talked about earlier, by default DataShader has to scan through the entire dataset each time, even if you're zoomed in to just a little spot.

44:19 With spatial indexing, you can say, okay, this stuff is definitely not in my viewport.

44:23 I don't need to consider it.

44:24 And so it becomes faster as you zoom in.
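
The viewport-pruning idea described here can be sketched in a few lines of plain Python. This is only an illustration: it swaps in a simple uniform grid rather than the R-tree mentioned in the conversation, and the names `build_grid_index`, `query_viewport`, and `cell_size` are hypothetical, not part of DataShader or the Dask wrapper.

```python
from collections import defaultdict


def build_grid_index(points, cell_size):
    """Bucket (x, y) points by the grid cell they fall in (hypothetical helper)."""
    index = defaultdict(list)
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        index[cell].append((x, y))
    return index


def query_viewport(index, cell_size, xmin, ymin, xmax, ymax):
    """Return points inside the viewport, visiting only the cells that overlap it."""
    hits = []
    for cx in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
        for cy in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
            # Cells outside the viewport are never touched at all.
            for x, y in index.get((cx, cy), ()):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append((x, y))
    return hits


points = [(0.5, 0.5), (1.5, 1.5), (9.0, 9.0), (9.5, 0.2)]
index = build_grid_index(points, cell_size=1.0)

# Zoomed in to the lower-left corner: the two far-away points are skipped.
visible = query_viewport(index, 1.0, 0.0, 0.0, 2.0, 2.0)
print(visible)
```

The smaller the viewport, the fewer cells get visited, which is why zooming in gets faster with an index instead of staying a full scan.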

44:26 Right.

44:26 If you look at that notebook, you kind of start by loading the dataset, data shading it, and we start with just simple linear aggregation.

44:34 And what you'll immediately notice is that it's just black.

44:38 If you think about the population in the US, there's a few hotspots, but New York is super dense.

44:43 All the cities are kind of dense, but New York is particularly dense.

44:46 And so all you see with linear color mapping is basically New York and then a few kind of blurry cities.

44:52 A little bit of LA, a little bit of Chicago.

44:54 That's about it.

44:55 Yeah.

44:55 And so the nice thing about DataShader is that it kind of takes that problem away. You can do linear, you can do log, or you can kind of adjust the color map.

45:04 And that's all kind of difficult.

45:05 But by default, DataShader actually does something called eq_hist, which is histogram equalization, which means that it kind of adjusts the histogram of the color map in such a way that the lowest number of...

45:18 It's hard to explain.

45:19 But it kind of equalizes the color map in such a way that actually...

45:22 It's hard to describe a picture over audio only.

45:24 Yeah, it's a hard one.

45:25 But basically it reveals the shape of your data, if not the exact values.

45:29 So you shouldn't use it for reading out the exact values of something, but to get an overall idea of something, it's a really nice mechanism.
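
Since this is hard to convey in audio, here is a rough sketch of the idea in plain Python. It is a simplified, rank-based variant of histogram equalization written for illustration, not DataShader's actual implementation; the function names and the tiny `counts` grid are made up.

```python
from bisect import bisect_right


def linear(counts):
    """Scale each count by the maximum count (linear color mapping)."""
    peak = max(c for row in counts for c in row)
    return [[c / peak for c in row] for row in counts]


def eq_hist(counts):
    """Map each non-zero count to its rank among the non-zero counts, in (0, 1]."""
    nonzero = sorted(c for row in counts for c in row if c > 0)
    n = len(nonzero)
    if n == 0:
        return [[0.0 for _ in row] for row in counts]
    return [
        [bisect_right(nonzero, c) / n if c > 0 else 0.0 for c in row]
        for row in counts
    ]


# One extremely dense pixel (think New York) next to much sparser ones.
counts = [[0, 1, 2], [5, 40, 10000]]

print(linear(counts))   # nearly everything is crushed toward 0
print(eq_hist(counts))  # the non-zero pixels are spread evenly over the range
```

With linear scaling, every pixel except the densest one ends up almost black. With equalization, each non-zero pixel gets a distinct, evenly spaced intensity, so the shape is visible but the intensities no longer read as exact values.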

45:37 And that kind of is part of what makes like...

45:39 If you look at the DataShader image, or you like...

45:42 Sometimes on Twitter, I'll see like an image, and it just pops.

45:45 And that's the eq_hist color mapping.

45:49 Makes sure that you see the overall shape of your data, not the exact values.

45:53 And this kind of goes through that example, goes through that, and kind of explains what does this actually do.

46:00 So in the census data, you can see the shape of each city now.

46:03 You can see kind of a lot of the mountainous area in the west of the U.S.

46:08 It's kind of empty.

46:09 And yeah, it really reveals like the population distribution in the U.S.

46:13 And then it also kind of demonstrates how to manipulate your color map to kind of show the hot spots, especially.

46:20 So you can kind of...

46:22 Because you've now aggregated into a fixed-size image, you can say values above this density, you color them in red.

46:29 And so you really get the cities to pop out.
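
As a rough illustration of that thresholding step, the sketch below colors an already aggregated grid of counts. It is plain Python, not the code from the census notebook; the `shade` function, the threshold, and the RGB values are assumptions chosen for the example.

```python
def shade(counts, threshold, base=(70, 130, 180), hot=(255, 0, 0), empty=(0, 0, 0)):
    """Return an RGB grid: empty pixels, ordinary pixels, and hot spots above threshold."""
    image = []
    for row in counts:
        image.append([
            empty if c == 0 else hot if c > threshold else base
            for c in row
        ])
    return image


# Aggregated counts per pixel (made-up numbers).
counts = [[0, 3, 12], [250, 7, 0]]
image = shade(counts, threshold=100)
print(image)
```

Because the aggregate has a fixed size, the threshold is just a per-pixel comparison, which is why each step in the notebook only takes a line or two.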

46:31 Yeah, very cool.

46:32 And then maybe I won't go into too much detail on this one anymore, but do check it out.

46:36 Sure, sure.

46:37 But it really builds up.

46:38 People can check it out.

46:39 I'll link to it.

46:39 And each one of these steps like builds up with just like a line or two of code.

46:43 It's not super complicated.

46:44 Exactly right, yeah.

46:45 And then we kind of explore kind of depressing facts.

46:48 So this dataset has basically the race of all the individual people.

46:52 And you can really see the segregation in different cities.

46:55 It's horrible to think about.

46:57 Yeah, that's not good.

46:58 Yeah, it really reveals the fact if you look at it.

47:01 Wow.

47:01 Yeah, I see.

47:02 And then finally, it rounds out with like showing you how to...

47:05 Because our tools are meant to work well together, the final example kind of demonstrates how

47:09 to take this data and use HoloViews to generate an interactive plot where if it was running on

47:16 a server, so on the website, you're not going to be able to zoom very much, kind of gets very pixelated.

47:21 But if you're running it on your own in a notebook yourself, or you deploy it to a server with Panel, for example, then you can zoom around and pan and zoom into individual people.

47:31 Yeah, wow.

47:32 Yeah.

47:32 Wow.

47:32 That's wild.

47:32 Yeah.

47:33 So maybe there's a whole bunch of examples over at examples.pyviz.org.

47:37 And each one of the examples is tagged with the various libraries that people might want to explore.

47:41 So there's a bunch here.

47:42 People can just go there and dig into them and check it out.

47:45 And this is just one of them.

47:46 Yeah.

47:46 And we're trying to keep those up to date.

47:48 And if you build a cool thing, please submit it.

47:50 Yeah.

47:50 Awesome.

47:50 All right.

47:51 I guess maybe you could just touch really quick on awesome-panel.org and then tell us what's next with the whole project and probably be out of time then.

47:59 I've been super happy to see.

48:00 So community building is hard.

48:02 And I think many open source developers realize this.

48:05 And we're definitely still learning.

48:07 But there's been quite a lot of interest in Panel.

48:10 And Marc Skov Madsen, who works in Denmark, built this website called awesome-panel.org,

48:18 which really kind of tries to show off what you can build with Panel.

48:21 So our examples kind of try to focus on the simpler stuff.

48:24 But awesome-panel really shows you what you can do.

48:26 And it's really impressive what he's done with Panel on awesome-panel.org.

48:31 It has lots of resources for how to best leverage things.

48:35 And a lot of stuff, ideally, would migrate back to our website.

48:38 But yeah, he's built this complex multi-page site, which takes you through a lot of different ways of using Panel and has a lot of examples.

48:47 Yeah.

48:47 Cool.

48:47 It's kind of meta, right?

48:48 Like Panel's involved in it as well?

48:50 Oh, yeah.

48:50 It's built on Panel, right?

48:52 A website that's built entirely in Panel.

48:54 About Panel.

48:54 But also about Panel.

48:56 And yeah, I've been trying to take that further as well.

48:58 So recently, I did a talk for our AnacondaCon, which is our conference here at Anaconda.

49:04 And I built the entire presentation.

49:06 I built a presentation tool on top of Panel.

49:09 Demo panel.

49:10 Yeah.

49:10 Awesome.

49:11 Very, very cool.

49:12 All right.

49:13 So what's next with the whole project?

49:14 Where are you guys going?

49:15 So one thing that we've been working on a lot recently, in terms of HoloViews, is we've added linked selections.

49:22 So you can now, in the spirit of shortcuts, not dead ends, you can generate your HoloViews plots.

49:27 And if they're all using the same data, you can just say link_selections, apply it to these various components.

49:33 And then it will automatically hook up all the linking between the plots so that when you select on one, all the other ones update.

49:39 I'm really excited about that.

49:40 It makes it super easy to build a complex dashboard.

49:43 Yeah, that's super cool.

49:43 And lets you dive into the data.

49:45 Yeah.

49:45 Cool.

49:46 And particularly with the GPU support now, you can build like, yeah.

49:49 With GPU support and DataShader, you can now explore tens of millions or billions of data points using linked selections.

49:55 It's really easy.

49:56 It's super cool.

49:57 That's one thing I'm excited about.

49:59 Okay.

49:59 Another thing in terms of the Panel ecosystem I'm excited about is the next release is going to have default templates.

50:06 And so what that means is Panel has always had the ability to kind of just put stuff together.

50:10 You say this thing goes in a row with this thing, like a bunch of widgets go in a row or a column.

50:16 And then you put plot here.

50:18 But if you want to have more control, you kind of have to write your own kind of HTML and CSS to lay things out.

50:25 And you have that ability: you can do templates.

50:27 But the next release is going to...

50:29 You might not want to exercise it.

50:30 Yeah.

50:31 So yeah, that's what we're trying to keep people from, right?

50:34 We don't want people to have to do that kind of thing.

50:36 So what we've done now is added some default templates where you basically say, I want this to go in the sidebar.

50:42 I want this to go in the header.

50:43 I want this to go in the main area.

50:44 And it looks like a polished website.

50:46 It's not just a bunch of things on the webpage.

50:48 It's actually a nice looking thing.

50:51 So I'm also really excited about that because I think that's something that's been missing.

50:55 So we've had a lot of cool little demos, but it's been hard to build a whole nice, polished-looking dashboard.

51:01 Yeah, a little more control.

51:02 And that kind of bridges that gap.

51:04 And then the last thing I'm really excited about in this next release of Panel is integrating the other ecosystems.

51:10 So if you're familiar with Jupyter, you'll know about ipywidgets.

51:15 And with ipywidgets, lots of libs have now been built on top of Jupyter widgets. I don't know, there's things like ipyvolume, to explore 3D volumes.

51:24 There's just a whole bunch of libs, right?

51:26 And it's kind of been a shame, because we don't want to have divergent ecosystems, right?

51:33 And so in this next release, you're going to be able to just put your ipywidget into your Panel app or even your Bokeh app.

51:39 So this has been done at the Bokeh level, since we're built on Bokeh.

51:41 You just dump it in there, and whether it's on a deployed Tornado server or the Bokeh server,

51:46 it'll just work and load the Jupyter widgets correctly and hook up all the communication for you.

51:52 And so now we don't have these two divergent ecosystems anymore.

51:55 You can just use ipywidgets in Panel or you can go the other way and kind of say,

52:01 well, we've got this deployment system and there's this library called Voilà, which kind of serves Jupyter Notebooks as well.

52:07 And so you can now put your Panel app into Voilà and just serve that.

52:11 You just make sure that ecosystems don't diverge and you can use the tools that you want.

52:15 Yeah, that's nice.

52:16 Because you don't want to have to have a separate set of widgets for your visualizations.

52:20 And then people also building them for Jupyter, of course, they would build them for there.

52:25 Right.

52:25 So might as well just bring these together.

52:27 Exactly.

52:27 So it's been a long effort.

52:29 And there's still, I'm sure, some issues to find out.

52:32 But super good to be able to ship that.

52:34 Yeah.

52:35 All right.

52:35 Well, very cool.

52:36 Great project.

52:37 Great examples.

52:38 And it looks like you got a lot of momentum going forward as well.

52:41 So thanks for bringing all of your experience and talking about what you guys have built there.

52:46 Thanks so much for having me.

52:47 It's been really great to talk about this stuff.

52:49 Yeah, you bet.

52:49 Before we get out of here, though, you've got to answer the final two questions.

52:52 If you're going to write some Python code, what editor do you use?

52:54 Oh, I'm still on Emacs.

52:56 My professor, who is now my boss at Anaconda, hooked me, and I'm still there.

53:00 Although I do dabble in VS Code now.

53:02 Yeah, very cool.

53:03 And probably some Jupyter as well at some point.

53:05 Yes.

53:06 I always have like four JupyterLab tabs open and then Emacs on the side.

53:11 Yeah, cool.

53:11 And then notable PyPI package.

53:14 You got a bunch here already.

53:15 It's worth saying you can just pip install HoloViz.

53:18 Oh, there's a bunch.

53:18 But yeah, I think I've already.

53:20 Yeah, you can pip install HoloViz, which has kind of a good tutorial.

53:24 But I really want to take this opportunity to plug some of the underlying libraries.

53:28 I think xarray is awesome.

53:30 I've already mentioned this.

53:31 And Dask is awesome.

53:32 So I really want to plug those.

53:33 I agree.

53:34 All right.

53:34 Very cool.

53:35 So people are excited about HoloViz.

53:37 Sounds like it really might solve the problem or help them build some dashboards.

53:41 Final Call to Action.

53:42 What can they do?

53:43 Come visit us at HoloViz.org.

53:45 There's a tutorial that will take you through the initial steps of using our projects and

53:50 finally let you build a whole dashboard.

53:52 So go there, check that out.

53:54 Also, check out examples.pyviz.org to see what you can do if you master this stuff to build

53:59 more complex examples.

54:00 And then message me on Twitter or message our individual projects like HoloViz or Panel on

54:06 Twitter.

54:07 And if you've got any longer form questions, join us on our discourse, which is at discourse.holoviz.org.

54:13 Super.

54:13 All right.

54:14 Well, it looks like a great project.

54:16 And I think people will build cool things with it.

54:18 So thanks for sharing it with us.

54:19 Awesome.

54:20 Thanks again.

54:20 You bet.

54:20 Bye.

54:22 This has been another episode of Talk Python To Me.

54:24 Our guest on this episode has been Philipp Rudiger and it's been brought to you by Brilliant.org

54:29 and Datadog.

54:30 Brilliant.org encourages you to level up your analytical skills and knowledge.

54:35 Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every

54:41 day.

54:41 Datadog gives you visibility into the whole system running your code.

54:46 Visit talkpython.fm/datadog and see what you've been missing.

54:49 They'll throw in a free t-shirt with your free trial.

54:52 Want to level up your Python?

54:54 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

54:59 Or if you're looking for something more advanced, check out our new async course that digs into

55:04 all the different types of async programming you can do in Python.

55:07 And of course, if you're interested in more than one of these, be sure to check out our

55:11 Everything Bundle.

55:12 It's like a subscription that never expires.

55:14 Be sure to subscribe to the show.

55:16 Open your favorite podcatcher and search for Python.

55:18 We should be right at the top.

55:20 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

55:25 direct RSS feed at /rss on talkpython.fm.

55:29 This is your host, Michael Kennedy.

55:31 Thanks so much for listening.

55:32 I really appreciate it.

55:33 Now get out there and write some Python code.

