#222: Interactive graphs with Bokeh and Python Transcript

Recorded on Wednesday, Jul 24, 2019.

00:00 Do you have data you want to visualize and share? It's easy enough to make a static graph of it,

00:04 but what if you want to zoom in and highlight different sections? What if you need to rerun

00:09 your machine learning model on the selected data? Then you might want to consider working with Bokeh.

00:13 It does this and much more. Join me on this episode where you'll meet Bryan Van de Ven,

00:18 who heads up the Bokeh project. This is Talk Python to Me, episode 222, recorded July 24th, 2019.

00:25 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries,

00:43 the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter,

00:48 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm,

00:52 and follow the show on Twitter via @talkpython.

00:55 This episode is brought to you by Ting and Linode. Please check out what they're offering

00:59 during their segments. It really helps support the show. Bryan, welcome to Talk Python.

01:03 Hi, thanks for having me.

01:04 Yeah, it's great to have you here. You know, I've often thought about ways in which I could

01:09 use some of these cool Python visualization libraries, and I haven't recently had any

01:14 great excuses to use them, so I haven't really covered them enough on the show, but I'm really

01:18 excited to talk about Bokeh with you this week.

01:19 Oh, I'm super excited to be here. I think Bokeh has really developed a lot over the last year or so

01:24 in particular, and so this is a great opportunity.

01:26 Yeah, absolutely. Before we get to it, though, let's start with your story. How'd you get into

01:29 programming in Python?

01:30 In Python? So I think the first version of Python I ever used was Python 1.4, actually way,

01:34 way back in the day, and I was doing some system administration kind of job, so there was a lot

01:39 of Perl, but I happened to get into using Python for a few things, and it was a lot of fun. Put it

01:44 down for a while, picked it up here and there, but I've been using it pretty extensively

01:47 probably since about 2005 or 2006.

01:51 Okay, yeah, those are pretty early days, Python 1, right? We don't have that debate about 1 versus

01:56 2 anymore. It's moved on to 2 versus 3.

01:58 Yeah, I don't think there was ever really much debate. Everyone was ready for Python 2 for sure.

02:03 Yeah, absolutely. So how'd you get into programming in the first place?

02:05 Let's see. The first thing I ever did was on a TRS-80 that was actually checked out from our

02:10 local library. They had a program to check out TRS-80s for two weeks, and there was a Logo cartridge

02:15 that came with it, so we could do Logo programming. A little bit later, we had some Commodore computers,

02:19 and so I did, you know, BASIC, and I think at one point I even got into like 6502 assembly,

02:23 you know, when I was getting to be a teenager or something, but yeah, you know, just 8-bit

02:28 programming way back in the day.

02:30 Yeah, how interesting. Yeah, that's funny with assembly language, like that's not a super

02:34 easy comparison. Like you've got BASIC on one side and assembly language on the other. Not

02:38 a whole lot in between, huh?

02:39 Well, there's not a lot of different ways to program on a Commodore 64.

02:44 You had to earn your programming stripes back in the early days, that's for sure.

02:48 Nice. Okay, so Bokeh is a very visual thing. For a long time, you were at Anaconda Inc.

02:55 So, is there a science background as well that got you sort of in that path, or how do you get

03:01 interested in all of these things? Yeah, I've had a pretty tortured academic path. I went to school

03:05 for computer science, then left for a while, and I worked in some research labs, and I realized,

03:09 hey, I want to go back to school, and so I actually ended up in graduate school for physics eventually,

03:12 and so I have a pretty strong, you know, mathematics, physics background, but ultimately,

03:16 I did decide to sort of go back into computer science, software engineering. I really like working

03:20 on software, though, that's in the service of analytical endeavors or science and that sort of thing,

03:24 and so this is why, you know, being able to work at Anaconda on all these tools has been really

03:28 fantastic. Yeah, it's got to be super rewarding to have so much impact on the science side. Are you

03:34 still at Anaconda? What are you doing these days? Like, what's a, what kind of programming and work

03:38 do you do day to day now? Yeah, no, I actually just recently left earlier in the year, so I was at

03:42 Anaconda from the beginning. I think I was the last original employee to leave, in fact, except for

03:46 Peter Wang, of course, who's still there, but, you know, eight years is a long time, and so it's just time for

03:51 me to go look for something different, and I actually went to go work at Microsoft, and that was really on the

03:54 strength of some interactions I'd had with folks at DevDiv and Microsoft around Python, around open

03:59 source. Everyone there has been really terrific and really supportive of Python and open source, and so I

04:03 think it's a very different company than when I, you know, thought about it 15 years ago, where I

04:08 probably would have used M dollar sign very sincerely on an angry forum post or something, but, you know,

04:12 everyone there has been really terrific. It's been a good experience, and day-to-day, I work on Azure SDK

04:16 for Python these days, which is, you know, a lot of PR reviewing, writing some code, and

04:21 helping move the direction there.

04:23 Oh, that's really interesting. You probably feel like you're bringing a little bit of the outside

04:27 to Microsoft, right? Like, it is a very different company. They're more open to external stuff, but,

04:33 you know, historically, it hasn't always been that way, so it's probably like, let me tell you about the

04:37 Python scientific stack, folks, things like that, yeah?

04:40 It's definitely interesting, and there's a lot of give and take. So I've actually learned,

04:43 I haven't been in an organization this large in a very long time, and so it's been a lot of personal growth

04:47 and learning for me, just to be in that kind of environment where, you know, people have to interact

04:51 in different ways, and that's been very gratifying and helpful for me, but definitely, I think I have a

04:55 pretty useful perspective to bring as well, especially in terms of, yeah, data science applications

04:59 in Python and that sort of thing.

05:01 Yeah, yeah, super cool. It sounds like a fun job. So let's start off this conversation talking about

05:06 Bokeh by kind of getting the, like, a big picture of, you know, making pictures with Python, right?

05:13 So if I have a graph, I want to do a map, if I want to do some kind of bar chart or some

05:18 visualization of data, what are my options nowadays?

05:23 There are a lot. So if people want to Google, there's actually a chart made by Jake VanderPlas,

05:29 who, you know, is very active in sort of the PyData, SciPy community. He tried to draw a map,

05:34 basically, of all the Python visualization landscape, and there are a lot of tools available. And some

05:38 people think this is really great, and there's a lot of choice, and then some people think that

05:42 there's just, you know, too many things, and they don't know what to deal with. But there are a lot

05:45 of tools. So obviously, you know, Matplotlib is the very big tool that's been around for a very long

05:49 time. It's a really fantastic tool, and all the devs there, you know, they work really hard. And

05:52 it's been great to see the sort of the strides that it's made in the last few years. In terms of,

05:57 like, web visualization, there's, you know, Bokeh, of course. Plotly is another offering that's out

06:02 there by the company Plotly. Altair is another tool that's been fairly recently added. Actually,

06:07 Jake VanderPlas and Brian Granger from Jupyter put that together. And so it's inspired by

06:12 the Vega plotting sort of toolkit that's available in browsers. And it's sort of a Python wrapper for

06:16 that.

06:17 Yeah, that's cool. I've heard a lot of good stuff about Altair, and that it's really quite nice as

06:21 well.

06:21 Yeah, I don't have a lot of experience with it. I mean, it's definitely intended for

06:24 very high level sort of exploratory data analysis. It's very, you know, useful, especially in notebooks

06:29 in particular. And so it looks very attractive from the things that I've seen. I just am so,

06:34 you know, I'm very involved in Bokeh. And it takes up so much of my time that I almost don't

06:38 have time to look at too many other things very often. But, you know, Jake's a fantastic guy.

06:41 Brian Granger, of course, is great and has made just amazing things for the PyData and SciPy

06:46 communities. So that's a great tool that they put together.

06:49 Cool. And, you know, there's always things like the JavaScript libraries, like D3 and stuff like

06:54 that. Is that really relevant? Or are we kind of got a handle with things like Bokeh and Plotly and so on?

06:58 You know, so people ask that a lot. Like, what's the difference between Python and D3? Or why would you use

07:02 one versus the other? And I think if you have people that are, you know, already using JavaScript

07:07 and they want to work on things with D3, D3 is an amazing tool and it can make incredible, you know,

07:12 output and really fantastic graphics. And there's probably things that are doable in D3 that maybe

07:17 would be more difficult in, you know, Bokeh, for instance. But where I think the sweet spot for Bokeh

07:21 is we've really tried to make it so that people who are already very productive in Python, they're doing,

07:25 you know, work in data science or science, who are using all these tools that are in the PyData stack,

07:30 you know, NumPy and SciPy and Pandas and Scikit-learn and, you know, Dask and Numba and all

07:34 these tools, who are really productive with these in Python, we want to let them have access to very

07:38 interactive, powerful visualizations in the browser without having to reach for that JavaScript

07:42 and web tech and sort of be distracted from the actual work that they want to do. And so

07:46 in terms of productivity, I think if you're already working in Python, I think Bokeh is a great

07:50 choice, to put it that way.

07:51 Yeah. Well, Bokeh to me feels like I get a lot of the benefits of the rich JavaScript stuff,

07:56 but that I don't actually have to make it.

07:59 That's a very, yeah.

08:00 Yeah. A very succinct way to put it. Yeah.

08:02 Okay. Interesting. And then maybe we could talk really quickly about Plotly just as a compare and

08:06 contrast. So Plotly, like I don't fully understand Plotly. When I go to work with it, I feel like,

08:12 okay, I'm working with a library, but then it seems like it has like a backend that they provide that

08:17 I have to deal with. And then there's also a commercial version. Like what is Plotly? I don't really

08:21 know where it fits. So in terms of their business, I actually don't know a lot. To be honest, I don't,

08:26 I don't really follow that very closely. And so I think they've actually changed some of their

08:29 offerings from, from what they used to be. I think they used to, you know, sell Plotly and I think

08:32 they're not in that business anymore, but I can't really speak to that very carefully. But the main

08:36 similarities are it's a Python API that generates a declarative, you know, specification,

08:41 typically some kind of JSON that can be rendered by a front end library. Now, you know, Plotly is an

08:46 entire company centered around this. And so they have had some really nice resources for, I think,

08:50 developing both, you know, some things in ways that Bokeh hasn't had. Like I think, you know,

08:53 their front end is a little more polished and some of the, you know, design stuff is definitely

08:56 polished. And I'd love to get some help on the Bokeh side to sort of bring that up to speed.

09:00 But, you know, relatively speaking, I think we've done a pretty good job at, you know,

09:03 having the same set of features. They're very contemporaneous. They started at almost the same time,

09:07 you know, way back in sort of 2012 kind of era. They have a lot of similarities,

09:11 but definitely there's a little bit of difference. You know, my background comes from a lot of science

09:15 stuff. So I'm really familiar with folks that have use cases, for instance, around like dense

09:18 arrays, like, you know, big images. And so we've really focused on some things like having an

09:22 efficient array protocol for the Bokeh server that can transmit large arrays, you know, very efficiently.

09:27 And whereas I don't think they've sort of gone down that route, you know, again,

09:29 they've worked on some other features that are more around slick dashboarding and, you know,

09:34 that sort of thing.

09:35 Okay. Yeah. Interesting. You know, let's talk about the history of Bokeh. Like you're saying it that way,

09:40 and I guess I'm as well, I'm trying to anyway, the way the proper pronunciation is.

09:45 I usually say Bokeh, but Bokeh, I think is also fine. Amusingly, long ago when we were funded through

09:50 this DARPA XDATA initiative, there was some video that someone made, you know, unrelated to all the

09:55 actual projects, but they made about the projects. And I remember in that they were describing all the

09:59 projects that were under the XDATA initiative. And they mentioned the visual database Boke,

10:04 I think. That's the only wrong pronunciation.

10:05 Bokeh is not Bokeh. All right.

10:07 Okay. It's fine. Yeah.

10:08 All right. Excellent. And so where did it get started? It started out of research grants and

10:14 this DARPA funding. Is that where it came from?

10:16 No, the research grants really helped, but it started before then. So going back a little bit

10:20 further. So I've been interested in visualization for a long time. Actually, the first, one of the

10:24 first things I used in Python, in Python 1.4, was a plotting plugin for Apache. And it was just,

10:28 it amazed me that you could like take data and have a website just make a plot. It was incredible.

10:33 I worked a little bit on VTK here and there years ago. A few of my first open source contributions

10:37 were to VTK. But in the middle aughts, I guess, I went to work for a company with Peter Wang,

10:43 who eventually founded Anaconda. And I worked with him on a library called Chaco, which was a rich client

10:49 library for interactive visualization. So instead of in the browser, you would write like a Python

10:52 application that was using like, you know, Qt or, you know, GTK, you know, kind of real

10:58 application. And it was also contemporaneous with Matplotlib.

11:03 But ultimately, Matplotlib won sort of that battle. And the reason was pretty clear. It was

11:07 because, you know, while Chaco had all this really rich capability for interactivity, it was a very sort

11:11 of fiddly API, very detailed, kind of verbose. And so later on, when Peter was getting Anaconda

11:19 started, and I was on board to help with that, we had talked about wanting to update this idea of

11:23 Chaco and create a new library that supported interactive visualizations and in browsers,

11:28 because right, browsers is the right place for it to be in 2012, right?

11:30 Of course.

11:31 And so we had this idea, we started it. But getting the, you know, the funding through the

11:35 DARPA XDATA initiative in the early years of Anaconda, then called Continuum Analytics,

11:39 really helped, you know, mitigate business risk for us to put more resources into it. So it really

11:42 accelerated the development, I would say. But we have certainly, you know, we started talking about

11:46 the ideas behind Bokeh in sort of middle 2011, probably.

11:49 Okay. Yeah, that's, that's pretty interesting. It's been around for a while. I guess that's,

11:53 at that point, Ajax and interactive browser stuff's pretty well established, right? So it's pretty

11:58 clear that was the right place.

11:59 Yeah, I mean, we just knew that, like, the future of presentation and the future of getting, you know,

12:03 this content in front of people was going to be in browsers. So writing another rich client library

12:06 was not something that was really interesting to us. And so we definitely wanted to do the browser.

12:11 And then we definitely wanted to make it architected in a way that it was very flexible,

12:15 that it had this declarative specification that described what you wanted to visualize.

12:19 Because, you know, that affords a lot of possibilities. So we've talked about the Python

12:21 side of Bokeh, but you can actually have other languages drive Bokeh plots in the browser,

12:25 you know, there's an R Bokeh binding, there is a Scala Bokeh binding that hasn't been updated in a while.

12:30 I'm interested in actually reviving a Julia Bokeh binding. But that's all because there's this

12:35 JSON specification. And any language that can, you know, dump out the right JSON can create these

12:39 Bokeh plots in the browser. Now, we've spent a lot of effort investing in the Python bindings,

12:42 because, you know, Anaconda is a big Python shop. But certainly the possibility is there for other

12:46 languages as well.
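
A minimal sketch of the JSON specification idea described here, using Bokeh's Python binding. It assumes a recent Bokeh release is installed; the target id "myplot" and the sample data are made up for illustration. The json_item function serializes a plot into a plain dictionary that can be dumped to JSON, which is the kind of document BokehJS renders in the browser (for example by passing it to Bokeh.embed.embed_item on the page).

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# Build an ordinary plot in Python (the data is illustrative only).
p = figure(title="Standalone plot")
p.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)

# Serialize the plot to a JSON-ready dict. "myplot" is a hypothetical id of
# the <div> on the page where BokehJS would render the plot.
item = json_item(p, "myplot")

# The result is plain JSON text that any web framework could send to a page.
payload = json.dumps(item)
print(sorted(item.keys()))
print(len(payload), "characters of JSON")
```

Nothing in the payload depends on Python once it has been produced, which is why bindings in other languages are possible in principle.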

12:47 Okay, so maybe it's worth just touching on the architecture a little bit, and we can dive

12:51 into the details more later. So there's a, I guess there's a couple ways we can use this,

12:55 right? Like, there's a, probably the most straightforward way is we have a Bokeh server.

13:00 And then there's some front end stuff that is the rendering point, right? Like I want to put a,

13:06 some kind of graph in a browser, and the server handles all the data, and maybe it only

13:12 presents like a slice into that world of the data and things like that, right? Can you tell us about

13:18 that?

13:18 That's absolutely a great use case for the server. But I will say the server is,

13:21 in fact, optional. I would say most usage of Bokeh probably doesn't involve the server. So Bokeh can

13:25 generate this JSON, and it can send it to a web page that can be embedded in like a Flask app or a Django

13:30 app. It can be embedded in, you know, Jupyter notebook output cells. And it doesn't have to be

13:34 connected to the Bokeh server. Bokeh.js will take that JSON and render it. And you've got an interactive

13:39 plot that has panning and zooming. You can even have linked behaviors between plots. You can have custom JS

13:43 callbacks that, you know, do work whenever you make a selection or click a button. None of that

13:47 requires the Bokeh server. What the Bokeh server is really great for is when you want to connect all

13:51 those interactive features to real running Python code. Like you want to click a button and have a,

13:55 you know, a scikit-learn regression or a scikit-learn model run, or you want to, you know,

13:59 make a selection on a plot and compute a linear regression line through those selected points

14:03 with real Python code. That's what the Bokeh server is really great for, making that sort of two-way

14:07 connection between this front end and real running Python code. But you can use Bokeh very effectively

14:12 without the Bokeh server. And in fact, I guess, I think most usage is probably, we call it standalone

14:16 usage, where it's just generating this pile of JSON that is used to drive a Bokeh plot in a webpage somewhere.

14:21 This portion of Talk Python to Me is brought to you by Ting. Let me tell you about Ting,

14:27 a new mobile service available in the US that's targeted at developers and other technically savvy folks.

14:33 First of all, their average customer only pays $23 a month, but they're no discount provider.

14:38 Their service runs over T-Mobile's and Sprint's fast nationwide network.

14:42 If you don't use that much data because you're usually on Wi-Fi, like many of you are,

14:45 then Ting will save you a ton of cash. But don't worry, you can still use as much data as you like

14:50 for just $10 per gig. One mobile feature I use all the time is tethering. And with Ting,

14:55 you get unlimited tethering at the same data rate with your account. $6 a month for a phone line,

15:00 $10 a gig, $3 a month for text if you usually chat over iMessage or WhatsApp. Think about it,

15:06 no contracts and super clear and fair billing. Visit python.ting.com. That's python.ting.com and check

15:14 out their savings calculator. Enter your usage and see exactly what you'd pay. Use that link and you'll

15:19 get a $25 credit to try them as well. That's python.ting.com or just click the link in the show notes.

15:26 The server is mostly about the interactive bits if you want to add smarts to your plots.

15:32 Yeah, absolutely. If you want to connect all those PyData tools, NumPy, SciPy, Pandas, Dask,

15:37 Numba, OpenCV, any of those tools. If you want to connect those things directly to

15:42 these interactive visualizations with a minimal amount of fuss, that is what the Bokeh server is for.

15:47 Exactly.
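
A minimal sketch of the server use case mentioned above: selecting points on a plot and computing a regression line through them with real Python code. It assumes Bokeh and NumPy are installed and that the script is saved under a hypothetical name such as app.py and started with "bokeh serve --show app.py". The data is synthetic, and np.polyfit stands in for whatever model you would actually run.

```python
import numpy as np

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Synthetic data for illustration: a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

points = ColumnDataSource(data=dict(x=x, y=y))
fit = ColumnDataSource(data=dict(x=[], y=[]))

p = figure(tools="box_select,lasso_select,reset",
           title="Select points to fit a line")
p.scatter("x", "y", source=points)
p.line("x", "y", source=fit, line_width=2, color="red")


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return slope, intercept


def on_selection(attr, old, new):
    """Runs in Python whenever the selected indices change in the browser."""
    if len(new) < 2:
        fit.data = dict(x=[], y=[])  # not enough points: clear the line
        return
    xs, ys = x[new], y[new]
    slope, intercept = fit_line(xs, ys)
    ends = np.array([xs.min(), xs.max()])
    fit.data = dict(x=ends, y=slope * ends + intercept)


# The two-way connection: browser selection -> Python callback -> plot update.
points.selected.on_change("indices", on_selection)
curdoc().add_root(p)
```

Without the server, the plot would still pan, zoom, and select, but the selection would never reach the Python callback.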

15:47 Okay, cool. Before we move off the history and Anaconda Inc. and all that, when you created it,

15:52 it sounds like you tried to create it as a standalone project with its own fundraising and its own

15:58 outreach. What was the thinking there rather than just making it part of Anaconda?

16:02 Well, I mean, in the very early days, it was definitely a project that was started at Anaconda.

16:06 And the DARPA thing came along somewhat serendipitously. Not something we counted on or

16:11 knew about when the company started. And that was a big sort of funding for a long time. After that ran

16:15 out, because that was a fixed number of years sort of support, Anaconda very generously supported the

16:20 development of Bokeh. But ultimately, it was always the goal to try to create these tools as sort of

16:25 self-sufficient, self-governing, push them out in the open kind of projects. And so it took a long time

16:30 to get to that point. The first step in that was for Bokeh to become a NumFocus fiscally sponsored

16:34 project. But of late, we've really ramped up the self-governance and the self-sufficiency. So pretty

16:39 much at this point, I think the cord's been cut and Bokeh is really out there. It's managing its own CDN

16:43 resources. We're doing a lot of outreach and fundraising on our own right now that wasn't happening even six months ago. We just had, or are still having actually,

16:50 a July fundraiser going on to try to help pay for some of our infrastructure costs. But we're also

16:55 ramping up some corporate engagements and trying to talk to corporations and see if they want to offer

16:58 support. But that all is pretty new. For a long time, Anaconda was the primary beneficiary or

17:03 benefactor, I should say, of Bokeh.

17:04 Sure. And it just makes Python and data science stronger, which is really the heart of Anaconda Inc.

17:11 anyway. So it seems pretty reasonable. You said that the project was a NumFocus

17:15 project. And we spoke a little bit earlier about NumFocus a bit. And I guess my understanding is a

17:22 little bit off. I saw NumFocus as a thing that kind of provides funding to these projects. And

17:30 that's not quite exactly right, is it? What is NumFocus and what did it do for you all?

17:33 Yeah, it's sort of yes and no. So NumFocus was started by Travis Oliphant, who was one of the other

17:38 co-founders of Anaconda. Of course, he's the original or he's the author of NumPy, building on previous work.

17:43 But NumFocus is this nonprofit, 501(c)(3). And its main role is to be an umbrella organization for

17:51 open source projects, right? So open source projects often are not legal entities. And so that actually

17:55 makes it quite difficult for them to accept money, right? It sort of gets very complicated with taxes.

17:58 And so they're this legal entity that can accept donations on behalf of projects and then handle all

18:04 the tax stuff. They also do a lot of outreach. They hold, you know, they support the PyData meetups and

18:10 PyData conferences around the world. So they're this organization that sort of helps and supports

18:13 open source. And they do fund these projects in the sense that, you know, they help the donations that

18:17 come in get back to the projects. And also sometimes they spread some of those funds around. They get

18:21 bigger donations that they can also use to give to projects that, you know, don't necessarily raise

18:24 their own money. But yeah, that's their main role. Just to be this umbrella organization, they help out

18:28 with the bureaucracy and take away that load from these open source projects.

18:32 Yeah, that's really interesting. I guess it took me a long time to realize that it's

18:35 actually hard for companies to give money.

18:39 It's really hard.

18:40 To these projects, right? Like it's not just a matter of, put aside the point whether they should,

18:44 put aside the point whether they fiscally could. Of course, most of them can and they should,

18:49 right? But just the way that they're set up is they buy things. They exchange money for a service or a

18:58 good. Even giving to a charity like NumFocus probably is a little bit odd and hard for them to do.

19:03 That is exactly right. There is just an impedance mismatch. I mean, the sums of money that would help

19:07 a lot of open source projects, I think, are relatively small compared to the budgets of the

19:11 companies we're talking about that are using these projects. But yeah, there's just an impedance

19:16 mismatch for how you actually do it. It's not a purchase, right? And so it's not hiring someone. So what

19:21 exactly is it? And companies just, you know, right now don't know how to handle that and don't know how to

19:25 deal with it. And so there's some other efforts to make that easier. People are trying different

19:28 things. There's, you know, Travis's company, Quansight, is trying out some models for, you know,

19:32 getting support for open source projects by engaging with companies in various ways. Tidelift as well

19:37 is also trying that. GitHub is, you know, trying their new sponsorship sorts of things. So people are

19:41 trying to find novel ways to attack this problem. And I hope we get there, but it's definitely

19:44 something that will take some time.

19:46 Yeah, I hope so as well, because it would make a huge difference and it would be

19:49 a blip for these companies to make that contribution, right?

19:52 Yeah. I mean, as an example, our goal for the July fundraiser was to raise a thousand dollars,

19:56 right? And we have, you know...

19:57 That's very modest, right?

19:58 I think several hundred thousand users and we did, you know, I'm really happy to report that we did,

20:02 but, you know, I had to tweet out a lot every day to make it happen. It was, you know, sort of

20:06 knocking on doors and, but yeah, I'd love to be able to go to companies and try to be more effective

20:10 and efficient at fundraising, if that makes sense.

20:12 Sure. Well, while we're on the subject, let's talk about your fundraiser real quick.

20:15 If you're going to raise a thousand dollars, that's probably not money to pay developers,

20:19 right? It's something else.

20:21 Yeah. I mean, I actually raised this exact issue around the sustainability conference that NumFocus

20:26 is going to have later in the year. I think there's sort of two different buckets where things go into

20:29 there's, you know, paying people. And I hope someday we can figure that out really well. And we can

20:33 support, you know, people to be maintainers of open source software by actually paying them to do

20:37 that work directly as a sort of a job or a living. But there's also just the matter of a lot of open

20:43 source projects, I think, could use a fairly small amount of money just to cover expenses or help

20:48 take some strain and stress off the maintainers. Like in the case of Bokeh, we have to run this,

20:52 you know, CDN to deploy Bokeh.js. So all the users around the world can get Bokeh.js to display their

20:57 plots. That is run on, you know, AWS CloudFront. And so we have to pay for that. Someone has to pay for

21:02 that. Right. And so that's what this fundraiser was for. And so in the sense that it, you know,

21:06 sort of reduces my stress because it helps me know that this is sort of taken care of for the next year.

21:10 That's what that level of sort of funding is for. And there's other stuff too. There's,

21:13 you know, it's good to get developers face to face sometimes. And so this could help with that

21:16 as well as other infrastructure costs.

21:18 Right. Some having some like yearly meet up with the core developers. It's odd. There's a lot of

21:23 projects where the core developers have never actually met.

21:25 Definitely. For sure. I think I've met everyone at least once, but there's some folks that are pretty

21:30 scattered for Bokeh. But for sure, I have no doubt that's the case.

21:33 Yeah, for sure. So you said that Bokeh works with this JSON output and it can just basically render

21:39 anything that can generate the JSON. It can go out and render that. You know, I think when I think of

21:44 graphs, I mostly think of notebooks and Jupyter and things like that. But also we can just plug this

21:50 into whatever. Is that right? I can plug it into just a Flask site. Can I plug it into even like a,

21:55 some kind of command line app and like somehow pop it up?

21:58 Yeah, I don't see why not. I mean, anything that can run JavaScript basically, right? Anything that can

22:01 load Bokeh.js. So if that's like an Electron app, I think that would be feasible, certainly in the,

22:06 you know, in the Jupyter notebook. But, you know, there's a variety of ways. We have a whole Bokeh.embed,

22:10 you know, API. And so there's a variety of different ways to embed Bokeh content. But if you're running a

22:13 Flask site, you can use one of those ways to pop up Bokeh content in the middle of your site and in

22:18 your template and sort of wherever you want to put it.
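
A minimal sketch of one of the bokeh.embed approaches mentioned here, assuming Bokeh is installed. The components function returns a script and a div that can be dropped into any HTML template. To keep the example self-contained, a plain Python string stands in for a Flask or Django template, and the page content is hypothetical. CDN.render() produces the tags that load BokehJS from the project's CDN.

```python
from bokeh.embed import components
from bokeh.plotting import figure
from bokeh.resources import CDN

# Any Bokeh plot will do; the data is illustrative only.
p = figure(title="Embedded plot")
p.line([1, 2, 3], [4, 6, 5])

# script: a <script> tag carrying the plot's JSON; div: the placeholder
# element where BokehJS draws the plot.
script, div = components(p)

# Stand-in for a web framework template (hypothetical page layout).
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Hypothetical page</title>
    {resources}
  </head>
  <body>
    <h1>Report</h1>
    {div}
    {script}
  </body>
</html>"""

html = PAGE.format(resources=CDN.render(), div=div, script=script)
print(html[:200])
```

In a real Flask view, the same script and div values would typically be passed to the template renderer instead of a format call.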

22:20 When I think about the data science space, there's some libraries and data structures that are just used

22:27 over and over and over. You mentioned Travis before. So there's NumPy. There's,

22:31 of course, Pandas. There's a bunch of other stuff built on top of that. Is there special

22:36 integration for those types of libraries? Like if I already have some NumPy array or I've got some

22:42 Pandas data frame, is there a way to just like plug it into Bokeh?

22:45 Yeah. So Bokeh works really well with all those things. NumPy is an actual requirement,

22:49 runtime requirement for Bokeh. We tried to avoid that. Even in the beginning, we wanted to make Bokeh as

22:53 minimal and as accessible as possible. But NumPy is a requirement now. Pandas is not a hard

22:57 requirement, but Bokeh works really well with Pandas. If you have Pandas data frames,

23:00 you can basically plug them in anywhere a Bokeh column data source would go. It can automatically

23:05 sort of convert them or use them in a way that's useful, either data frames or group by objects.

23:09 And so we've tried to make it very easy to integrate with Pandas, but also make Pandas

23:13 not required. And so that's sort of the state where things are at. Anything that can sort of behave

23:17 like a list or an array or a Pandas series works pretty much out of the box with Bokeh.

23:21 Okay. Are there other major libraries that I don't know to ask about?

23:25 I think Bokeh works really well with Dask, I assume, because Dask has a data frame-like API.

23:29 Matt Rocklin actually used Bokeh to develop the interactive dashboard that's sort of the cluster

23:34 monitor for Dask.

23:35 I think it's great as well. And I did have Matthew Rocklin on before to talk about Dask,

23:39 but I don't know if everyone's listened to that one. And also if they've seen the actual

23:45 visualization of that. So could you maybe describe that real quickly, what Dask is and then that

23:49 dashboard? Because when I saw that, that just like blew me away.

23:53 Yeah. Dask is a tool for basically parallel distributed programming. And it's trying to

23:57 do so in a way that is a very sort of Pythonic, very Pandas-like API, right? So there's other

24:02 tools that do this sort of thing, but they come from other languages originally. And so their APIs are

24:06 maybe not very Pythonic and they kind of don't fit in well with Python tools. But Dask is meant to be

24:11 a very Pythonic tool for this distributed computing task. And so to that end, it has this dashboard

24:16 that Matt Rocklin developed using Bokeh that can visualize everything that's going on around a

24:20 cluster that's doing computation all at once, right? So it can show you what

24:23 nodes are computing or waiting or they're transferring data in real time. And so that's

24:27 really helpful for diagnosing problems with parallel distributed computations. And so Matt has always

24:32 been very clear that Bokeh was great for him because he didn't have to write all this JavaScript to have

24:36 this really interactive dashboard. He could just write it in Python and connect it directly to his

24:40 telemetry that he was getting back from clusters and visualize it very quickly. And so I think it's

24:43 been a great tool for his users. I shouldn't say his users. It's a big project now. Dask has actually

24:47 grown up quite a bit. So I should say the Dask project's users. But we're happy to see that kind of thing happen.

24:52 You know, love to see Bokeh used in those kind of use cases.

24:54 Yeah. And it just looks so good and professional and, you know, live updating. It's really a nice

25:01 use case for Bokeh. And I think it's also a good testament to what you guys have built.

25:05 Well, it's also good to get feedback from real use cases like that. Nothing sharpens your tools

25:09 better than sort of having them honed against real problems, right? And so we love when people do

25:13 awesome things with Bokeh and tell us, you know, hey, this was great, but also this could be a little

25:16 bit better or easier. You know, this is how you could make my life, you know, simpler or here's some pain

25:20 points I had. That kind of feedback is really helpful from users.

25:23 Yeah. It's great to design something, but once it actually meets real users and real use cases,

25:29 like that's where it gets real. So you talked about Dask. Let's just touch on some of the other

25:35 things that are built upon Bokeh because Bokeh has been around since 2011, 2012, like you said,

25:41 and it's pretty stable for the most part.

25:43 It's a lot more stable recently. Yeah.

25:45 Yeah, that's great. So things are starting to build on top of it like Dask and just using it.

25:49 So what else is out there like that?

25:50 Well, there's a couple of different things in different classes of things, right? So first of

25:53 all, there's other libraries now that are starting to build on top of Bokeh. So there is Chartify,

25:57 which was created by the data labs at Spotify. And so it's their sort of high, very high level

26:02 sort of data science, opinionated data science API on top of Bokeh. There's a project called Pandas

26:06 Bokeh that just came out recently. That's sort of very tight integration with Pandas and using Bokeh to

26:10 generate interactive plots. There's also a set of tools created by some folks who are still at Anaconda

26:14 called, sort of, the effort's called PyViz. And so there's a tool called HoloViews, which is a very

26:18 data centric API, and it can generate interactive visualizations using Bokeh and other tools as well.

26:24 But there's also, you know, some tools like Datashader, which are for, you know, very large data,

26:30 where you can finely control how they're rendered. And so you can combine Datashader with Bokeh, you know,

26:34 using HoloViews. So it can drive things at a very high level. So I'd love to see this effort where

26:37 people are building these things on top of Bokeh. And I'm also glad that now, you know, Bokeh was

26:41 a moving target for quite a while. And I very much appreciate the patience of all of our users who

26:45 sort of, you know, kept with us as we were figuring things out. But I would like, you know, we're trying

26:49 to be much more stable now. I think we've done a very good job since 1.0 was released at being,

26:53 you know, much more stable. That's very good. The other kind of things that get built on top of

26:58 Bokeh are more like applications or, you know, other projects. So there's a project called Microscopium

27:04 that is for, you know, sort of biosciences research. There's a tool called Lightkurve that a bunch of

27:09 astronomers put together, which uses Bokeh to let you drill down. So you can see like an image of,

27:13 you know, some star or something and hover over a single pixel and really drill down into something

27:17 about, you know, that image using, you know, all the tools that are in Bokeh. You know, there's

27:21 actually quite a, Dask, of course, is a great example. And then there's actually a bunch of other

27:25 ones on GitHub. And I'm sorry, I don't have a list at the top of my head, but there are a lot of

27:29 exo-bioscience projects that are built on top of Bokeh. And, you know, financial trading is another

27:34 thing that comes up. People have done drug discovery type work on Bokeh. It's interesting where things are

27:39 popping up, especially, you know, now that things are more stable. I think it's really a great time for

27:43 that to happen.

27:44 Yeah, that's super cool. The Lightkurve project looks amazing. I mean, to explore like data from

27:50 the Kepler and TESS telescopes, that's pretty cool for exoplanet discovery. I mean, that's,

27:57 it's really exciting.

27:58 Like the project that you worked on is helping scientists like actually look for exoplanets.

28:03 Like, that's incredible.

28:04 That is a, no, it's really gratifying. Like, that's exactly the sort of thing that, you know,

28:07 I'd say we wanted to be able to enable and, and, and help, help make happen. So it's really

28:11 gratifying when people are able to use Bokeh for those kinds of situations.

28:14 Yeah, absolutely. Do you see Bokeh being useful or appropriate for like real-time dashboard type

28:23 of scenarios? I mean, obviously it draws great graphs. And then, you know, we talked about Dask

28:28 and the real-time view of that. So like, let's imagine I'm like building software for a

28:33 stock trading company or something like this. And I want to show in real time what the market's doing.

28:39 All the data the traders need. Would that be appropriate? Is it too high

28:43 latency or what's the story? I think it could be. There's real time, and then there's, quote,

28:46 unquote, real time. Right. So it always depends on what exactly people mean when they say real time.

28:50 When I hear the word real time, I have images of like, you know, very low level, a certain kind

28:54 of system that's implemented that has very specific guarantees about its performance. And for that kind

29:00 of work, Bokeh is probably not suitable. Right. But if you're talking about real time, just in terms

29:03 of like streaming data coming in from a financial system or, you know, from IoT-type devices

29:09 or, you know, that sort of thing, then I think Bokeh can be, and has been, useful. There's certainly

29:13 people who have come looking for support on our discourse or on our other support forums,

29:16 talking about using Bokeh, connecting it to real time sensors, for instance. Right. So they're in a

29:21 factory or a warehouse and they've got data coming in. I want to visualize something. So people do do

29:24 that. So if you mean, you know, quote unquote, real time, and just in the sense that I've got data

29:28 coming in and I want to visualize it in a best-effort kind of way, then I think Bokeh is definitely a

29:32 good choice.

29:33 Yeah. And that's mostly what I meant. I mean, obviously not like we need seven millisecond response time or else

29:37 the plane crashes, something like that. Nothing like that. Right. But like, like when you're

29:42 thinking of graphs, right, like a human has to see the graph and interpret the graph. Right. So that's,

29:46 you know, how long is that? It can't be quicker than a hundred milliseconds. Right. Like the human

29:51 can't understand graphs that quick.

29:52 Well, and so real time is not even necessarily about quickness. It's really about predictability. It's

29:56 about specific kinds of guarantees. But, but yeah, so I should mention, yeah, Bokeh does have some APIs

30:00 for streaming specifically. Right. So, you know, if you've got data coming in or you want to update,

30:04 you know, just the newest point, you've got a, you know, a time series with a hundred thousand points,

30:08 you know, plotted and you've got new points coming in at the end. Bokeh can very efficiently just send

30:13 the new data, right. Without sending, you know, the entire data set. So it is very useful for that sort

30:16 of thing. Yeah. So then you load up the historical data and then you just take, you know, the update

30:21 every half a second or something. That sounds pretty doable. Yeah, absolutely. Okay. Even more often.
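
The "just send the new data" idea can be sketched with a toy column store. This is an illustration written for this transcript, not Bokeh's implementation: the class name, message format, and sample data are assumptions, although the method is loosely modeled on the `stream(new_data, rollover=...)` call that Bokeh's `ColumnDataSource` provides.

```python
class StreamingColumns:
    """Toy column store that reports only the rows added by each update."""

    def __init__(self, data):
        self.data = {name: list(values) for name, values in data.items()}

    def stream(self, new_data, rollover=None):
        """Append new rows and optionally keep only the newest `rollover` rows.

        Returns the small message that would go over the wire: it holds just
        the new rows, never the full history.
        """
        if set(new_data) != set(self.data):
            raise ValueError("new_data must supply every existing column")
        if len({len(values) for values in new_data.values()}) > 1:
            raise ValueError("all new columns must have the same length")
        for name, values in new_data.items():
            column = self.data[name]
            column.extend(values)
            if rollover is not None and len(column) > rollover:
                # Drop the oldest rows so the column stays at `rollover` length.
                del column[: len(column) - rollover]
        return {"kind": "stream", "data": new_data, "rollover": rollover}


# Load the historical series once, then push one new point at the end.
history = StreamingColumns({"t": list(range(100_000)), "price": [100.0] * 100_000})
message = history.stream({"t": [100_000], "price": [100.5]}, rollover=100_000)
```

The message carries a single row even though the series holds a hundred thousand points, which is why frequent updates stay cheap.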

30:25 Yeah, yeah, yeah. Sure. That sounds, that sounds super interesting. One of the capabilities kind of

30:30 around that, that you talked about is being able to work with like quite large data, maybe having some

30:36 on the server or something like that. And then either interpreting that or running a machine learning

30:41 model against that or something like that. Maybe tell us some of the use cases there.

30:45 Yeah. There's a couple of different ways you could use Bokeh for that. So one is that, you know,

30:49 if you have a large data set, you know, you're not going to send a billion points into your browser,

30:51 right? Your browser will just fall over. So you've got to find some ways to sort of minimize the data.

30:55 And that can be done in a variety of ways. And one of the ways is for instance,

30:57 it's downsampling. So if you have a large data set and you've got some reasonable way that makes

31:01 sense for your use case to downsample it, the Bokeh server can just do that downsampling and then show

31:05 you the subset of data that's relevant, right? So that's one way that you can use the Bokeh server

31:09 to handle sort of large data sets. Another way is, I mentioned this tool Datashader, which is

31:13 actually specifically designed for being able to produce, you know, very efficacious visualizations, images of,

31:19 you know, hundreds of millions or billions of points, right? It gives you very fine control over the way,

31:22 basically, a more sophisticated version of alpha compositing happens. So you can actually try to

31:25 emphasize the things you want to emphasize in a meaningful way. And so you could use Datashader

31:30 to datashade those 100 million points. And then that just produces an image and then you can send

31:35 the image to Bokeh. And so that's a very fast operation. So Datashader is sort of a form of,

31:39 you know, data compression in that sense, but it's still very interactive because, you know,

31:44 you can use the events that Bokeh generates when you, if you resize the plot, to get a new image

31:48 generated. If you've, for instance, changed the bounds of the plot, if you pan or zoom,

31:52 you can get a new Datashader image based on those new, on those new dimensions. And in fact,

31:55 HoloViews sort of does that all automatically for you. You can do it by hand with Bokeh and Datashader,

32:00 or, you know, at a high level, HoloViews can sort of take care of that for you. That's another way in

32:04 which, you know, you can sort of reduce the amount of data that you're going to send into the browser.
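
Both reduction strategies described here can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not how the Bokeh server or Datashader actually work: `downsample` uses simple striding, `rasterize` just counts points per grid cell, and all function names and the random test data are made up for the example.

```python
import random


def downsample(xs, ys, max_points):
    """Keep at most `max_points` evenly spaced samples from a long series."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    step = max(1, -(-len(xs) // max_points))  # ceiling division
    return xs[::step], ys[::step]


def rasterize(xs, ys, x_range, y_range, width, height):
    """Count how many points land in each cell of a width x height grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # outside the current view, for example after a zoom
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


rng = random.Random(0)
xs = [rng.random() for _ in range(200_000)]
ys = [rng.random() for _ in range(200_000)]

# Strategy 1: send a thinned-out subset of the points.
small_xs, small_ys = downsample(xs, ys, max_points=1_000)

# Strategy 2: send a fixed-size image, recomputed when the view changes.
image = rasterize(xs, ys, (0.0, 1.0), (0.0, 1.0), width=64, height=64)
zoomed = rasterize(xs, ys, (0.25, 0.5), (0.25, 0.5), width=64, height=64)
```

Either way the browser receives a bounded amount of data: a thousand points or a 64 by 64 grid, no matter how large the source data set is. Re-running `rasterize` with new ranges mirrors the pan and zoom flow described above.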

32:07 Coming up next year, though, I hope we can actually raise the ceiling of the number of points you can

32:11 send. We're going to try to do some work, hopefully, to better improve the WebGL support in Bokeh,

32:16 and maybe even just have Bokeh be based entirely on WebGL. In which case, I think we could,

32:20 you know, right now, Bokeh, you could send a few hundred thousand points to it,

32:23 and it's typically okay. But I think we could raise that sort of ceiling a little bit higher

32:27 once we are able to render completely in WebGL.

32:29 Yeah, that would be pretty amazing. Is it using like canvases or something now?

32:33 Yeah, exactly. So it uses the HTML canvas. There is currently some level of WebGL support.

32:37 And the person who maintained that and originally wrote that is just, he's moved on to other things.

32:40 And so that WebGL support is sort of, it needs a little work, a little love and care. And we'll

32:46 probably just go ahead and try to do things sort of from the beginning and re-found that in a cleaner,

32:51 better way going forward. But yeah, there's some WebGL support now, but most of the rendering happens

32:55 on HTML canvas.

32:56 Okay. So let's talk a little bit about some of the internal implementations of this. Like,

33:02 when most people interact with Bokeh, they're probably interacting with some Python API. And as far

33:10 as they're concerned, like that's the end of it, right? Like, I call these functions,

33:13 the plot comes up magic.

33:14 Yeah. And actually the API is actually quite light. So by and large, we've turned the problem

33:19 of creating an interactive data visualization web app into the Python problem of creating a bunch of

33:24 objects and setting their properties, right? So, you know, I mentioned this JSON representation and

33:28 that JSON representation actually mirrors on both sides, a set of objects and those objects being a

33:32 graph. So there's a set of objects like a plot, which has, you know, a bunch of renderers and has a

33:36 couple of axes and some ranges and some data sources. Maybe that's in a layout that also has some

33:40 buttons. So there's objects we call the models that represent all of those items. All of those get

33:45 turned into JSON. And then on the JavaScript side, there's a one-to-one correspondence basically of

33:49 objects that those get turned into JavaScript objects. The role of the Bokeh server is just to

33:53 keep those two sets of objects in sync bidirectionally, right? But in terms of what you use from Python,

33:57 you create this plot, maybe use the figure function, which sort of puts a lot of these objects

34:01 together for you in a convenient, meaningful way. And then you can twiddle their properties. You can

34:06 change, you know, the start and end of a range, or you can add the data to the data source, or you can

34:10 change various properties of a circle glyph because you want to change how it appears. And so all of

34:15 this is, you know, just setting these properties on these Bokeh models. And that is the main, the main

34:20 thing that people do, I think. Apart from that, you might be writing callbacks. If you're using the

34:24 Bokeh server, you can write callbacks in Python, you know, for if a button gets clicked or a selection

34:28 is made, you can run Python code. But you can also create JavaScript callbacks for the standalone

34:32 case where, you know, I don't have a Bokeh server, but I still want something to happen when a button

34:35 gets clicked or, you know, the selection is made. You can write a little snippet of JavaScript and that

34:39 will, you know, do that amount of work. And typically those callbacks, the end result of that is, again,

34:43 setting some properties on these objects, right? They might update the data source, which causes the

34:47 plot to update, or they might change the range bounds, which causes the plot to zoom out, that sort of

34:51 thing. So there is definitely API. There are functions, you know, there are functions for embedding,

34:55 there are functions for showing things in notebooks, there are functions for creating plots to start with.

34:59 But most of the content of the Bokeh library is these objects, you know, we call models, and they all have

35:04 these typed properties that you can set values for. And that's the main interaction mode.
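
The "objects with properties, mirrored through JSON" design can be sketched as follows. Everything here is a simplified assumption for illustration: the class names, the JSON layout, and the patch format are invented and are not Bokeh's actual wire protocol.

```python
import json


class Model:
    """Toy synchronized object: an id, a type name, and a bag of properties."""

    _next_id = 1000

    def __init__(self, **props):
        self.id = str(Model._next_id)
        Model._next_id += 1
        self._props = dict(props)
        self._listeners = []

    def get(self, name):
        return self._props[name]

    def set(self, name, value):
        """Change one property and notify listeners with a small patch."""
        self._props[name] = value
        patch = {"kind": "ModelChanged", "id": self.id, "attr": name, "new": value}
        for listener in self._listeners:
            listener(patch)

    def on_change(self, listener):
        self._listeners.append(listener)

    def to_json(self):
        """Serialize properties; nested models become references by id."""
        attrs = {}
        for name, value in self._props.items():
            attrs[name] = {"id": value.id} if isinstance(value, Model) else value
        return {"id": self.id, "type": type(self).__name__, "attributes": attrs}


class Range(Model):
    """Hypothetical range model with start and end."""


class Plot(Model):
    """Hypothetical plot model that refers to a range."""


x_range = Range(start=0, end=10)
plot = Plot(title="demo", x_range=x_range)

# One side serializes the whole object graph once.
document = json.dumps([m.to_json() for m in (x_range, plot)])

# The "other side" rebuilds a mirror keyed by id and applies later patches.
mirror = {m["id"]: m["attributes"] for m in json.loads(document)}


def apply_patch(patch):
    mirror[patch["id"]][patch["attr"]] = patch["new"]


x_range.on_change(apply_patch)
x_range.set("end", 50)  # what a callback that zooms out might do
```

The point of the sketch is that a callback never draws anything itself. It only sets a property, and the synchronization layer turns that change into an update on the other side.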

35:09 Yeah, so very declarative in that sense, right? You set the aspects or the features that you want,

35:15 and it just figures out how to make that interactive.

35:18 Yeah, exactly. Yeah.

35:19 Nice. So it sounds to me like, listening to you talk, there's a lot going on with JavaScript

35:24 here, even though the typical consumer user of it, the developer doesn't have to care or work with it.

35:31 What are you using there? Like, what was the history? Was that always just straight JavaScript?

35:35 Or what's what are you doing?

35:36 Yeah, it's actually never been straight JavaScript. But you're right, the bulk of the work of Bokeh is

35:40 actually in this library, BokehJS, right, which is a JavaScript library.

35:42 How big is it?

35:43 So minified, I think the main core library is about 600k. It's a pretty hefty library.

35:48 That's a pretty hefty library.

35:50 It is, right. We're looking to make things, you know, as optimized as we can. We definitely could

35:53 use help from, you know, more experienced JavaScript developers. So when Bokeh started, I mean, it was

35:58 started by me and a few other folks, and none of us, I think, had a lot of front-end

36:02 experience. I didn't have any JavaScript experience when this project started. And so we actually chose

36:05 CoffeeScript at the time. And so that, I think, was maybe a good choice for the time, because it

36:10 allowed us to iterate very quickly and sort of make, you know, mistakes more quickly, I guess.

36:13 You try out things, you know, it's sort of Python looking like, you know, it's one of these

36:17 transpiled languages that turns into JavaScript. But ultimately, once the project grew very large,

36:22 it wasn't really suitable for that. And so we actually did a large effort. Most of that work

36:26 was done. Heavy lifting was done by one of our core contributors, Mateusz, to port Bokeh to TypeScript.

36:31 And that's been a huge win for the project. I mean, just in doing the port to TypeScript,

36:34 a lot of latent bugs and problems were uncovered. Certainly since it's been done, you know,

36:39 I've been prevented from checking in things that would have been an error, you know, by the TypeScript

36:42 compiler. So I'm a big fan of that and glad for that. There are certainly new contributors who find it a

36:47 little bit more difficult or daunting sometimes to work with TypeScript, so there is a barrier

36:50 to entry that's a little bit high for Bokeh. And that's actually just in general, been a problem

36:55 for us, I think, to attract sort of contributors on that side, right? Because Bokeh is targeted towards,

37:00 you know, Python developers with the promise that they really don't have to worry about JavaScript

37:03 if they don't want to. But all the work's actually in JavaScript. And so, you know,

37:07 we need JavaScript developers to come help make Bokeh better. And so for the most part,

37:11 and so that's been a challenge for us a little bit.

37:13 You have this bimodal distribution of skills and desires and stuff like the Python folks and the

37:19 JavaScript folks. And yeah, it's interesting.

37:21 So we're trying to make Bokeh itself like a, you know, there are people who use the Bokeh

37:24 JavaScript library just by itself as a JavaScript library. I would say that from my perspective,

37:28 quite a bit of work is needed to make that a serious sort of contender for something people

37:32 want to use. But we definitely would like to get that done. And we'd love to get help doing that.

37:35 I think making BokehJS sort of a first-class JavaScript library in its own right would be very

37:39 helpful for our project. And certainly it'd be great to get a community around that as well.

37:43 But that's a longer term goal.

37:47 This portion of Talk Python to Me is brought to you by Linode. Are you looking for hosting that's

37:52 fast, simple, and incredibly affordable? Well, look past that bookstore and check out Linode at

37:57 talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server

38:04 with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or where your

38:09 users are, there's a data center for you. Whether you want to run a Python web app, host a private Git

38:14 server, or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200

38:20 gigabit network, 24-7 friendly support, even on holidays, and a seven-day money-back guarantee.

38:26 Need a little help with your infrastructure? They even offer professional services to help you with

38:30 architecture, migrations, and more. Do you want a dedicated server for free for the next four months?

38:35 Just visit talkpython.fm/Linode.

38:38 It's interesting that you found TypeScript to be a nice way of working and whatnot. And I find it to be

38:46 pretty nice as well. Certainly, if I had to choose between CoffeeScript and TypeScript, I would

38:50 definitely choose TypeScript. You know, I think TypeScript is interesting in that it's a superset of

38:55 JavaScript. So all your regular JavaScript just works, but you can like typify it and make it have other

39:00 features and capabilities that that language brings. And that's a pretty interesting way to approach that

39:07 problem.

39:07 Oh, definitely. Yeah. And to be clear, I think CoffeeScript was the right choice in 2012. I don't,

39:11 it's not at all the right choice for anything, I don't think, in 2019. I think at the time,

39:15 Bokeh was one of the largest CoffeeScript libraries probably ever developed, which is interesting,

39:20 sort of a bit of trivia. But like I said, it let us move fast, especially not having a lot of

39:24 experience in front-end dev. But, you know, after time, we just needed something more,

39:29 a little more serious, for lack of a better word.

39:30 Yeah, sure. I'd just like to get your thoughts real quick. Like, so TypeScript is all about,

39:35 you know, sort of static typing and checking and whatnot of your code. And we kind of have that

39:42 in Python a little bit now, to the extent that people want to bring it in with mypy and type

39:47 annotations. But it's not really the main zen of the language of Python.

39:52 What are your thoughts of like working in these two languages, kind of side by side on the same

39:55 project?

39:55 Well, so this is a really interesting question. So there is actually a history of various projects

39:59 that add what's called, I think, manifest typing to Python. And so that goes back to,

40:03 there's definitely a project called Traits that Joseph Morrill created that was, you know,

40:07 sort of, you could add types to classes, and those would get checked at runtime. And you could also

40:11 do things like reactive programming and event-based programming based off changes to those values.

40:15 Traits auto-created like Qt GUIs, I think, from classes as well, the panels.

40:20 And there's another one called Param. And I think there's now one called Struct. But Bokeh

40:24 also has its own property system. I mentioned these properties of models. Bokeh has its own

40:28 property system, which is rooted in a bunch of fun metaclass programming that lets you add these

40:33 declarative types. So the actual models I mentioned for Bokeh objects typically have no code in them.

40:38 They're just classes with these property definitions that say, oh, you know, my plot width is an int,

40:44 or my source property is an instance of a column data source, or the range has two floating point

40:50 values start and end. And so we're able to provide runtime feedback. If people try to set, you know,

40:55 the range.start equals some string value, we say, hey, that's not an appropriate value. It needs to

40:59 be an integer. And we also know what properties are on objects. So a feature people have really

41:03 complimented us about is, if people sort of fat-finger a property name, we'll actually give a suggestion

41:07 and say the nearest property names are named this. And so we-

41:10 Oh, that's nice.

41:10 Kind of a type system. Yeah, a really nice feature. It's sort of one of those simple things

41:13 you don't think about until you see it. But we've had a type system in Bokeh since the beginning,

41:17 right? And so it's a little interesting now that mypy is becoming more popular. We are interested in

41:22 looking to use mypy basically after Bokeh 2.0 comes out and we drop Python 2 support. We're

41:28 interested in trying to integrate mypy, you know, wherever we can. I think it's a useful tool.

41:31 I hadn't used it much until recently, but I have seen it used to good effect. And so I'd like to try to

41:35 improve that. I don't know how much we'll be able to use mypy to replace our existing

41:39 sort of type property system because that would be a huge endeavor because our properties,

41:43 they aren't just the type checking. They also plug into our documentation system so we can auto

41:47 generate our reference documentation. Wow.

41:49 And of course, all the auto synchronization is based on this too, right? A lot of the machinery for the

41:53 automatic synchronization and serialization is based off these property definitions, right?

41:57 Right. Like to notify that something has changed to people who are interested and things like that.

42:02 Yeah. So replacing it with mypy is not something I'm sure we can do for the properties,

42:06 but there's plenty of other places in the library where mypy would be a great benefit

42:10 to help us sort of tighten things up. And so we're looking at that after Bokeh 2.0.
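
A declarative property system with runtime checks and "did you mean" suggestions can be approximated with descriptors. This is a minimal sketch and not Bokeh's property machinery: the `Property`, `Int`, `Float`, `HasProps`, and `ToyRange` classes are hypothetical names for this example, and the suggestion logic simply uses `difflib` from the standard library.

```python
import difflib


class Property:
    """Descriptor that type-checks a value at assignment time."""

    accepted = (object,)

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access returns the declaration itself
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        # This sketch has no Bool type, so bool (a subclass of int) is rejected.
        if isinstance(value, bool) or not isinstance(value, self.accepted):
            expected = " or ".join(t.__name__ for t in self.accepted)
            raise ValueError(f"{self.name!r} expected {expected}, got {value!r}")
        obj.__dict__[self.name] = value


class Int(Property):
    accepted = (int,)


class Float(Property):
    accepted = (int, float)


class HasProps:
    """Base class: only declared properties may be set, with name hints."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    @classmethod
    def properties(cls):
        return sorted(n for n in dir(cls) if isinstance(getattr(cls, n), Property))

    def __setattr__(self, name, value):
        known = self.properties()
        if name not in known:
            close = difflib.get_close_matches(name, known)
            hint = f"; similar properties: {', '.join(close)}" if close else ""
            raise AttributeError(
                f"unexpected attribute {name!r} on {type(self).__name__}{hint}"
            )
        super().__setattr__(name, value)


class ToyRange(HasProps):
    """A model with no code in it, just property declarations."""

    start = Float()
    end = Float()


r = ToyRange(start=0, end=10)
r.end = 2.5

errors = []
for name, value in [("end", "wide"), ("strat", 1.0)]:
    try:
        setattr(r, name, value)
    except (ValueError, AttributeError) as exc:
        errors.append(str(exc))
```

The first failed assignment is rejected because a string is not a number, and the second because `strat` is not a declared property, with `start` offered as the nearest name. Because the declarations are ordinary class attributes, the same metadata could also feed documentation or serialization, which is the reason given above for not replacing the system with static annotations alone.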

42:13 Okay. Yeah. Yeah. Very cool. Maybe we could do a quick tour of some of the interesting graphs

42:18 or visualizations that you find, you know, like kind of interesting and worth talking about,

42:23 like over at demo.bokeh.org or just bokeh.org and just click on the gallery and demos and stuff.

42:29 There's a bunch of cool ones that have the source code. There's some interactive bits and so on.

42:33 You want to tell us about some you think are worth checking out?

42:35 Yeah. So for sure. So first off, if you go to demo.bokeh.org, these are all specifically

42:39 Bokeh server applications. So these are all backed by a running Python process. And when you click a

42:44 button or make a selection, that triggers real Python code. If you go to the gallery on the docs,

42:48 most of those are standalone. And so they don't, they aren't backed by a Bokeh server just to get

42:52 that distinction out of the way. But at demo.bokeh.org, there's a couple of interesting ones here.

42:55 The first one on the upper left is this movie data explorer. And this is actually a fairly direct

43:00 comparison, intentional on our part, to a tool called the Shiny Movie Explorer. So people have

43:05 asked for a long time, where, you know, where is Shiny for Python? So Shiny is this tool for creating

43:08 sort of interactive data visualization applications from the R language. People ask, where is Shiny for

43:13 Python? So we're trying to answer that question. And I think Bokeh is a pretty good, decent answer to

43:16 the question of where is Shiny for Python. But so we made that as a pretty direct comparison. So that's

43:20 one that's interesting. Right next to it, there's this selection histogram, which I think is pretty

43:23 cool. So it's got a couple of distributions of scatter points on a plot. And if you make a

43:27 selection, it shows the histograms on both axes. And if you make a selection across those points of,

43:31 you know, a subset of those points, it then highlights and shows you the histogram of just

43:35 the selected points. And then, sort of in the opposite direction, it shows the histogram of

43:39 the unselected points and sort of a shadow faded out version. Wow. Yeah, that one's really cool.
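
The computation behind a selection histogram like this one is small enough to sketch in plain Python. This is an assumed reconstruction of the idea, not the demo's source code: the box selection, bin count, and random data are all made up for illustration.

```python
import random


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
    return counts


def in_box(point, x0=-0.5, x1=0.5, y0=-0.5, y1=0.5):
    """Hypothetical box selection made by dragging on the scatter plot."""
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2_000)]

selected = [p for p in points if in_box(p)]
unselected = [p for p in points if not in_box(p)]

# Histograms along the x axis: everything, the selection, and the rest.
all_x = histogram([x for x, _ in points], -4, 4, 20)
sel_x = histogram([x for x, _ in selected], -4, 4, 20)
unsel_x = histogram([x for x, _ in unselected], -4, 4, 20)
```

In a server-backed app, a selection callback would recompute `sel_x` and `unsel_x` and assign them to the data sources behind the highlighted and faded bars.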

43:43 That's a cute one. We've been working on that one for quite a while. It's gone through several

43:46 iterations that actually helped us uncover some problems with the Bokeh server early on. It was just sort

43:51 of behaving in a weird way and stuttering and realized that events were sort of boomeranging,

43:54 sort of making a boomerang effect. And so we had to sort of fix that. But that was a great example

43:59 to help us figure out some of those problems. And we have a lot more things under a lot more rigorous

44:02 tests now. So that's good. But yeah, I like that example a lot. Another one we have is this

44:07 reproduction of the Gapminder demo. So, you know, Hans Rosling did this, you know,

44:12 famous TED talk where he showed all this data. And so we've reproduced that in Bokeh. We've also

44:16 embedded the YouTube video. We wanted to be able to show being able to use, you know,

44:19 a template to embed Bokeh content in a template with other content. So this also has this YouTube

44:23 video embedded.

44:23 Yeah, that talk by Hans Rosling, you have the video there. It's really worth watching. Like

44:30 that guy really makes statistics and just data like relevant for humanity in a great way.

44:37 Absolutely. No, I'd recommend anyone to go watch the video, regardless of whether they look at the

44:42 Bokeh part. It's a great video. And I think it's a really compelling one. Tells a great story. So

44:46 I'd recommend anyone to go check that out. For sure. Let's see, lower left, there's actually a

44:50 financial chart. So here you can have time series from two sort of financial, you know,

44:54 data sets and you can do sort of a cross correlation between them. And you can see the

44:58 pandas sort of statistical summary there. And you can, you know, use the dropdown to choose

45:02 different time series and then the table updates and the data, you know, the plots update. So that's

45:06 a nice one as well. And then on the bottom right, there's kind of an interesting one that's got this 3D

45:10 plot. And this is maybe confusing for some people. Bokeh itself is not a 3D plotting library and has

45:14 no inherent 3D capability built in. But Bokeh is very extensible. At some point, we realized that,

45:19 you know, lots of users have use cases that are eminently reasonable and, you know, really cool

45:24 that we're just not ever going to have the capability or resources to sort of do in the library. I mean,

45:28 you have to sort of limit the scope of the core library at some point. Yeah. So we work to make

45:32 Bokeh extensible. And so you can create these custom extensions that behave just like built-in

45:36 Bokeh models. And they plug in just like, you know, the plot object or a widget object, right,

45:41 into Bokeh content. And so this is an example of that. And so this is a custom extension that wraps

45:46 a little 3D JS library. And you use the standard Bokeh data sources and you update them. And then

45:51 this little 3D plot updates because basically the custom extension just wires together the Bokeh data

45:55 source with whatever this other library expects. And so it's a really neat example of that. And there's

46:00 other examples of extensions in the docs as well for different kinds of use cases. If you want to like,

46:04 if you have some really cool JavaScript widget, you want to connect to Bokeh content. If you actually,

46:07 if you have a cool JavaScript widget that you want to connect to, you know, all these PyData tools,

46:12 like you want to connect this JavaScript widget to, you know, scikit-learn or to Dask or to Numba or,

46:16 you know, pandas, Bokeh is a great bridge for that, right? You can just write a custom extension that wraps

46:21 the JavaScript component and then the Bokeh server can automatically connect it to all those tools.

46:25 Yeah. And get all the change notification and interactivity and everything. Yeah.
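
The wiring described here, where a custom extension connects a data source to whatever a wrapped library expects, is essentially change notification. A minimal, library-free sketch of that idea follows. The names `DataSource`, `on_change`, and `Plot3DAdapter` are hypothetical and chosen for illustration only; this is not Bokeh's actual API or extension mechanism.

```python
from typing import Callable, Dict, List, Tuple

Columns = Dict[str, List[float]]


class DataSource:
    """Hypothetical stand-in for a columnar data source (not Bokeh's class)."""

    def __init__(self, data: Columns) -> None:
        self._data = dict(data)
        self._callbacks: List[Callable[[Columns], None]] = []

    def on_change(self, callback: Callable[[Columns], None]) -> None:
        # Register a listener that is called whenever the data is replaced.
        self._callbacks.append(callback)

    @property
    def data(self) -> Columns:
        return self._data

    @data.setter
    def data(self, new_data: Columns) -> None:
        self._data = dict(new_data)
        for callback in self._callbacks:
            callback(self._data)


class Plot3DAdapter:
    """Hypothetical wrapper that reshapes columns into the (x, y, z) points
    an imagined third-party 3D widget would expect."""

    def __init__(self, source: DataSource) -> None:
        self.points: List[Tuple[float, float, float]] = []
        source.on_change(self.render)
        self.render(source.data)  # draw the initial state

    def render(self, columns: Columns) -> None:
        self.points = list(zip(columns["x"], columns["y"], columns["z"]))


source = DataSource({"x": [0.0, 1.0], "y": [0.0, 1.0], "z": [0.0, 2.0]})
plot = Plot3DAdapter(source)

# Application code only updates the source; the adapter follows along.
source.data = {"x": [2.0], "y": [3.0], "z": [4.0]}
print(plot.points)
```

The point of the design is that application code never talks to the wrapped widget directly. It only updates the data source, and the adapter translates.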

46:29 That's super cool. Okay. Let's see. What else do you want to talk about? You all have Bokeh 2.0 on the

46:35 roadmap. What's going on with that? Yeah, absolutely. I would say we were targeting August,

46:39 but I think maybe a little more realistic at this point is September. We're always a little optimistic

46:43 in our estimates for our schedule. Welcome to software development, right? That's how it goes.

46:48 We're all like that way. I don't even want to speculate the first time we promised Bokeh 1.0 and

46:52 sort of stability. That was probably a couple of years too early, but we're a little more on track

46:57 for Bokeh 2.0. But the main thing about Bokeh 2.0 is just that we are dropping Python 2 support and

47:02 also Python 3.4 support. So Python 3.5 will be the minimum version. As long as we are doing a major

47:06 version bump, we're also going to take the time to clean up a few other minor things. So there's a few

47:10 minor changes that are coming. Hopefully nothing that's too disruptive for anyone. We're going to

47:14 be sure to outline and document all those in a migration guide. But that's the main thing is the

47:18 Python 2 support. And it gives us a chance to do some things like move to native coroutines. So we use

47:23 Tornado as the base for the Bokeh server. But if we move to Python 3.5 as a base, we can use native async and

47:29 await coroutines everywhere. It's still with Tornado, but it helps us clean up the code a lot. And just

47:33 in general, it'll help us clean up the code base and make it a lot more maintainable and sort of

47:36 shrink it. And it's always good to delete and shrink code for sure.
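
As a rough illustration of what "native async and await coroutines" look like on Python 3.5 and later, here is a sketch that uses only the standard library's asyncio. The function names and the simulated delays are made up for illustration, and this is not Bokeh server code. Recent Tornado releases can run on the asyncio event loop, so coroutines written in this style can be used alongside it.

```python
import asyncio


async def fetch_value(name: str, delay: float) -> str:
    # Stand-in for real I/O such as a network call; awaiting yields
    # control back to the event loop instead of blocking.
    await asyncio.sleep(delay)
    return name.upper()


async def gather_values() -> list:
    # Run both coroutines concurrently; results come back in call order,
    # not completion order.
    return await asyncio.gather(
        fetch_value("plot", 0.02),
        fetch_value("widget", 0.01),
    )


# An explicit loop keeps this compatible with Python 3.5;
# asyncio.run() is available from Python 3.7.
loop = asyncio.new_event_loop()
try:
    results = loop.run_until_complete(gather_values())
finally:
    loop.close()
print(results)
```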

47:39 Yeah. If you maintain it by deleting it, like you're good. Yeah, that's a good way to do it.

47:44 Do you think that'll help attract more maintainers to say like, hey, you could work on this cool async

47:48 IO, async and await library rather than, you know, this thing called Tornado and these

47:53 coroutines?

47:54 Well, so it's still going to use Tornado. And Tornado is a really great tool, but I think it may.

47:57 It broadens the thing a little bit to hopefully some more developers. And I stress that

48:02 there's a lot of work in BokehJS, but there's plenty of work on the Python side to do as well. And we'd

48:06 love to have contributors. And honestly, there's actually a lot of work that's not coding. I'd love

48:10 to get other contributors involved in all kinds of ways. And if I can speak a minute about that, I mean,

48:14 yeah, go for it.

48:14 Obviously, people talk about, hey, we need testing help or docs help.

48:18 And that's certainly true for us as well. But other ways people maybe don't think

48:22 about it is, you know, we'd love to get like designers, front end designers to come help

48:25 make our assets better, to come, you know, help us improve the visual appearance of Bokeh. Cause

48:29 you know, we've done okay, but we're not designers. And so it'd be great to get that kind of help.

48:34 We actually have a lot of infrastructure now on places like DigitalOcean and AWS, and it would be

48:39 great to get experienced people that know those, those systems and those DevOps on those systems to

48:44 come help us optimize them for cost, optimize them for usage, you know, whatever.

48:47 Yeah. You guys are doing cool stuff with Docker, right?

48:50 Yeah, we do a couple of things with Docker. So the demo site is actually a Docker image that's

48:53 run on Elastic Beanstalk. And I actually just recently changed some of the instances that that

48:57 was running on to hopefully make them a little bit more cost effective for us. But we also just

49:01 recently had a spike in S3 usage on one of our buckets that I couldn't really explain just yet.

49:06 And so I'd love to get experienced people that can, you know, help with those sorts of things.

49:09 Outreach is another area. We're really trying to ramp up our outreach, both to the community in terms of,

49:13 you know, fundraising, but also talking to companies. And we've had a couple of people help with that.

49:17 And actually just offering support, right? We just moved our mailing list to a Discourse instance,

49:22 discourse.bokeh.org, which is infinitely better. I mean, Discourse is great for users because

49:27 there's a lot of features for code highlighting, for math texts, just, you know, all kinds of things

49:31 we could imagine maybe writing an extension to put actual Bokeh content into these Discourse posts.

49:35 But it's also great for us as maintainers because Discourse has a lot of information about what are

49:39 people searching for, you know, what topics are popular, that sort of thing. So that helps us know maybe

49:43 where attention needs to go. But just answering questions there, if people want to go offer support

49:47 and help other people use Bokeh, that is also a huge deal. I think Bokeh has been successful because

49:52 we've had a few people that have been able to put a lot of time into helping, you know, the community.

49:55 But as the community grows, that's got to scale. It needs to have more and more people helping each

49:59 other. And so that kind of thing would also be a great way to contribute to the project. And so

50:03 there's all kinds of ways people can plug in. And we'd love to, you know, engage with anyone,

50:07 really, about any of those tasks.

50:08 Right, right. If you're a designer and you want to make the website look shiny,

50:12 that'd be great. If you want to make the graphs look better, or maybe you're a visualization

50:17 expert and you've got a different kind of graph you want to bring, whatever, right?

50:20 Yeah, absolutely. Or just even making new examples for the docs, you know, making really cool uses of

50:25 Bokeh to show off, to tweet about, to put in our docs and our gallery. I mean, there's all kinds of

50:29 ways to make very valuable contributions to the project just because there's a lot of things to do. And

50:33 you know, presently not enough people do them, probably never enough people do them,

50:36 right. But obviously, the more help we can get, the better.

50:39 Sure. So if I could summarize, you're willing to accept contributors to the project.

50:43 Yeah, absolutely. If I hadn't made that clear, yes.

50:46 That's awesome. Yeah, it's a cool project. It would be fun to work on.

50:50 Yeah.

50:50 As part of this Bokeh 2.0 thing and the dropping of Python 2, which I like to refer to as legacy Python,

50:56 and Python 3 just as straight Python. But as part of dropping legacy Python, one of the things you did,

51:03 and this is kind of a trend in the data science space that I haven't seen as broadly adopted elsewhere,

51:06 and I'm not really sure why, is you signed the Python 3 statement. You want to tell folks about that?

51:12 Yeah. So the Python 3 statement is just, you know, it's a GitHub repository where projects can go and

51:16 sort of make a PR to list themselves on this website. And it says, we're going to, we pledge to drop

51:20 Python 2 support, you know, by sort of this date or this timeframe, and support Python 3 going forward.

51:25 And so there are a lot of projects that have signed that. And it's interesting, I thought going in that

51:29 Bokeh was going to be maybe kind of a leader in this, I wanted to be fairly aggressive. But

51:33 all of a sudden, this year, a ton of projects have started releasing, you know, new releases that sort

51:37 of are cut off from that. And so things like, I think, Matplotlib, and yeah, and I think,

51:42 you know, Dask, maybe and I forget what else, but there's all these sort of big projects that are

51:46 just suddenly, you know, we're now behind the curve, right? They've already dropped Python 2 support,

51:50 and we're sort of lagging behind. But I think it's time. I mean, Bokeh is definitely used a lot in,

51:54 you know, analytics space. And I think things do move a little bit faster there.

51:57 Part of that is because of conda and Anaconda and conda-forge, they sort of push things forward.

52:02 I think also data scientists, you know, do a lot of exploratory work, and

52:06 in that exploratory work, they're willing to sort of move and put up

52:09 with a little bit more change to get new features and to get better performance.

52:14 Once things get deployed, that's when things get a bit more sticky. And that's where you see a lot of

52:17 people still using Python 2 in, you know, finance and, you know, other venues like that.

52:21 Yeah, absolutely. I feel like with the data science exploration stuff and the models,

52:25 the underlying technology is changing so quick there, right? Like TensorFlow has come out and

52:31 pandas and all these things are just changing so quickly that if you're going to come back to it,

52:36 you may want to just move to something new or shiny or better anyway. And it's just much easier to

52:41 stay on the later version of Python, whereas that website that the guy who used to

52:46 work here ran, that now we just have to keep running, like nobody wants to touch that, right?

52:51 As soon as you touch it, it's your problem to fix it if it ever has a problem. And nobody wants that

52:55 puppy, right? Yeah, yeah. So some of the projects that signed the Python 3 statement,

53:00 that's python3statement.org, with the number three: TensorFlow, Requests, XGBoost,

53:06 NumPy, IPython, like that kind of stuff, right? Cython, Spyder. There's a ton of projects here.

53:12 Yeah, it's great.

53:13 A lot of those projects already have. I thought we'd be sort of leading the pack, but we're actually

53:17 behind the curve. And that's what made it very easy for us to say, okay, you know,

53:21 Q4, it's going to be really easy for us to drop Python 2 because all these other projects will

53:24 have already dropped Python 2.

53:26 Right, right. For example, Tornado is in there and you guys are built on that. So in a sense,

53:29 they're kind of calling your, not calling your bluff, but making sure you're going to have to

53:32 follow along anyway if you want to stay in the latest of that, right?

53:35 Yeah, even NumPy, right? I mean, you know, obviously, we could pin to a lower version of NumPy, but we don't want to do that.

53:40 Yeah, of course, you wouldn't want to do that. Interesting. So we're just about out of time,

53:43 but you want to talk about Portland real quick? Sure. Yeah. Yeah. So we're both in Portland,

53:48 right? You recently, somewhat recently, not super recently, but you're somewhat new here and you're

53:54 trying to get some stuff going in the data science space in Portland as well, right?

53:57 Yeah, absolutely. Yeah. So I've been here about a year and a half and it's been a really great

54:00 experience being here in Portland. I really love it. But I am trying to get a PyData meetup here

54:06 started. In fact, we have a first meetup scheduled for, I think, August 14th. And we're going to

54:11 alternate sort of between an east side and a west side, you know, downtown location, hopefully,

54:15 every other month. But me and a colleague of mine are getting that off the ground. And I'm really

54:20 excited about it. So PyData is this series of meetups slash conferences, if the meetups get big

54:24 enough, that is sort of sponsored by NumFocus. And, you know, I've been involved with NumFocus since the

54:28 beginning. I think it's a terrific, amazing organization. The people that are there are really

54:31 great. And I think the PyData meetups in particular have been really, really great, you know,

54:35 both meetups and also the PyData conferences are also really good as well. So really excited to get that

54:39 started in Portland. I was almost kind of surprised that it wasn't here already. There's a, you know,

54:42 there's a PyData Seattle meetup and there's PyData meetup. There's like 105 PyData meetups,

54:47 I think, around the world. So it was well past time that Portland got one. So I'm really excited to

54:51 be helping get that off the ground. Yeah, that's awesome. I mean, that only really is interesting

54:55 to like 5% of the listeners, maybe. But it's still really cool that you're doing that here in

54:59 Portland. And, you know, other folks, they can create a PyData plus their city's airport code if

55:05 they want, right? Absolutely. I think most places don't use the airport code,

55:08 but I'm really fond of PDX. So I think PyData PDX is a good one.

55:11 Yeah, it's definitely a good one. All right. Well, you know, Bokeh is a really cool project,

55:15 and I'm glad you all have been working on it. And it's great to see all this progress and excitement

55:21 around it. It's a great, great one. So people should definitely check it out.

55:24 Yeah, yeah. Well, thank you very much for having me. I love to have the opportunity to sort of spread

55:27 the word and talk about Bokeh. And it's been really great.

55:30 Yeah, absolutely. Let me ask you the final two questions before you get out of here, though.

55:33 If you're going to write some code, probably Python code, but maybe JavaScript as well,

55:37 I guess. What editor do you use?

55:39 I have been won over by VS Code.

55:40 Okay.

55:41 Yeah, I still use vi bindings. So I grew up using vi and that's still in my fingers. And so I love

55:45 vi bindings. But I used to use Sublime Text, but I moved to VS Code and haven't looked back.

55:49 Yeah, that seems a pretty straightforward choice to go from Sublime to VS Code.

55:53 Those are, you know, one has so much more energy and they're super similar in their sort of workflow.

55:58 And then notable PyPI package, I'll go ahead and throw out there Bokeh for you. And what else? If

56:04 there's something like, hey, I ran across this and people might not know about it, but it's really

56:08 amazing. What would you say?

56:09 Yeah, let's see. Well, I'll say PyPI or conda, right? So don't forget conda. Very important

56:13 to remember. But, you know, it's hard to say. I'm so focused on, you know, using and working on Bokeh.

56:21 That's like my day to day. Honestly, I have a little bit of tunnel vision, maybe to put it one

56:25 way. But, you know, I think a lot of the tools that are built on top of Bokeh are really interesting to

56:29 me. And so I like, you know, looking at what is happening with them and seeing what developments

56:32 are going. Obviously, I think all the tools in the PyData ecosystem are amazing. I think Numba in

56:37 particular is really interesting. So Numba is a compiler for Python, lets you really accelerate,

56:41 you know, certain kinds of code. And it was originally created again by Travis, you know,

56:45 Oliphant, but it's moved on since then. And it's actually grown really successful in certain

56:50 kinds of venues. So I think Numba is a pretty interesting use case. And I certainly, of course,

56:53 think Dask is really fantastic as well. Yeah, those are definitely good ones. All right,

56:57 Brian, final call to action. People want to get started with Bokeh. What do they do?

57:01 Yeah, absolutely. Love to get people involved. So if you want to, you know, talk about development or

57:06 have questions about support, we have this Discourse, discourse.bokeh.org. If you want to just

57:09 get started from a very high level, just bokeh.org is a great one stop to get to a lot of other

57:14 resources like documentation, like the gallery, like the GitHub page, and just to see what's

57:19 going on. But in terms of like talking to us, yeah, the Discourse is a great spot to make a

57:23 post or topic there. And of course, GitHub is a great place if you have ideas for suggestions,

57:27 you know, or want to report problems, of course, you know, GitHub is a great place to contact us.

57:31 Yeah, for sure. And PRs are accepted.

57:32 PRs are always accepted. Yes.

57:34 Yeah, very cool. At least considered.

57:36 Considered.

57:37 At least considered for sure.

57:38 Cool. All right. Well, thanks so much for being on the show. It's good to talk with you.

57:41 Absolutely. Thank you so much, Michael.

57:42 Yeah, bye.

57:43 Bye.

57:43 This has been another episode of Talk Python to Me. Our guest on this episode was Brian Vandeven,

57:50 and it's been brought to you by Ting and Linode. Ting is the fast mobile network custom built for

57:55 technical folks. Use their savings calculator to see exactly what you'd pay. Visit python.ting.com

58:01 to get a $25 credit and get started without a contract. Linode is your go-to hosting for whatever

58:08 you're building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E.

58:14 Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps

58:21 course. Or if you're looking for something more advanced, check out our new async course that digs

58:27 into all the different types of async programming you can do in Python. And of course, if you're interested

58:31 in more than one of these, be sure to check out our everything bundle. It's like a subscription

58:35 that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for

58:41 Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google

58:46 Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host,

58:53 Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write

58:57 some Python code.

59:01 We'll see you next time.
