
#222: Interactive graphs with Bokeh and Python Transcript

Recorded on Wednesday, Jul 24, 2019.

00:00 Michael Kennedy: Do you have data you want to visualize and share? It's easy enough to make a static graph of it, but what if you want to zoom in and highlight different sections? What if you need to rerun your machine learning model on the selected data? Then you might want to consider working with Bokeh. It does this and much more. Join me on this episode where you'll meet Bryan Van de Ven, who heads up the Bokeh project. This is Talk Python To Me, Episode 222, recorded, July 24th, 2019. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Ting and Linode. Please check out what they're offering during their segments. It really helps support the show. Bryan, welcome to Talk Python.

01:03 Bryan Van de Ven: Hi, thanks for having me.

01:04 Michael Kennedy: Yeah, it's great to have you here. I've often thought about ways in which I could use some of these cool Python visualization libraries, and I haven't recently had some great excuses to use them, so I haven't really covered them enough on the show, but I'm really excited to talk about Bokeh with you this week.

01:19 Bryan Van de Ven: Well, I'm super excited to be here. I think Bokeh has really developed a lot over the last year or so, in particular, and so, this is a great opportunity.

01:26 Michael Kennedy: Yeah, absolutely. Before we get to it though, let's start with your story. How'd you get into programming in Python?

01:30 Bryan Van de Ven: In Python, so I think the first version of Python I ever used is Python 1.4, actually, way, way back in the day and I was doing some system administration kind of job, so there was a lot of Perl. But I happened to get into using Python for a few things and it was a lot of fun. Put it down for a while, picked it up here and there, but I've been using it pretty extensively since about 2005 or 2006.

01:51 Michael Kennedy: Okay, yeah, those are pretty early days. The Python 1, right? We don't have to have a debate about 1 versus 2 anymore. It's moved on to 2 versus 3.

01:59 Bryan Van de Ven: Yeah, I don't think there was ever really much debate. Everyone was ready for Python 2, for sure.

02:03 Michael Kennedy: Yeah, absolutely. So how'd you get into programming in the first place?

02:05 Bryan Van de Ven: Let's see, the first thing I ever did was on a TRS-80 that was actually checked out from our local library. They had a program to check out TRS-80s for two weeks, and there was a Logo cartridge that came with it, so we could do Logo programming. A little bit later, we had some Commodore computers, and so I did BASIC, and I think at one point, I even got into 6502 Assembly, when I was getting to be a teenager or something. But, yeah, just 8-bit programming way back in the day.

02:30 Michael Kennedy: Yeah, how interesting. That's funny with the Assembly language, that's not super easy. Like you've got BASIC on one side and Assembly language on the other, not a whole lot in between, huh?

02:39 Bryan Van de Ven: Well, there's not a lot of different ways to program on a Commodore 64.

02:45 Michael Kennedy: You had to earn your programming stripes, back in the early days.

02:47 Bryan Van de Ven: Right.

02:48 Michael Kennedy: That's for sure. Nice, okay, so Bokeh is a very visual thing. For a long time, you were at Anaconda, Inc. Is there a science background as well that got you sort of in that path or how'd you get interested in all these things?

03:03 Bryan Van de Ven: Yeah, I've had a pretty tortured academic background, but I went to school for Computer Science, then left for a while, and I worked in some research labs, then I realized, hey, I want to go back to school, and so I actually ended up in graduate school for Physics, eventually, and so, I have a pretty strong mathematics, physics background. But ultimately, I did decide to sort of go back into Computer Science, software engineering. I really like working on software though that's in the service of analytical endeavors or science and that sort of thing, and so this is why being able to work at Anaconda on all these tools has been really fantastic.

03:29 Michael Kennedy: Yeah, it's got to be super rewarding to have so much impact on the science side. Are you still at Anaconda? What are you doing these days? Like, what kind of programming and what do you do day-to-day now?

03:39 Bryan Van de Ven: Yeah, no, I actually just recently left earlier in the year, so I was at Anaconda from the beginning. I think I was the last original employee to leave, in fact, except for Peter Wang, of course, who's still there. But, eight years is a long time, and so, it was just time for me to go look for something different. I actually went to go work at Microsoft, and that was really on the strength of some interactions I've had with folks who I've dealt with at Microsoft, around Python, around open-source. Everyone there has been really terrific and really supportive of Python and open-source, so I think it's a very different company than when I thought about it 15 years ago, when I probably would've used M$ very sincerely on an angry forum post or something, but everyone there has been really terrific. It's been a good experience. And day-to-day, I work on the Azure SDK for Python these days, which is a lot of PR reviewing, writing some code, and helping move the direction there.

04:23 Michael Kennedy: Oh, that's really interesting. You probably feel like you're bringing a little bit of the outside to Microsoft, right? It is a very different company. They're more open to external stuff. But historically, it hasn't always been that way, so it's probably like, let me tell about the Python scientific stack, folks. Things like that, yeah? What's it like?

04:40 Bryan Van de Ven: It's certainly interesting. There's a lot of give and take. So I've actually learned, I haven't been in an organization this large in a very long time, and so, it's been a lot of personal growth and learning for me just to be in that kind of environment, where people have to interact in different ways, and that's been very gratifying and helpful for me. But, definitely, I think I have a pretty useful perspective to bring as well, especially in terms of, yeah, data science, applications of in Python, and that sort of thing.

05:01 Michael Kennedy: Yeah, yeah, super cool. It sounds like a fun job. So let's start off this conversation talking about Bokeh by kind of getting the, like a big picture of making pictures with Python, right? So if I have a graph, I want to do a map, if I want to do some kind of bar chart or some visualization of data, what are my options nowadays?

05:23 Bryan Van de Ven: There are a lot. So if people want to Google, there's actually a chart made by Jake VanderPlas, who is very active in the PyData and SciPy community. He tried to draw a map, basically, of the whole Python visualization landscape, and there are a lot of tools available, and some people think this is really great and there's a lot of choice, and then some people think that there's just too many things and they don't know what to deal with. But there are a lot of tools, so obviously, Matplotlib is a very big tool that's been around for a very long time. It's a really fantastic tool, and all the devs there, they work really hard, and it's been great to see the sort of strides that it's made in the last few years. In terms of web visualization, there's Bokeh, of course, Plotly is another offering that's out there by the company Plotly. Altair is another tool that's been fairly recently added. Actually, Jake VanderPlas and Brian Granger from Jupyter put that together, and so it's inspired by the Vega plotting sort of toolkit that's available in browsers and sort of a Python wrapper for that.

06:17 Michael Kennedy: Yeah, that's cool. I've heard a lot of good stuff about Altair, and that it's really quite nice as well.

06:21 Bryan Van de Ven: Yeah, I don't have a lot of experience with it. I mean, it's definitely intended for very high-level exploratory data analysis, that's very useful, especially in notebooks in particular, and so, it looks very attractive from the things that I've seen. I'm very involved in Bokeh, and it takes up so much of my time that I almost don't have time to look into many other things very often. But, Jake's a fantastic guy, and Brian Granger, of course, is great and has made just amazing things for the PyData and SciPy communities so that's a great tool that they've put together.

06:49 Michael Kennedy: Cool, and there's always things like the JavaScript libraries, like D3 and stuff like that. Is that really relevant or have we kind of got it handled with things like Bokeh and Plotly and so on?

06:58 Bryan Van de Ven: Some people ask that a lot, like what's the difference between Python and D3 or why would you use one versus the other? And I think if you have people that are already using JavaScript, and they want to work on things with D3, D3 is an amazing tool and it can make incredible output and really fantastic graphics, and there's probably things that are doable in D3 that maybe would be more difficult in Bokeh, for instance. But where I think the sweet spot for Bokeh is, we've really tried to make it for the people who are already very productive in Python. They're doing work in data science or science, using all of these tools that are in the PyData stack, NumPy and SciPy and Pandas and Scikit-learn and Dask and Numba and all these tools, and they're really productive with these in Python. We want to let them have access to very interactive, powerful visualizations in the browser without having to reach for that JavaScript and web tech and sort of be distracted from the actual work that they want to do. And so, in terms of productivity, I think if you're already working in Python, I think Bokeh's a great choice. Put it that way.

07:51 Michael Kennedy: Yeah, well, Bokeh, to me, feels like I get a lot of the benefits of the rich JavaScript stuff, but that I don't actually have to make it in JavaScript.

07:59 Bryan Van de Ven: That's how great it is. Yeah, a very succinct way to put it, yeah.

08:01 Michael Kennedy: Okay, interesting, and then maybe we can talk really quickly about Plotly, just as a compare and contrast. So Plotly, like I don't fully understand Plotly. When I got to work with it, I feel like, okay, I'm working with a library, but then it seems like it has a backend that they provide that I have to deal with, and then there's also a commercial version. What is Plotly? I don't really know where it fits.

08:22 Bryan Van de Ven: So, in terms of their business, I actually don't know a lot. To be honest, I don't really follow that very closely, and so I think they've actually changed some of their offerings from what they used to be. I think they used to sell Plotly and I think they're not in that business anymore. But I can't really speak to that very carefully. But the main similarities are it's a Python API that generates a declarative specification, typically some kind of JSON, that can be rendered by a front end library. Now, Plotly's an entire company centered around this, and so they have had some really nice resources for, I think, developing some things in ways that Bokeh hasn't had. Like I think their front end is a little more polished and some of the design stuff is definitely polished. I'd love to get some help on the Bokeh side to sort of bring that up to speed. But relatively speaking, I think we've done a pretty good job at having the same set of features. They're very contemporaneous. They started at almost the same time, way back in 2012 kind of era. They have a lot of similarities, but definitely there's a little bit of difference. My background comes from a lot of science stuff, so I'm really familiar with folks that have use cases for instance around like dense arrays, like big images, and so, we've really focused on some things, like having an efficient array protocol for the Bokeh server that can transmit large arrays very efficiently. Whereas, I don't think they've sort of gone down that route. Again, they've worked on some other features that are more around slick dashboarding and that sort of thing.

09:35 Michael Kennedy: Okay, yeah, interesting. Let's talk about the history of Bokeh. You're saying it that way, and I guess I am as well, trying to anyway. The proper pronunciation is?

09:45 Bryan Van de Ven: I used to say Bokeh, but boka, I think is also fine. Amusingly, long ago, when we were funded through this DARPA XDATA initiative, there was some video that someone made, unrelated to all the actual projects, but they made it about the projects, and I remember, in that, they were describing all the projects that were on the XDATA initiative, and they mentioned the visual database Boke, I think. That's the only wrong pronunciation.

10:05 Michael Kennedy: Boke, it's not Boke, all right.

10:06 Bryan Van de Ven: Bokeh, it's fine, yeah.

10:08 Michael Kennedy: All right, excellent. And so, where did you get started? It started out of research grants in this DARPA funding, is that where it came from?

10:16 Bryan Van de Ven: No, the research grants really helped, but it started before then. Going back a little bit further, I've been interested in visualization for a long time. Actually, one of the first things I used in Python, the Python 1.4, was a plotting plugin for Apache, and it amazed me that you could take data and have a website just to make a plot. It was incredible. I worked a little bit on VTK here and there, years ago. A few of my first open-source contributions were to VTK. But in the middle 2000's, I guess, I went to work for a company with Peter Wang, who eventually founded Anaconda, and I worked with him on a library called Chaco, which was a rich client library for interactive visualization. So instead of on the browser, you would write like a Python application using like Qt or GTK, a kind of real application, and it was also contemporaneous with Matplotlib, but ultimately, Matplotlib won that battle. And the reason was pretty clear. It was because while Chaco had all this really rich capability for interactivity, it was a very sort of fiddly API, very detailed, kind of verbose. And so, later on, when Peter was getting Anaconda started, and I was onboard to help with that, we had talked about wanting to update this idea of Chaco and create a new library that supported interactive visualizations but in browsers 'cause browsers is the right place for it to be in 2012, right?

11:30 Michael Kennedy: Of course.

11:30 Bryan Van de Ven: And so we had this idea, we started it, but getting the funding through the DARPA XDATA initiative, in the early years of Anaconda, it was then called Continuum Analytics, really helped mitigate business risk for us, to put resources into it, so it really accelerated the development, I would say. We started talking about the ideas behind Bokeh in sort of middle of 2011, probably.

11:49 Michael Kennedy: Okay, yeah, it's pretty interesting. It's been around for a while. I guess it's, at that point, Ajax and interactive browser stuff's pretty well-established, right? So it's pretty clear that was at the right place?

11:59 Bryan Van de Ven: Yeah, I mean, we just knew that the future of presentation, the future of getting this content in front of people was going to be in browsers, so writing another rich client library was not something that was really interesting to us, and so, we definitely wanted to do the browser and then we definitely wanted to make it architected in a way that it was very flexible, that it had this declarative specification that described what you want it to visualize because that affords a lot of possibilities. So we talked about the Python side of Bokeh, but you can actually have other languages drive Bokeh plots in the browser. There's an R Bokeh binding. There is a Scala Bokeh binding. It hasn't been updated in a while. I'm interested in actually reviving a Julia Bokeh binding, but that's all because there's this JSON specification. Any language that can dump out the right JSON can create these Bokeh plots in the browser. Now we've spent a lot of effort investing in the Python bindings 'cause Anaconda is a big Python shop. But there are sort of possibilities there for other languages as well.

12:47 Michael Kennedy: Okay, so maybe it's worth just touching on the architecture a little bit and we can dive into the details more later. I guess there's a couple ways we can use this, right? Probably the most straightforward way is we have a Bokeh server and then there's some front end stuff that is the rendering point, right? I want to put some kind of graph in a browser and the server handles all the data, and maybe it only presents like a slice into that world of the data and things like that, right? Could you tell us about that?

13:18 Bryan Van de Ven: That's absolutely a great use case for the server, but I will say, the server is in fact optional, and I would say most usage of Bokeh probably doesn't involve a server. Bokeh can generate this JSON and send it to a webpage, or it can be embedded in, like, a Flask app or a Django app. It can be embedded in Jupyter notebook output cells, and it doesn't have to be connected to the Bokeh server. BokehJS will take that JSON and render it, and you've got an interactive plot that has panning and zooming, and you can have linked behaviors between plots. You can have custom JS callbacks that do work whenever you make a selection or click a button. None of that requires the Bokeh server. What the Bokeh server's really great for is when you want to connect all those interactive features to real running Python code, like you want to click a button and have a Scikit-learn regression or a Scikit-learn model run, or you want to make a selection on a plot and compute a linear regression line through those selected points with real Python code. That's what the Bokeh server's really great for, making that two-way connection between this front end and real running Python code. But you can use Bokeh very effectively without the Bokeh server, and in fact, I guess that most usage is probably what we call stand-alone usage, where it's just generating this pile of JSON that is used to drive a Bokeh plot in a webpage somewhere.
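The stand-alone usage described here can be sketched roughly as follows. This is a minimal illustration, not code from the episode: it assumes Bokeh is installed, the data values are made up, and "myplot" is a hypothetical id of a div in the page that will display the plot.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# Build a small interactive plot entirely in Python; no Bokeh server is involved.
plot = figure(title="Stand-alone example", tools="pan,wheel_zoom,reset")
plot.scatter([1, 2, 3, 4], [4, 7, 2, 5], size=10)  # made-up sample data

# json_item() turns the plot into a plain dict describing it declaratively.
# "myplot" is a hypothetical id of the <div> the page will render into.
item = json_item(plot, "myplot")

# The dict can be dumped to a JSON string and served by any web framework.
payload = json.dumps(item)
```

On the browser side, a page that has loaded BokehJS would hand the parsed JSON to `Bokeh.embed.embed_item` to draw the plot, with panning and zooming working without any Python process behind it.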

14:24 Michael Kennedy: This portion of Talk Python To Me is brought to you by Ting. Let me tell you about Ting, a new mobile service available in the US. It's targeted at developers and other technically savvy folks. First of all, their average customer only pays $23 a month, but they're no discount provider. Their service runs over T-Mobile's and Sprint's fast nationwide network. If you don't use that much data because you're usually on Wi-Fi like many of you are, then Ting will save you a ton of cash. But don't worry, you can still use as much data as you like for just $10 per gig. One mobile feature I use all the time is tethering, and with Ting, you get unlimited tethering at the same data rate with your account. $6 a month for a phone line, $10 a gig, $3 a month for text if you usually chat over iMessage or WhatsApp. Think about it, no contracts and super clear and fair billing. Visit python.ting.com, that's python.T-I-N-G.com, and check out their savings calculator. Enter your usage and see exactly what you'd pay. Use that link and you'll get a $25 credit to try them as well. That's python.ting.com or just click the link in the show notes. The server's mostly about the interactive bits if you want to add smarts to your plots.

15:32 Bryan Van de Ven: Yeah, absolutely. If you want to connect all those PyData tools, NumPy, SciPy, Pandas, Dask, Numba, and OpenCV, any of those tools, if you want to connect those things directly to these interactive visualizations with a minimal amount of fuss, that is what the Bokeh server's for, exactly.
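The selection example mentioned earlier, fitting a regression line through the selected points with real Python code, might look something like the sketch below. It is an illustration under stated assumptions rather than anything shown in the episode: the data is synthetic, the names `points`, `fit`, and `update_fit` are hypothetical, and NumPy's `polyfit` stands in for whatever model you would actually run.

```python
import numpy as np
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Synthetic data: a noisy straight line (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(size=50)

points = ColumnDataSource(data=dict(x=x, y=y))  # the scatter points
fit = ColumnDataSource(data=dict(x=[], y=[]))   # the fitted line, empty at first

plot = figure(tools="box_select,lasso_select,reset")
plot.scatter("x", "y", source=points)
plot.line("x", "y", source=fit, line_color="red", line_width=2)


def update_fit(attr, old, new):
    """Refit a straight line through the currently selected points."""
    if len(new) < 2:
        # A line needs at least two points; clear any previous fit.
        fit.data = dict(x=[], y=[])
        return
    xs = np.asarray(points.data["x"])[new]
    ys = np.asarray(points.data["y"])[new]
    slope, intercept = np.polyfit(xs, ys, 1)
    ends = np.array([xs.min(), xs.max()])
    fit.data = dict(x=ends, y=slope * ends + intercept)


# With the Bokeh server running, this Python function is called whenever
# the selection changes in the browser.
points.selected.on_change("indices", update_fit)
curdoc().add_root(plot)
```

Saved as, say, `app.py` (a hypothetical file name), this would be started with `bokeh serve app.py`. The callback runs in the Python process, which is the two-way connection the server provides.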

15:47 Michael Kennedy: Okay, cool. Before we move off the history and Anaconda, Inc. and all that, when you created it, it sounds like you tried to create it as a standalone project with its own fundraising and its own outreach. What was the thinking there rather than just making it like part of Anaconda?

16:02 Bryan Van de Ven: Well, in the very early days, it was definitely a project that was started at Anaconda, and the DARPA thing came along somewhat serendipitously, not something we counted on or knew about when the company started, and that was the big sort of funding for a long time. After that ran out, 'cause that was a fixed number of years sort of support, Anaconda very generously supported the development of Bokeh. But ultimately, it was always the goal to try to create these tools as self-sufficient, self-governing, push them out in the open kind of projects. And so it took a long time to get to that point. The first step in that was for Bokeh to become a NumFOCUS fiscally sponsored project, but of late, we've really ramped up the self-governance and the self-sufficiency. So pretty much at this point, I think the cord's been cut, and Bokeh's really out there. It's managing its own CDN resources. We're doing a lot of outreach and fundraising on our own right now, that wasn't happening even six months ago. We just had a, or still having actually, a July fundraiser going on to try to help pay for some of our infrastructure cost, but we're also ramping up some corporate engagements and trying to talk to corporations, see if they want to offer support, but that all is pretty new. For a long time, Anaconda was the primary beneficiary or benefactor, I should say, of Bokeh.

17:05 Michael Kennedy: Sure, and that just makes Python and data science stronger, which is really the heart of Anaconda, Inc. anyway, so it seems pretty reasonable. You said that the project was a NumFOCUS project, and we spoke a little bit earlier about NumFOCUS, and I guess my understanding, it's a little bit off. I saw NumFOCUS as a thing that kind of provides funding to these projects and that's not quite exactly right, is it? What is NumFOCUS and what did that do for you all?

17:33 Bryan Van de Ven: Yeah, it's sort of yes and no. So NumFOCUS was started by Travis Oliphant, who's one of the other co-founders of Anaconda. Of course, he's the author of NumPy, building on previous work. NumFOCUS is this non-profit 501c3, and its main role is to be an umbrella organization for open-source projects, right? So open-source projects often are not legal entities, and so that actually makes it quite difficult for them to accept money, right? It sort of gets very complicated with taxes and. So they're this legal entity that can accept donations on behalf of projects, and then handle all the tax stuff. They also do a lot of outreach. They support PyData meetups and PyData conferences around the world. So they're this organization that sort of helps and supports open-source, and they do fund these projects in a sense, that they help the donations come in and get to the projects, and also sometimes, they spread some of those funds around. They get bigger donations that they can also use to give to projects that don't necessarily raise their own money. But yeah, that's their main role, just to be this umbrella organization. They help out with the bureaucracy and take away that load from these open-source projects.

18:32 Michael Kennedy: Yeah, that's really interesting. I guess it took me a long time to realize that it's actually hard for companies to give money to these projects, right?

18:39 Bryan Van de Ven: It's really hard.

18:40 Michael Kennedy: It's not just a matter, put aside the point whether they should, put aside the point whether they physically could, of course, they, most of them, can and they should, but just the way that they're set up is they buy things.

18:54 Bryan Van de Ven: That is exactly right.

18:54 Michael Kennedy: They exchange the money for a service or a good. Even giving to a charity, like NumFOCUS, probably is a little bit odd and hard for them to do.

19:02 Bryan Van de Ven: Yeah, no, that is exactly right. There is just an impedance mismatch. I mean the sums of money that would help a lot of open-source projects I think are relatively small compared to the budgets of the companies we're talking about that are using these projects. But, yeah, there's just an impedance mismatch for how do you actually. It's not a purchase, and so it is not hiring someone, so what exactly is it? And companies just right now don't know how to handle that, don't know how to deal with it. And so there's some other efforts to make that easier. People are trying different things. There's Travis' company, Quansight, he's trying some models for getting support for open-source projects by engaging with companies in various ways. Tidelift as well is also trying that. GitHub is trying their new sponsorship sorts of things. So people are trying to find novel ways to attack this problem, and I hope we get there, but it's definitely something that'll take some time.

19:46 Michael Kennedy: Yeah, I hope so as well 'cause it would make a huge difference and it would be a blip for these companies to make that contribution, right?

19:52 Bryan Van de Ven: Yeah. I mean as an example, our goal for the July fundraiser was to raise $1,000, and we have now...

19:56 Michael Kennedy: That's very modest, right?

19:58 Bryan Van de Ven: I think several hundred thousand users, and we did. I'm really happy to report that we did, but I had to tweet out a lot everyday to make it happen. It was sort of knocking on doors and. But yeah, I'd love to be able to go to companies and try to be more effective and efficient at fundraising, if that makes sense.

20:12 Michael Kennedy: Sure, well, while we're on the subject, let's talk about your fundraiser real quick. If you can raise $1,000, that's probably not money to pay developers, right? It's something else.

20:21 Bryan Van de Ven: Yeah, I mean I actually raised this exact issue around the sustainability conference that NumFOCUS is going to have later in the year. I think there's sort of two different buckets where things go into. There's paying people, and I hope someday we can figure that out really well and we can support people to be maintainers of open-source software by actually paying them to do that work directly as a sort of job or a living, but there's also just the matter of, a lot of open-source projects now could use a fairly small amount of money just to cover expenses or help take some strain and stress off the maintainers. In the case of Bokeh, we have to run this CDN to deploy BokehJS, so all the users around the world can use BokehJS to display their plots. That is run on AWS CloudFront, and so we have to pay for that. Someone has to pay for that, right? And so that's what this fundraiser was for, and so in the sense that it reduces my stress 'cause it helps me know that this is sort of taken care of for the next year. That's what that level of sort of funding's for. And there's other stuff too. It's good to get developers face-to-face sometimes, and so this could help with that, as well as other infrastructure cost.

21:18 Michael Kennedy: Right, having some yearly meetup with the core developers. It's odd, there's a lot of projects where the core developers have never actually met.

21:25 Bryan Van de Ven: Definitely, for sure. I think I've met everyone at least once, but there's some folks that are pretty scattered for Bokeh. But for sure, I've no doubt that's the case.

21:33 Michael Kennedy: Yeah, for sure. So you said that Bokeh works with this JSON output, and it can just basically render. Anything that can generate the JSON, it can go out and render that. I think when I think of graphs, I mostly think of notebooks and Jupyter and things like that, but also we can just plug this in to whatever, is that right? I can plug it into just a Flask site, plug it into even like some kind of command-line app, and somehow pop it up.

21:58 Bryan Van de Ven: Yeah, I don't see why not. I mean anything that can run JavaScript basically, anything that can load BokehJS. So if that's like an Electron app, I think that would be feasible, certainly in the Jupyter notebook, but there's a variety of ways. We have a whole bokeh.embed API, so there's a variety of different ways to embed Bokeh content. But if you're running a Flask site, you can use one of those ways to pop Bokeh content into the middle of your site and in your template and wherever you want to put it.
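One of the bokeh.embed routes is `components`, which returns a script tag and a div that can be dropped into any HTML template. The sketch below is an assumption-laden illustration: the plot data is made up, and `PAGE_TEMPLATE` is a hypothetical stand-in for whatever Flask or Django template a real site would use.

```python
from bokeh.embed import components
from bokeh.plotting import figure
from bokeh.resources import CDN

plot = figure(title="Embedded example")
plot.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)  # made-up sample data

# components() returns the <script> that builds the plot and the <div>
# that marks where the plot should appear on the page.
script, div = components(plot)

# Hypothetical page template; a real site would use its own templating system.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>{resources}</head>
  <body>
    <h1>My site</h1>
    {div}
    {script}
  </body>
</html>"""

# CDN.render() produces the tags that load BokehJS itself from the CDN.
page = PAGE_TEMPLATE.format(resources=CDN.render(), div=div, script=script)
```

In a Flask view, the same `script` and `div` strings would be passed to the template as variables and marked as safe HTML.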

22:21 Michael Kennedy: When I think about the data science space, there's some libraries and data structures that are just used over and over and over. You mentioned Travis before, so there's NumPy. There's of course Pandas. There's a bunch of other stuff built on top of that. Is there special integration for those types of libraries? Like if I already have some NumPy array or I've got some Pandas DataFrame. Is there a way to just plug it into Bokeh?

22:45 Bryan Van de Ven: Yes, so Bokeh works really well with all those things. NumPy is an actual requirement, runtime requirement, for Bokeh. We tried to avoid that. Even in the beginning, we wanted to make Bokeh as minimal and as accessible as possible, but NumPy is a requirement now. Pandas is not a hard requirement, but Bokeh works really well with Pandas. If you have Pandas DataFrames, you could basically plug them in anywhere a Bokeh ColumnDataSource would go. It can automatically convert them or use them in a way that's useful, either DataFrames or GroupBy objects. And so, we tried to make it very easy to integrate with Pandas but also make Pandas not required, so that's sort of the state, where things are at. Anything that can sort of behave like a list or an array or Pandas Series works pretty much out of the box with Bokeh.

23:21 Michael Kennedy: Okay, are there other major libraries that I don't know to ask about?
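As a rough illustration of the Pandas integration described here, the sketch below builds a ColumnDataSource straight from a DataFrame and plots its columns by name. It assumes both Bokeh and Pandas are installed; the DataFrame contents and column names are made up.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# A made-up DataFrame standing in for real data.
df = pd.DataFrame({"day": [1, 2, 3, 4], "value": [2.0, 4.5, 3.1, 5.2]})

# A ColumnDataSource can be built directly from a DataFrame;
# each DataFrame column becomes a named column of the source.
source = ColumnDataSource(df)

plot = figure(title="DataFrame example")
# Glyph methods refer to columns by name when a source is given.
plot.line(x="day", y="value", source=source)
plot.scatter(x="day", y="value", source=source, size=8)
```

The DataFrame's index is carried along as a column too, which is handy for hover tools or linked selections.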

23:25 Bryan Van de Ven: Yeah, I think Bokeh works pretty well with Dask, I assume, 'cause Dask has a DataFrame like API. Matt Rocklin actually used Bokeh to develop the interactive dashboard that's sort of the cluster monitor for Dask.

23:35 Michael Kennedy: I think it's great as well, and I did have Matthew Rocklin on before to talk about Dask, but I don't know if everyone's listened to that one, and also if they've seen the actual visualization of Dask, so could you maybe describe that real quickly, what Dask is and then that dashboard because when I saw that that just like blew me away.

23:53 Bryan Van de Ven: Yeah, Dask is a tool for basically parallel distributed programming, and it's trying to do so in a way that is a very sort of Pythonic, very Pandas-like, API, right? So there's other tools that do this sort of thing, but they come from other languages originally, and so their APIs are maybe not very Pythonic and they kind of don't fit in well with the Python tools. But Dask is meant to be a very Pythonic tool for this distributed computing task, and so, it has this dashboard that Matt Rocklin developed using Bokeh that can visualize everything that's going on around the cluster that's doing computation all at once. So it can show you what nodes are computing or waiting or transferring data in real time, and so that's really helpful for diagnosing problems with parallel distributed computations, and so Matt has always been very clear that Bokeh was great for him because he didn't have to write all this JavaScript to have this really interactive dashboard. He could just write it in Python and connect it directly to the telemetry that he's getting back from clusters, and visualize it very quickly. And so I think it's been a great tool for his users. I shouldn't say his users. It's a big project now. Dask has actually grown up quite a bit, so I should say the Dask project users, but we're happy to see that kind of thing happen. I'd love to see Bokeh used in those kind of use cases.

24:54 Michael Kennedy: Yeah, and it just looks so good and professional and live updating, it's really a nice use case for Bokeh, and I think it's also a good testament to what you guys have built.

25:05 Bryan Van de Ven: Well, it's also good to get feedback from real use cases like that. Nothing sharpens your tools better than sort of having them honed against real problems, and so we love when people do awesome things with Bokeh and tell us, hey, this was great, but also, this could be a little bit better or easier, this would have made my life simpler, or here's some pain points I had. That kind of feedback is really helpful from users.

25:23 Michael Kennedy: Yeah, it's great to design something but once it actually meets real users and real use cases, that's where it gets real. So you talked about Dask. Let's just touch on some of the other things that are built upon Bokeh because Bokeh's been around since 2011, 2012, like you said, and it's pretty stable, for the most part.

25:44 Bryan Van de Ven: It's a lot more stable recently, yeah.

25:45 Michael Kennedy: Yeah, that's great, so things are starting to build on top of it like Dask and just using it, so what else is out there like that?

25:50 Bryan Van de Ven: Well, there's a couple different things and different classes of things, all right. So first of all, there's other libraries now that are starting to build on top of Bokeh, so there is Chartify, which was created by the data labs at Spotify, and so it's their high level, opinionated data science API on top of Bokeh. There's a project called Pandas Bokeh that just came out recently that's sort of very tight integration with Pandas, using Bokeh to create interactive plots. There's also a set of tools created by some folks who are still at Anaconda; the effort's called PyViz. And so there's a tool called HoloViews, which is a very data-centric API and it can generate interactive visualizations, using Bokeh and other tools as well. But there's also some tools like Datashader which are for very large data. You can finely control how to render them. So you can combine Datashader with Bokeh using HoloViews, so it can drive things that are very high level. So I love to see this effort where people are building these things on top of Bokeh, and I'm also glad that now... Bokeh was a moving target for quite a while, and I very much appreciate the patience of all of our users who kept with us as we were figuring things out. But I would like them to know that we're trying to be much more stable now, and I think we've done a very good job, since 1.0 was released, at being much more stable. That's very good. The other kind of things that get built on top of Bokeh are more like applications or other projects. So there's a project called Microscopium that is for biosciences research. There's a tool called Lightkurve that a bunch of astronomers put together which uses Bokeh to let you drill down, so you can see an image of some stars or something, hover over a single pixel, and really drill down into something about that image using all the tools that are in Bokeh.
Dask, of course, is a great example, and then there's actually a bunch of other ones on GitHub. Sorry I don't have a list at the top of my head, but there are a lot of excellent bioscience projects that are built on top of Bokeh and financial trading is another thing that comes up. People have done drug discovery type work on Bokeh. It's interesting where things are popping out, especially now that I think we're more stable, I think it's really a great time for that to happen.

27:44 Michael Kennedy: Yeah, that's super cool. The Lightkurve project looks amazing. I mean, to explore data from the Kepler and TESS telescopes, that's pretty cool for exoplanet discovery.

27:57 Bryan Van de Ven: It's really something.

27:57 Michael Kennedy: That's got to feel good, right? Like the project that you worked on is helping scientists actually look for exoplanets. That's incredible.

28:04 Bryan Van de Ven: That is, no, it's really gratifying. That's exactly the sort of thing that, I'd say, we wanted to be able to enable and help make happen, so it's really gratifying when people are able to use Bokeh for those kinds of situations.

28:15 Michael Kennedy: Yeah, absolutely. Do you see Bokeh being useful or appropriate for real time dashboard type of scenarios? I mean, obviously, it draws great graphs, and then we talked about Dask and the real time view of that. Just imagine I'm building software for a stock trading company or something like this, and I want to show in real time what the market's doing, all the data the traders need. Would that be appropriate or is it a little too high latency, or what's the story?

28:44 Bryan Van de Ven: I think it could be. There's real time, and then there's quote, unquote real time, so it always depends on what exactly people mean when they say that.

28:49 Michael Kennedy: Sure.

28:50 Bryan Van de Ven: When I hear the word, real time, I have images of a very low-level, certain kind of system that's implemented to have very specific guarantees about its performance, and for that kind of work, Bokeh's probably not suitable, right? But if you're talking about real time, just in terms of streaming data coming in from a financial system or from IoT type devices or that sort of thing, then I think Bokeh can be or has been useful. There's certainly people who have come looking for support on our Discourse or on our other support forums, talking about using Bokeh, connecting it to real time sensors, for instance, so there's a factory or a warehouse, and they've got data coming in, and they want to visualize something. So people do do that. So if you mean, quote, unquote real time, just in the sense that I've got data coming in and I want to visualize it in a best-effort kind of way, then I think Bokeh's definitely a good choice.

29:33 Michael Kennedy: Yeah, that's mostly what I meant. I mean, obviously, not like, we need sub millisecond response time or else the plane crashes, something like that.

29:39 Bryan Van de Ven: Right.

29:39 Michael Kennedy: Nothing like that, right? But I think when you're thinking of graphs, a human has to see the graph and interpret the graph, so that's, how long is that? It can't be quicker than 100 milliseconds, right? Humans can't understand graphs that quick.

29:52 Bryan Van de Ven: Well, and so real time's not even necessarily about quickness. It's really about predictability. It's about specific kinds of guarantees. But, yeah, so I should mention, yeah, Bokeh does have some APIs for streaming, specifically, so if you've got data coming in or you want to update just the newest point. Say you've got a time series of 100,000 points plotted, and you've got new points coming in at the end. Bokeh can very efficiently just send the new data without sending the entire dataset, so it is very useful for that sort of thing.
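
The "send only the new points" idea can be sketched in a few lines of plain Python. In Bokeh itself the relevant call is `ColumnDataSource.stream`; the `StreamingSource` class below is a hypothetical stand-in written only to show the mechanism, not Bokeh's implementation, and the `sent_messages` list is an assumption standing in for whatever would go over the wire to the browser.

```python
from typing import Dict, List, Optional


class StreamingSource:
    """Hypothetical stand-in for a column-oriented data source."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self.data = {name: list(column) for name, column in data.items()}
        # Records what would be transmitted; an assumption for illustration.
        self.sent_messages: List[dict] = []

    def stream(self, new_data: Dict[str, List[float]],
               rollover: Optional[int] = None) -> None:
        """Append new rows and record a message holding only those rows."""
        if set(new_data) != set(self.data):
            raise ValueError("new_data must contain exactly the existing columns")
        if len({len(column) for column in new_data.values()}) > 1:
            raise ValueError("all new columns must have the same length")
        for name, column in new_data.items():
            existing = self.data[name]
            existing.extend(column)
            # Optionally cap the history by dropping the oldest points.
            if rollover is not None and len(existing) > rollover:
                del existing[:len(existing) - rollover]
        self.sent_messages.append({"kind": "stream", "data": new_data})


# 100,000 historical points, then two new ones arrive.
source = StreamingSource({"t": list(range(100_000)), "value": [0.0] * 100_000})
source.stream({"t": [100_000, 100_001], "value": [1.5, 2.5]}, rollover=100_000)
print(len(source.sent_messages[-1]["data"]["t"]), "points sent")
```

The message carries two points rather than 100,002, which is the whole point of a streaming update. The `rollover` cap keeps the plotted history from growing without bound.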

30:17 Michael Kennedy: Yeah, so then you load up the historical data, then you just take the update every half a second or something, that sounds pretty doable. Yeah, absolutely. Okay, yeah. Yeah, yeah, yeah, sure. That sounds super interesting. One of the capabilities kind of around that that you talked about is being able to work with quite large data, maybe having some on the server or something like that, and then either interpreting that or running a machine learning model against that or something like that. Maybe tell us some of the use cases there.

30:46 Bryan Van de Ven: Yeah, there's a couple different ways you could use Bokeh for that. So one is that if you have a large dataset, you're not going to send a billion points to your browser or your browser will just fall over, so you've got to find some ways to sort of minimize the data, and that can be done in a variety of ways, and one of the ways is, for instance, downsampling. If you have a large dataset and you've got some reasonable way that makes sense for your use case to downsample, the Bokeh server can just do that downsampling and then show you the subset of data that's relevant, right? That's one way that you could use the Bokeh server to handle sort of large datasets. Another way is, I mentioned this tool Datashader, which is actually specifically designed for being able to render very efficacious visualizations, images of hundreds of millions or billions of points. It gives you very fine control over the way, basically, a more sophisticated version of alpha compositing happens, so you can actually try to emphasize some things you want to emphasize in a meaningful way. And so, you could use Datashader to datashade those hundred million points, and then that just produces an image and then you can send the image to Bokeh, and so that's a very fast operation. So Datashader's sort of a form of data compression in that sense, but it's still very interactive because you can use the events that Bokeh generates to get a new image generated if you resize a plot or if you, for instance, change the bounds of the plot. If you pan or zoom, you can get a new datashaded image based on those new dimensions, and in fact, HoloViews sort of does that all automatically for you. You can do it by hand with Bokeh and Datashader or, at a high level, HoloViews can sort of take care of that for you. That's another way in which you can sort of reduce the amount of data that you're going to send into the browser.
Coming up next year, I hope we can actually raise the ceiling of the number of points you can send. We're going to try to do some work, hopefully, to better improve the WebGL support in Bokeh, and maybe even just have Bokeh be based entirely on WebGL, in which case, I think we could. Right now, Bokeh, you could send a few hundred thousand points to it and it's typically okay. But I think we could raise that sort of ceiling a little bit higher once we are able to render completely on WebGL.
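
Both data-reduction strategies mentioned in this answer can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not how the Bokeh server or Datashader are implemented: `downsample_stride` is a hypothetical helper that keeps every k-th point, and `rasterize_counts` is a hypothetical, much simplified stand-in for the aggregation step, turning any number of points into a fixed-size grid of counts that could then be shown as an image.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def downsample_stride(points: Sequence[Point], max_points: int) -> List[Point]:
    """Keep every k-th point so that at most max_points remain."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    step = max(1, -(-len(points) // max_points))  # ceiling division
    return list(points[::step])


def rasterize_counts(points: Sequence[Point],
                     x_range: Tuple[float, float],
                     y_range: Tuple[float, float],
                     width: int, height: int) -> List[List[int]]:
    """Aggregate points into a height x width grid of per-cell counts."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the current view, so not drawn
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200_000)]

subset = downsample_stride(points, 1_000)
full_view = rasterize_counts(points, (0.0, 1.0), (0.0, 1.0), width=40, height=30)
# After a pan or zoom, the same data is re-aggregated for the new bounds.
zoomed = rasterize_counts(points, (0.25, 0.5), (0.25, 0.5), width=40, height=30)
print(len(subset), len(full_view), len(full_view[0]))
```

Whatever the size of the input, the browser only ever receives `width * height` numbers per view, which is why re-aggregating on each range change stays responsive.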

32:29 Michael Kennedy: Yeah, that would be pretty amazing. Is it using, like, canvas or something now?

32:33 Bryan Van de Ven: Yeah, exactly, so it uses the HTML canvas. There is currently some level of WebGL support, and the person who maintained that and originally wrote that has just, he's moved on to other things, and so that WebGL support is sort of, it needs a little work, a little love and care, and we'll probably just go ahead and try to do things from the beginning and refound that in a cleaner, better way going forward. But, yeah, there's some WebGL support now, but most of the rendering happens on HTML canvas.

32:56 Michael Kennedy: Okay, so let's talk a little bit about some of the internal implementations of this. When most people interact with Bokeh, they're probably interacting with some Python API, and as far as they're concerned, that's the end of it, right? Like I call these functions, the plot comes up, magic.

33:15 Bryan Van de Ven: Yeah, and actually the API is actually quite light, so by and large, we've turned the problem of creating an interactive data visualization web app into the Python problem of creating a bunch of objects and setting their properties, right? I mentioned this JSON representation. That JSON representation actually mirrors, on both sides, a set of objects, and those objects form a graph, so there's a set of objects like a plot which has a bunch of renderers. It has a couple of axes and some ranges and some data sources, maybe that's in a layout, but also, it has some buttons, so there's some objects we call models that represent all of those items. All of those can turn into JSON. And then on the JavaScript side, there's a one-to-one correspondence basically of objects that those get turned into, JavaScript objects. The role of the Bokeh server is just to keep those two sets of objects in sync bidirectionally, right? But in terms of what you use from Python, you create this plot, maybe use the figure function, which sort of puts a lot of these objects together for you in a convenient, meaningful way, and then you can change their properties. You can change the start and end of a range or you can add data to the data source or you can change various properties of a circle glyph 'cause you want to change how it appears, and so, all of this is just setting these properties on these Bokeh models, and that is the main thing that people do, I think. Apart from that, you might be writing callbacks. If you're using the Bokeh server, you can write callbacks in Python, so if a button gets clicked or a selection is made, you can run Python code. But you can also create JavaScript callbacks for the standalone case where I don't have a Bokeh server, but I still want something to happen when a button gets clicked or a selection is made. You can write a little snippet of JavaScript and that will do that amount of work.
And typically, those callbacks, the end result of that is, again, setting some properties on these objects. They might update the data source which causes the plot to update or they might change the range bounds which causes the plot to zoom out. That sort of thing. So there is definitely an API. There are functions. There are functions for embedding. There are functions for showing things in notebooks. There are functions for creating plots to start with. But most of the content of the Bokeh library is these objects we call models, and they all have these typed properties that you can set values for, and that's the main interaction mode.
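
The "objects with properties, mirrored as JSON" design can be illustrated with a toy model class. Everything here is hypothetical and heavily simplified: the `Model`, `Range`, and `Plot` classes, the JSON layout, and the `sync` listener are assumptions made for this sketch and do not reproduce Bokeh's actual classes or wire format. The point is only that setting a property is the one operation everything else hangs off.

```python
import json
from typing import Any, Callable, Dict, List


class Model:
    """Toy model: named properties, change listeners, JSON serialization."""

    def __init__(self, model_id: str, **props: Any) -> None:
        # Write through __dict__ so our own __setattr__ is not triggered.
        self.__dict__["id"] = model_id
        self.__dict__["_props"] = dict(props)
        self.__dict__["_listeners"] = []

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_props"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        props = self.__dict__["_props"]
        if name not in props:
            raise AttributeError(f"unknown property {name!r}")
        old = props[name]
        props[name] = value
        for listener in self.__dict__["_listeners"]:
            listener(self, name, old, value)

    def on_change(self, listener: Callable[["Model", str, Any, Any], None]) -> None:
        self._listeners.append(listener)

    def to_json(self) -> Dict[str, Any]:
        def encode(value: Any) -> Any:
            # References to other models become id references (a graph).
            return {"id": value.id} if isinstance(value, Model) else value

        attributes = {k: encode(v) for k, v in self._props.items()}
        return {"id": self.id, "type": type(self).__name__, "attributes": attributes}


class Range(Model):
    pass


class Plot(Model):
    pass


x_range = Range("r1", start=0.0, end=10.0)
plot = Plot("p1", title="demo", x_range=x_range)

outbox: List[str] = []  # stands in for messages sent to the other side


def sync(model: Model, name: str, old: Any, new: Any) -> None:
    outbox.append(json.dumps({"id": model.id, "attr": name, "new": new}))


x_range.on_change(sync)
x_range.end = 20.0  # what a "zoom out" button callback might do
print(json.dumps(plot.to_json()))
print(outbox)
```

A callback in this picture never draws anything itself. It changes a property, and the change event is what gets serialized and mirrored to the other side.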

35:10 Michael Kennedy: Yeah, so very declarative in that sense, right? You set the aspects or the features that you want, and it just figures out how to make that interactive.

35:18 Bryan Van de Ven: Yeah, exactly, yeah.

35:19 Michael Kennedy: Nice, so it sounds to me like listening to you talk, there's a lot going on with JavaScript here, even though the typical consumer user of it, the developer, doesn't have to care or work with it. What're you using there? What was the history? Was that always just straight JavaScript or what're you doing?

35:36 Bryan Van de Ven: Yeah, it's actually never been straight JavaScript, but you're right. The bulk of the work in Bokeh is actually in this library, BokehJS, which is a JavaScript library.

35:43 Michael Kennedy: How big is it?

35:43 Bryan Van de Ven: Minified, I think the main core library is about 600K. It's a pretty hefty library. Well, we're trying more...

35:49 Michael Kennedy: That's a pretty hefty library, yeah.

35:49 Bryan Van de Ven: It is. We're looking to make things as optimized as we can. We definitely could use help from more experienced JavaScript developers. When Bokeh started, it was started by, it was me and a few other folks who were working and none of us had, I think, a lot of front end experience. I didn't have any JavaScript experience when this project started, and so we actually chose CoffeeScript at the time. And so now, I think it was maybe a good choice for the time because it allowed us to iterate very quickly and sort of make mistakes more quickly, I guess.

36:13 Michael Kennedy: Mm-hm.

36:13 Bryan Van de Ven: You try out things. It's sort of Python-looking, as one of these transpiled languages that turns into JavaScript. But ultimately, once the project grew very large, it wasn't really suitable for that, and so we actually did a large effort. Most of that work, pretty much every little thing, was done by one of our core contributors, Mateusz, to port Bokeh to TypeScript, and that's been a huge win for the project. I mean just in doing the port to TypeScript, a lot of latent bugs and problems were uncovered. Certainly, since it's been done, I've been prevented by the TypeScript compiler from checking in things that would've been an error. So I'm a big fan of that and glad for that. There's certainly new contributors who find it a little bit more difficult or daunting sometimes to work with TypeScript, so there is a barrier to entry. It's a little bit high for Bokeh, and that's actually just in general been a problem for us, I think, to attract sort of contributors on that side 'cause Bokeh is targeted towards Python developers with the promise that they really don't have to worry about JavaScript if they don't want to. But all the work's actually in JavaScript, and so... We need JavaScript developers to come help make Bokeh better, for the most part, and so that's been a challenge for us a little bit.

37:13 Michael Kennedy: You have this bimodal distribution of skills and desires and stuff.

37:17 Bryan Van de Ven: Absolutely.

37:17 Michael Kennedy: I think the Python folks and the JavaScript folks and, yeah, it's...

37:21 Bryan Van de Ven: So we're trying to make BokehJS itself like a... There are people who use the Bokeh JavaScript library just by itself, as a JavaScript library. I would say that, from my perspective, quite a bit of work is needed to make that a serious sort of contender for something people want to use, but we definitely would like to get that done and we'd love to get help doing that. I think making BokehJS a sort of first class JavaScript library on its own would be very helpful for our project and certainly, it would be great to get a community around that as well, but that's a longer term goal.

37:47 Michael Kennedy: This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast, simple, and incredibly affordable? Well, look past that bookstore, and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe, so no matter where you are or where your users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server or just a file server, you'll get native SSDs on all the machines and a newly upgraded 200 gigabit network, 24/7 friendly support, even on holidays, and a seven-day money back guarantee. Need a little help with your infrastructure? They even offer professional services to help you with architecture, migrations, and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/linode. It's interesting that you found TypeScript to be a nice way of working and whatnot, and I find it to be pretty nice as well. Certainly, if I had to choose between CoffeeScript and TypeScript, I would definitely choose TypeScript. I think TypeScript is interesting in that it's a superset of JavaScript, so all your regular JavaScript just works, but you can typify it and make it have other features and capabilities that that language brings, and that's a pretty interesting way to approach that problem.

39:07 Bryan Van de Ven: Oh, definitely, yeah. And to be clear, I think CoffeeScript was the right choice in 2012. It's not at all the right choice for anything, I don't think, in 2019. I think at the time, Bokeh was one of the largest CoffeeScript libraries probably ever developed, which is interesting, sort of a bit of trivia, but, like I said, it let us move fast, especially not having a lot of experience in front end dev. But after time, we just needed something more, a little more serious, for lack of a better word.

39:31 Michael Kennedy: Yeah, sure. I'd just like to get your thoughts real quick. So TypeScript is all about sort of static typing and checking and whatnot of your code, and we kind of have that in Python a little bit now to the extent that people want to bring it in with mypy and type annotations, but it's not really the mainstay of the language of Python. What are your thoughts on working in these two languages kind of side by side on the same project?

39:56 Bryan Van de Ven: Well, so this is a really interesting question. There is actually a history of various projects that add what's called manifest typing to Python, and so that goes back to... There's definitely a project called Traits that Joseph Morrill created, that was sort of... You could add types to classes, and those would get checked at runtime, and you could also do things like reactive programming and event-based programming, based off changes to those values. Traits auto-created like Qt GUIs, I think, from classes as well, little panels. And there's another one called Param, and I think there's now one called Struct, but Bokeh also has its own properties, as I mentioned, these properties of models. Bokeh has its own properties system, which is rooted in a bunch of fun metaclass programming that lets you add these declarative types. So the actual models I mentioned for Bokeh objects typically have no code in them. They're just classes with these property definitions that say, oh, my plot width is an int or my source property is an instance of a column data source or the range has two floating point values, start and end. And so, we're able to provide runtime feedback. If people try to set the range start equal to some string value, we say, hey, that's not an appropriate value. It needs to be a number. And we also know what properties are on objects, so a feature people really compliment us about is, if people sort of fat-finger a property name, we'll actually give a suggestion and say, the nearest property names are named this.
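
A declarative, runtime-checked property system of the kind described here can be sketched briefly. Bryan says Bokeh's version is built on metaclass programming; the sketch below swaps in plain descriptors with `__set_name__` because that is shorter to show. The `Property` and `HasProps` classes, the example `Range1d` and `Plot` definitions, and the error messages are all assumptions for illustration, not Bokeh's own code. The name suggestion uses `difflib.get_close_matches` from the standard library.

```python
import difflib
from typing import Any, List


class Property:
    """Descriptor that type-checks assignments at runtime."""

    def __init__(self, *types: type, default: Any = None) -> None:
        self.types = types
        self.default = default

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    def __get__(self, obj: Any, owner: Any = None) -> Any:
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj: Any, value: Any) -> None:
        if not isinstance(value, self.types):
            expected = " or ".join(t.__name__ for t in self.types)
            raise ValueError(
                f"{self.name!r} expects {expected}, got {value!r}")
        obj.__dict__[self.name] = value


class HasProps:
    """Base class: only declared properties may be set."""

    @classmethod
    def property_names(cls) -> List[str]:
        return sorted(name for klass in cls.__mro__
                      for name, attr in vars(klass).items()
                      if isinstance(attr, Property))

    def __init__(self, **kwargs: Any) -> None:
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __setattr__(self, name: str, value: Any) -> None:
        names = self.property_names()
        if name not in names:
            close = difflib.get_close_matches(name, names)
            hint = f"; similar properties: {', '.join(close)}" if close else ""
            raise AttributeError(
                f"unexpected attribute {name!r} on {type(self).__name__}{hint}")
        super().__setattr__(name, value)


# Model classes are pure declarations, with no behaviour of their own.
class Range1d(HasProps):
    start = Property(int, float, default=0.0)
    end = Property(int, float, default=1.0)


class Plot(HasProps):
    plot_width = Property(int, default=600)
    plot_height = Property(int, default=600)


try:
    Range1d(start="zero")
except ValueError as error:
    print(error)

try:
    Plot().plot_widht = 400  # deliberate typo
except AttributeError as error:
    print(error)
```

Because the declarations carry the property names and types, the same information could also drive serialization or generated reference documentation, which is the reuse Bryan describes later in the conversation.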

41:09 Michael Kennedy: That's nice.

41:09 Bryan Van de Ven: And so we kind of have a type system. Yeah, it's really a nice feature. It's sort of one of the simple things you don't think about until you see it. But we've had a type system in Bokeh since the beginning, and so it's so interesting now that mypy is becoming more popular. We are interested in looking to use mypy, basically after Bokeh 2.0 comes out and we drop Python 2 support. We're interested in trying to get mypy wherever we can. I think it's a useful tool. I hadn't used it much until recently, but I have seen it used to good effect, and so, I'd like to try to improve that. I don't know how much we'll be able to use mypy to replace our existing sort of typed properties system 'cause that would be a huge endeavor 'cause our properties, they aren't just the type checking, they also plug into our documentation systems, so we can auto-generate our reference documentation.

41:49 Michael Kennedy: Wow.

41:49 Bryan Van de Ven: And of course, all the auto-synchronization is based on this too, right? A lot of the machinery for the automatic synchronization and serialization is based off these property definitions, and so...

41:57 Michael Kennedy: Right, like to notify that something has changed to people who are interested.

42:00 Bryan Van de Ven: Yeah, exactly.

42:02 Michael Kennedy: Things like that.

42:02 Bryan Van de Ven: Yeah. So replacing it with mypy is not something I'm sure we can do for the properties, but there's plenty of other places in the library where mypy would be a great benefit to help us tighten things up, and so we're looking at that after Bokeh 2.0.

42:13 Michael Kennedy: Okay, yeah, yeah, very cool. Maybe we could do a quick tour of some of the graphs or visualizations that you find kind of interesting and worth talking about. Over at demo.bokeh.org or just bokeh.org, you just click on the gallery and demos and stuff. There's a bunch of cool ones that have the source code. There's some interactive bits and so on. You want to tell us about some you think are worth checking out?

42:36 Bryan Van de Ven: Yeah, so for sure. So first off, if you go to demo.bokeh.org, these are all specifically Bokeh server applications, so these are all backed by a running Python process, and when you click a button or make a selection, that triggers real Python code. If you go to the gallery in the docs, most of those are standalone, and so, they aren't backed by a Bokeh server, just to get that distinction out of the way. But at demo.bokeh.org, there's a couple of interesting ones here. The first one in the upper left is this movie data explorer, and this is actually a fairly direct comparison, intentional on our part, to a tool called the Shiny Movie Explorer. So people asked, for a long time, where is Shiny for Python? Shiny is this tool for creating interactive data visualization applications from the R language. People ask, "Where is Shiny for Python?" So we're trying to answer that question, and I think Bokeh's a pretty good, decent answer to the question of where is Shiny for Python. So we made that as a pretty direct comparison. So that's one that's interesting. Right next to it, there's this selection histogram, which I think is pretty cool. It's got a couple of distributions of scatter points on a plot, and it shows the histograms on both axes. And if you make a selection across those points, a subset of those points, it then highlights and shows you the histogram of just the selected points, and then in the opposite direction, the histogram of the unselected points as sort of a shadow, faded-out version.
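
The computation behind a selection histogram like the one described is small enough to sketch. This is not the demo's source code; `histogram` and `split_histograms` are hypothetical helpers, and the selected indices are assumed to come from whatever the plotting layer reports when the user drags a selection. In a server-backed app, a Python callback would run something like this each time the selection changes.

```python
from typing import Iterable, List, Sequence, Tuple


def histogram(values: Iterable[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for value in values:
        if lo <= value <= hi:
            counts[min(int((value - lo) / width), bins - 1)] += 1
    return counts


def split_histograms(values: Sequence[float], selected_indices: Iterable[int],
                     lo: float, hi: float, bins: int) -> Tuple[List[int], List[int]]:
    """Return (selected, unselected) histograms sharing the same bins."""
    selected = set(selected_indices)
    inside = [v for i, v in enumerate(values) if i in selected]
    outside = [v for i, v in enumerate(values) if i not in selected]
    return histogram(inside, lo, hi, bins), histogram(outside, lo, hi, bins)


x_values = [0.1, 0.2, 0.6, 0.9]
# Indices 0 and 1 stand in for the points the user selected.
selected_hist, unselected_hist = split_histograms(x_values, [0, 1], 0.0, 1.0, bins=2)
print(selected_hist, unselected_hist)
```

Running the same split once for the x values and once for the y values gives the two pairs of marginal histograms drawn along the axes.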

43:42 Michael Kennedy: Wow, yeah, that one's really cool.

43:43 Bryan Van de Ven: That's a cute one. We've been working on that one for quite a while. It's gone through several iterations. That actually helped us uncover some problems with the Bokeh server early on. It was just sort of behaving in a weird way and stuttering, and we realized that things were sort of boomeranging, sort of making a boomerang effect, and so we had to sort that out, but that was a great example to help us figure out some of those problems. And we have a lot more things under a lot more rigorous tests now, so that's good, but, yeah, I like that example a lot. Another one we have is this reproduction of the Gapminder demo. Hans Rosling did this famous TED Talk where he showed all these data, and so we've reproduced that in Bokeh. We've also embedded the YouTube video. We want to be able to show using a template to embed Bokeh content in a template with other content. So this also has this YouTube video embedded.

44:23 Michael Kennedy: Yeah, the talk by Hans Rosling, you have the video there, it's really worth watching. That guy...

44:30 Bryan Van de Ven: Absolutely.

44:31 Michael Kennedy: Really makes statistics and just data relevant for humanity, in a great way.

44:37 Bryan Van de Ven: Absolutely, no, I recommend anyone to go watch the video, regardless of whether they look at the Bokeh part. It's a great video, and I think it's a really compelling one. Tells a great story, so I'd recommend anyone to go check that out.

44:48 Michael Kennedy: For sure.

44:49 Bryan Van de Ven: Let's see, lower left, there's actually a financial chart, so here, you can have time series from two sort of financial datasets, and you can do a cross correlation between them and you can see the Pandas statistical summary there and you can use the dropdown to choose different time series and then the table updates and the data plots update. So that's a nice one as well. And then on the bottom right, there's an interesting one that's got this 3D plot. And this is maybe confusing for some people. Bokeh itself is not a 3D plotting library. It has no inherent 3D capability built in. But Bokeh is very extensible. At some point, we realized that lots of users have use cases that are eminently reasonable and really cool that we're just not ever going to have the capability or resources to do in the library. I mean, you have to sort of limit the scope of the core library at some point. So we worked to make Bokeh extensible, and so you can create these custom extensions that behave just like built-in Bokeh models, and they plug in just like the plot object or a widget object into Bokeh content, and so this is an example of that. And so this is a custom extension that wraps a little 3D JS library, and you use the standard Bokeh data sources, and you update them, and then this little 3D plot updates because, basically, the custom extension just wires together the Bokeh data source with whatever this other library expects, and so, it's a really neat example of that. And there's other examples of extensions in the docs as well for different kinds of use cases if you have some really cool JavaScript widget you want to connect to Bokeh content. Actually, if you have a cool JavaScript widget that you want to connect to all these PyData tools, like you want to connect this JavaScript widget to Scikit-learn or to Dask or to Numba or Pandas, Bokeh is a great bridge for that.
You can just write a custom extension that wraps the JavaScript component and then it's automatically, a Bokeh server can connect it into all those tools.

46:25 Michael Kennedy: Yeah, and get all the change notification and interactivity and everything, yeah.

46:29 Bryan Van de Ven: Mm-hm.

46:29 Michael Kennedy: That's super cool. Okay, let's see. What else do I want to talk about? You all have Bokeh 2.0 on the roadmap. What's going on with that?

46:37 Bryan Van de Ven: Yeah, absolutely. I would say, we were targeting August, but I think maybe a little more realistic at this point is September. We're always a little optimistic in our estimates for schedule.

46:45 Michael Kennedy: Welcome to software development, right? That's how it goes. We're all like that...

46:49 Bryan Van de Ven: I don't even want to speculate about the first time we promised Bokeh 1.0 and stability. That was probably a couple years too early. But we're a little more on track for Bokeh 2.0. But the main thing about Bokeh 2.0 is just that we are dropping Python 2 support and also Python 3.4 support. So Python 3.5 will be the minimum version. As long as we are doing a major version bump, we're also going to take the time to clean up a few other minor things. So there's a few minor changes that are coming. Hopefully nothing that's too disruptive for anyone. We're going to be sure to outline and document all those in the Migration Guide. But that's the main thing, it's the Python 2 support, and it gives us a chance to do some things like move to native coroutines. So we use Tornado as the base for the Bokeh server, but if we move to Python 3.5 as a base, we can use native async and await coroutines everywhere. Still with Tornado, but it helps us clean up the code a lot, and just in general, it'll help us clean up the codebase and make it a lot more maintainable, to shrink it. It's always good to delete and shrink, for sure.
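
For readers who haven't seen the difference: Tornado's older style writes a coroutine as a generator function decorated with `tornado.gen.coroutine` and uses `yield` at each wait point, while from Python 3.5 on the same logic can be written with `async def` and `await`. The sketch below shows only the native style and uses only `asyncio` from the standard library, not Tornado or the Bokeh server; `fetch_reading` and `collect` are hypothetical names, and the sleep is an assumption standing in for real network I/O.

```python
import asyncio
from typing import List


async def fetch_reading(sensor_id: int) -> float:
    """Pretend to fetch one value over the network."""
    await asyncio.sleep(0.01)  # stands in for real I/O
    return sensor_id * 1.5


async def collect(sensor_ids: List[int]) -> List[float]:
    """Run all fetches concurrently; results keep the input order."""
    results = await asyncio.gather(*(fetch_reading(s) for s in sensor_ids))
    return list(results)


readings = asyncio.run(collect([1, 2, 3]))
print(readings)
```

The control flow reads like ordinary sequential code, with `await` marking each point where the event loop is free to do other work. That readability is the kind of cleanup being described.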

47:39 Michael Kennedy: Yeah, if you maintain it by deleting it, you're good. That's a good way to do it. Do you think that'll help attract more maintainers? You could say, like, hey, you could work on this cool asyncio, async and await library, rather than, you know, this thing called Tornado, and then these coders...

47:54 Bryan Van de Ven: Well, they use Tornado. Tornado is a really great tool, but I think it may. It broadens the thing a little bit to hopefully some more developers, and there are. I stress that there's a lot of work in BokehJS, but there's plenty of work on the Python side to do as well. We'd love to have contributors in. And honestly, there's actually a lot of work that's not coding. I'd love to get other contributors involved in all kinds of ways. If I can speak a minute about that, I mean.

48:14 Michael Kennedy: Yeah, go for it.

48:15 Bryan Van de Ven: Obviously, people talk about, hey, we need testing help and docs and design help or docs help, and that's certainly true for us as well. But other, maybe, ways we don't think about: we'd love to get designers, front end designers, to come help make our assets better, to come help us improve the visual appearance of Bokeh 'cause we've done okay, but we're not designers, and so it'd be great to get that kind of help. We actually have a lot of infrastructure now on places like DigitalOcean and AWS, and it would be great to get experienced people who know those systems and DevOps on those systems to come help us optimize them for cost, optimize them for usage, whatever.

48:48 Michael Kennedy: Yeah, you guys are doing cool stuff with Docker, right?

48:50 Bryan Van de Ven: Yeah, we do a couple things with Docker. So we run the demo site. It's actually a Docker image that's run on Elastic Beanstalk and I actually just recently changed some of the instances that it was running on to hopefully make them a little bit more cost-effective for us. But we also just recently had a spike in S3 usage on one of our buckets that I couldn't really explain just yet, and so I'd love to get experienced people that can help with those sorts of things. Outreach is another area. We're really trying to ramp up our outreach posts to the community in terms of fundraising, but also talking to companies, and we've had a couple people help with that, and actually, just offering support. We just moved our mailing list to a Discourse instance, discourse.bokeh.org, which is infinitely better. I mean the Discourse is great for users because there's a lot of features for code highlighting, for math text, just all kinds of things we could imagine. Maybe putting in an extension to put actual Bokeh content in a Discourse post. But it's also great for us as maintainers because Discourse has a lot of information about what are people searching for, what topics are popular, that sort of thing, and that helps us know maybe where attention needs to go, but just answering questions there. People want to go offer support and help other people use Bokeh. That is also a huge deal. I think Bokeh's been successful 'cause we've had a few people be able to put a lot of time into helping the community. But as the community grows, that's got to scale. It needs to have more and more people helping each other, and so that kind of thing would also be a great way to contribute to the project. And so, there's all kinds of ways people can plug in, and we'd love to engage with anyone, really about any of those tasks.

50:08 Michael Kennedy: Right, right. If you're a designer and you want to make the website look shiny, that'd be great. If you're, want to make the graph look better or maybe you're a visualization expert and you've got a different kind of graph you want to bring, whatever, right?

50:21 Bryan Van de Ven: Yeah, absolutely, or just even making new examples for the docs, making really cool uses of Bokeh to show off, to tweet about, to put on our docs in our gallery. I mean there's all kinds of ways to make very valuable contributions to the project just because there's a lot of things to do and presently, not enough people to do them, probably never enough people to do them, right? But obviously, the more help we can get, the better.

50:39 Michael Kennedy: Sure, so if I could summarize. You're willing to accept contributors to the project.

50:43 Bryan Van de Ven: Yeah, absolutely. If I hadn't made that clear, yes, so I would love to bring.

50:47 Michael Kennedy: That's awesome, yeah. It's a cool project and will be fun to work on. As part of this Bokeh 2.0 thing and the dropping of Python 2, which I like to refer to as legacy Python, and Python 3, just as straight Python, but as part of dropping legacy Python, one of the things you did, this is kind of a trend in the data science space, I haven't seen it as broadly adopted and I'm not really sure why, but you signed the Python 3 Statement. You want to tell folks about that?

51:12 Bryan Van de Ven: Yeah, so the Python 3 Statement is just, it's a GitHub repository where projects can go and sort of make a PR to list themselves on this website, and it says, we pledge to drop Python 2 support by this date or this timeframe, and support Python 3 going forward. And so, there are a lot of projects that have signed that, and it's interesting. I thought going in that Bokeh was going to be maybe kind of a leader and I wanted to be fairly aggressive, but all of a sudden this year a ton of projects have started releasing new releases that are cut off from that. Things like, I think, Matplotlib and I think Dask maybe and, I forget what else. But there's all these big projects, and suddenly we're now behind the curve, right? They've already dropped Python 2 support, and we're sort of lagging behind. But I think it's time. Bokeh is definitely used a lot in the analytics space, and I think things do move a little bit faster there. Part of that is because of Conda and Anaconda and conda-forge. They sort of pushed things forward. I think also data scientists do a lot of exploratory work, and they're willing to move a little bit. In that exploratory work, they're willing to move and put up with a little bit more change to get new features and to get the level of performance better. Once things get deployed, that's when things get a bit more sticky and that's where you see a lot of people still using Python 2 in finance and other venues like that.

52:21 Michael Kennedy: Yeah, absolutely. I feel like these, the data science explorations, the models, the underlying technology's changing so quick there, like TensorFlow has come out and Pandas and all these things are just changing so quickly that if you're going to come back to it, you may just want to move to something newer, shinier, better, anyway, and you just. It's much easier to stay on Python, have the later version of Python, whereas like, that website that that guy that used to work here ran that now we just have to keep running, like nobody wants to touch that, right?

52:51 Bryan Van de Ven: Right.

52:51 Michael Kennedy: 'Cause as soon as you touch it, it's your problem to fix it if it ever has a problem, and nobody wants that puppy, right? Yeah, so some of the companies or projects that signed the Python 3 Statement, python3statement.org. TensorFlow, Requests, XGBoost, NumPy, IPython, like that kind of stuff, right? Cython and Spyder, there's actually a ton of projects here. That's great.

53:13 Bryan Van de Ven: A lot of those projects already have. I thought we'd be leading the pack, but we're actually right behind the curve, and that's what made it very easy for us to say, okay, Q4, it's going to be really easy for us to drop Python 2 because all these other projects will have already dropped Python 2.

53:26 Michael Kennedy: Right, right. For example, Tornado's in there and you guys have built on that so, in a sense, they're kind of calling your, not calling your bluff but making sure that you're going to have to follow along anyway if you want to stay on the ways to that, right?

53:35 Bryan Van de Ven: Yeah, and even NumPy.

53:36 Michael Kennedy: Yeah, yeah, yeah.

53:36 Bryan Van de Ven: I mean, obviously, we could pin to a lower version of NumPy but we don't want to do that so.

53:40 Michael Kennedy: Yeah, of course, you wouldn't want to do that. Interesting, so we're just about out of time, but you want to talk about Portland real quick?

53:45 Bryan Van de Ven: Sure, yeah. Absolutely.

53:46 Michael Kennedy: We're both from Portland, right? You recently, somewhat recently, not super recently, but you're somewhat new here, and you're trying to get some stuff going in the data science space in Portland as well, right?

53:58 Bryan Van de Ven: Yeah, absolutely, yeah. So I've been here about a year and a half, and it's been a really great experience being here in Portland, really love it. But I am trying to get PyData meetups here started. In fact, we have a first meetup scheduled for, I think August 14th, and we're going to alternate sort of between east side and a west side Downtown location, hopefully, every other month. But me and a colleague of mine are getting that off the ground, and I'm really excited about it. So PyData is this series of meetups/conferences, if the meetups get big enough that it's sort of sponsored by NumFOCUS, and I've been involved with NumFOCUS since the beginning. I think it's a terrific, amazing organization. The people that are there are really great. I think the PyData meetups, in particular, have been really, really great. Both meetups and also the PyData conferences are also really good as well, so really excited to get that started in Portland, and I was almost kind of surprised that it wasn't here already. There's a PyData Seattle meetup and there's PyData meetup. There's like 105 PyData meetups, I think, around the world, so it was, by far, time that Portland gets one, so I'm really excited to be helping get that off the ground.

54:52 Michael Kennedy: Yeah, that's awesome. I mean, that one really is interesting to like 5% of listeners maybe, but it's still really cool that you're doing that here in Portland. Other folks, they can create a PyData in their city's airport acronym, if they want, right?

55:05 Bryan Van de Ven: Absolutely. I think most places don't use the airport acronym, but I'm really fond of PDX, so PyData PDX sounded good.

55:11 Michael Kennedy: Yeah, it's definitely a good one. All right, well, Bokeh's a really cool project, and I'm glad you all have been working on it, and it's great to see all this progress and excitement around it. It's great. Great, and so people should definitely check it out.

55:24 Bryan Van de Ven: Yeah, well, thank you very much for having me. I love to have the opportunity to spread the word and talk about Bokeh, and it's been really great.

55:29 Michael Kennedy: Yeah, absolutely. Let me ask you the final two questions before you get out of here, though. If you're going to write some code, probably Python code, but maybe JavaScript as well I guess, what editor do you use?

55:39 Bryan Van de Ven: I have been won over by VS Code. Yeah, I still use vi bindings. I grew up using vi and that's still in my fingers, and so I love the vi bindings, but I used to use Sublime Text, but I moved to VS Code and haven't looked back.

55:49 Michael Kennedy: Yeah, that seems a pretty, like a pretty straightforward choice to go from Sublime to VS Code, those are... One has so much more energy and they're super similar in their workflow. And then notable PyPI package, I'll go ahead and throw it out there, Bokeh for ya, and what else? If there's something that you're like, hey, I ran across this and people might not know about it, but it's really amazing, what would you say?

56:09 Bryan Van de Ven: Yeah, let's see. Well, I'll say PyPI or Conda, right? So don't forget Conda, very important to remember. But, it's hard to say. I'm so focused on using and working on Bokeh, that's like my day-to-day. Honestly, I have a little bit of tunnel vision maybe, to put it one way. But I think that a lot of the tools that are built on top of Bokeh are really interesting to me, and so I like looking at what is happening with them and seeing what developments are going. Obviously, I think all the tools in the PyData ecosystem are amazing. I think Numba in particular is really interesting. So Numba is a compiler for Python, lets you really accelerate certain kinds of code, and it was originally created again by Travis Oliphant, but it's moved on since then. It's actually grown really successful in certain kinds of venues, so I think Numba's a pretty interesting use case, and I certainly, of course, think Dask is really fantastic as well.

56:54 Michael Kennedy: Yeah. Those are, I think, good ones. All right, Bryan, final call to action, people want to get started with Bokeh, what do they do?

57:01 Bryan Van de Ven: Yeah, absolutely. Love to get people involved. So if you want to talk about development or have questions about support, we have this Discourse, discourse.bokeh.org. If you want to just get started from a very high level, visit bokeh.org. It's a great one stop to get to a lot of other resources like documentation, like the gallery, like the GitHub page, and just to see what's going on. But in terms of talking to us, the Discourse is a great spot, to make a post or a topic there, but of course, GitHub is a great place if you have ideas or suggestions or want to report problems, of course. GitHub is a great place to contact us.

57:31 Michael Kennedy: Yeah, for sure, and PRs are accepted.

57:33 Bryan Van de Ven: PRs are always accepted, yes.

57:35 Michael Kennedy: Yeah, very cool. At least consider it, right? It's considered.

57:37 Bryan Van de Ven: Consider, yeah, at least consider it, for sure.

57:39 Michael Kennedy: Cool, all right, well, thanks so much for being on the show. It's good to talk with you.

57:41 Bryan Van de Ven: Absolutely, thank you so much, Michael.

57:43 Michael Kennedy: Yeah, bye.

57:43 Bryan Van de Ven: Bye.

57:44 Michael Kennedy: This has been another episode of Talk Python To Me. Our guest on this episode was Bryan Van de Ven, and it's been brought to you by Ting and Linode. Ting is the fast mobile network, custom-built for technical folks. Use their savings calculator to see exactly what you'd pay. Visit python.ting.com to get a $25 credit and get started without a contract. Linode is your go-to hosting for whenever you're building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course, or if you're looking for something more advanced, check out our new Async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to this show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now, get out there and write some Python code.
