#34: Continuum: Scientific Python and The Business of Open Source Transcript
00:00 What if you built a product that dramatically improved how hundreds of free, open source Python libraries worked together, gave it to the world for free, and then built a thriving business on it? It's the open-source dream really, isn't it? This week we talk with Travis Oliphant from Continuum who did exactly that!
00:00 This is Talk Python To Me, show number 34, recorded October 26th 2015.
00:00 [music intro]
00:00 Welcome to Talk Python to Me. A weekly podcast on Python- the language, the libraries, the ecosystem, and the personalities.
00:00 This is your host, Michael Kennedy. Follow me on twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on twitter via @talkpython.
00:00 This episode is brought to you by Hired and Digital Ocean. Thank them for supporting the show on twitter via @hired_hq and @digitalocean. That's right! Digital Ocean has joined Talk Python To Me as a sponsor. Thank you Digital Ocean, you rock!
01:26 Hi everyone. Thanks for listening today. Let me introduce Travis. Travis Oliphant has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package, and as a founding contributor of the SciPy package. He is also the author of the Definitive Guide to NumPy.
01:57 As CEO of Continuum Analytics, Travis engages customers in all industries, develops business strategy, and helps guide technical direction of the company. He actively contributes to software development and engages with the wider open source community in the Python ecosystem. He has served as a director of the Python Software Foundation and as a director of NumFOCUS.
02:17 Travis, welcome to the show.
02:20 Hey thanks. I appreciate it Michael.
02:21 Yeah, I'm really glad to have you here on the show, I know you guys are doing some amazing stuff with Python, and really leading the way with the whole scientific computing angle of it. So I am really excited to talk to you about that today. But, before we get into the details though, what's your story, how did you get started in programming?
02:39 As a young child, I was in fourth grade I believe, I remember my first BASIC class, it was an after school program. I just got pretty excited about that. My dad was a programmer as well, and so I kind of liked to do what he was doing, but the computers back then were very accessible to a hobbyist mind and the expectations were low too, we didn't have interactive video games to try to match, an interactive text game was actually interesting. And so I just got started there, and hobbyist computing was kind of what I was doing at the very beginning. I enjoyed that, and I also enjoyed a lot of math. I wrote Pascal in high school as part of an AP computer science class, and then I went to college where I was studying engineering and physics and I learned C, and I started using Matlab and a bunch of other stuff out there.
03:36 Ok, interesting. So, you kind of built up into the more serious languages like C and so on, and then, I am guessing you are doing Matlab because you are trying to do some kind of scientific visualization stuff, right?
03:49 Yeah. It's high level array programming for engineering, controls, image processing; it's much easier to think at a high level than to have to worry about the details of programming. I could program in C at that point but it was much nicer to think at the high level and not to worry about those details. I was thinking about the math problem I was solving.
04:13 Yeah, that's for sure, no matter how good you are at C, you are not that efficient in it, it's just really way down there, right. And so, Matlab is really great for building these simulations and answering these questions. But one of the areas where those types of systems fall down is building actual applications that you can run in production, and so is that kind of what led you towards the Python story?
04:38 Yeah, a little before that, because I was still a grad student when I found Python. It was two things. One is, in the back of my mind I was bothered that I was writing in Matlab and anybody who wanted to use my code also had to buy this expensive package before they could benefit from what I had done. That always sort of bothered me. And then the proximal reason I had to switch to Python was that I ran out of memory. I was doing very large 3 dimensional simulations, actually 5 dimensional data: 3 dimensions are space, then time, and the last axis is polarization. And I needed the memory, so I wanted to switch to float32 for the kinds of problems that I was solving, and I didn't have those data types in Matlab, so I was looking around for another way to do this, and that is when I found Python.
05:32 And the numeric package and I kind of got hooked. I had done some Perl scripting in the past because like Fortran libraries and I understood that use case but also having a numeric package available let me do high level simulations quickly and then just this 5:50 of other things got me kind of started. And I just loved the language and the syntax and it was close enough, I didn't have to think about computer science, I could think about my problem, at high level. So that is what hooked me. And it was readable later, that's what I really appreciated about it. So that was in grad school, 15, 16, 17 years ago now, that I got started with Python.
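[Editor's sketch] The memory argument for switching to float32 can be made concrete with a little arithmetic. The grid shape below is purely hypothetical (the interview gives no sizes); the only facts relied on are that a double precision float takes 8 bytes and a single precision float takes 4. This is a rough illustration, not a reconstruction of the actual simulation.

```python
from math import prod

# Hypothetical 5-D grid: three spatial axes, time, and polarization.
# These sizes are illustrative only; they are not from the interview.
shape = (128, 128, 128, 64, 2)

BYTES_FLOAT64 = 8  # double precision, the usual default
BYTES_FLOAT32 = 4  # single precision


def array_mib(shape, bytes_per_element):
    """Memory needed for one dense array of the given shape, in MiB."""
    return prod(shape) * bytes_per_element / 2**20


mib64 = array_mib(shape, BYTES_FLOAT64)
mib32 = array_mib(shape, BYTES_FLOAT32)
print(f"float64: {mib64:.0f} MiB, float32: {mib32:.0f} MiB")
```

Halving the element size halves the footprint of every array in the simulation, which is often the difference between fitting in RAM and not.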
06:14 Ok, that's cool. Yeah you and I were in grad school not too far apart actually in time, maybe in distance. So scientific Python looked really different 15 years ago than it does now. I would say scientific Python was really different 2 years ago than it is today, but you know, that far back is really interesting. So maybe you could tell us what it was like then and how you've seen it evolve over time?
06:39 Yeah, so back then a lot of scientific Python was steering other code. It was a lot about writing a script that would call other codes written in C or Fortran, and it does the steering, and you do a lot of data munging, maybe file reading, reading input data and writing it out, kind of using the string processing facilities. And then because Python could be extended it was not too difficult; you know, the SWIG package was available at the time, it was really instrumental early on, and kind of made it easy to wrap large code bases and make them accessible to Python.
07:14 And so people were able to then kind of have objects in Python that mirrored their low level objects, and so they could start steering their program from the high level. And then as Numeric, the array processing library, emerged, and then NumPy came a little later after Numarray, that whole process led to more people actually working at the high level, just building array-based constructs in Python itself from the very beginning instead of having an object layer on top of low level machine code.
07:47 Ok, so in the beginning, it was just about sort of an easier API to what people are already doing, in other places, but then-
07:56 And it helped you test those too, it kind of became really easy to test your C libraries and your Fortran libraries.
08:03 Right, absolutely. But then the more it caught on, it was more like, well, we'll just stay here, right.
08:09 Yes, right. I could just use this, the spelling here is nicer and easier and I will never have to drop to the C level. And there were some really important things that happened that conveyed that, some processes are still happening, I mean there are still people that write Fortran, still people that write C, and the fact that Python is this great glue continues to make it a strong use case.
08:30 You were one of the people that was involved with NumPy and SciPy and all that actually getting it off the ground, right?
08:37 Yeah, I was the grad student, it was really- I can say I fell in love with Python as a grad student, and it was a weird reaction to have for a language, but what I loved is just the ability to do things quickly. You know, I could sort of iterate quickly, I could think at a high level, the concept that Python fits your brain and gets out of your way is a common meme, and I just totally felt that, I could think about my problem. And then I got excited about it, and I looked around and said, well, I am missing some libraries here, I mean integration, optimization, I need statistics, I need these linear algebra libraries, and a few of them were in Numeric, and there were a few scattered around, but around 1998 and 1999 I learned how to write an extension module, you know, Guido and something called TableO from Mike Miller showed me how to write an extension module and I just got hooked and started writing a bunch of them.
09:32 That's great.
09:33 So, I wrapped a whole bunch of old Fortran code to make it more accessible, and that's kind of the beginning of the SciPy project.
09:41 Yeah, very cool, take that C knowledge you had and put it to use.
09:45 That's right. And this is back when you had to hand wrap and track reference counting. The fact that I knew C was really critical to making that happen. It's certainly a lot easier now, you can use Cython, it's really easy now actually. Incomparably.
10:00 Yeah. These kids they don't know how good they have it.
10:05 Right.
10:07 So, somewhere along the way, you started a company called Continuum, yeah?
10:12 Yeah, that was pretty recent actually in the whole history of things, since I've been doing this for 16 years. You know, I was a professor for a while and that is when I wrote NumPy, kind of growing from the work of the Space Telescope Science Institute, the 10:27 work and then the previous work of the Numeric community. NumPy was brought together in 2006 and 2007, and that's been very successful, it's really thrilling to see how many projects have been built around that and then also it's been thrilling to see the community growing around it and continuing to support NumPy.
10:42 So a lot of great people contribute to that project today. And the same is true with SciPy, but I left my academic career behind around 2007. I have 6 kids, I need to support them, so I needed to figure out a way to make my way in the world, and I started consulting around Python, for Enthought in 2007, and then in 2011 Peter and I were kind of talking about some gaps and realizing that Python needed a bigger story in the big data world, and a stronger play in web visualization. And so we started the company in 2012 to really make that happen.
11:23 Yeah, and your company is called Continuum, and you guys have a really ambitious goal, I love reading your website because it's kind of like- there are all these great scientific projects and thinkers and stuff out there but when they go to solve their problem they end up stuck at the command line saying cannot compile some random C thing because they are trying to install a Python module that's got C extensions and they are just, you know, stuck, right, and so if you could build an environment where everything was sort of import antigravity, right, it's all super easy even for the science stuff that's hard, right, that's kind of your goal?
12:07 Right, that's definitely a part of our goal, I would say that at Continuum we've had no shortage of lofty goals. Peter and I are both ambitious thinkers, and are kind of full of ideas and thoughts, and I've had things we were building up over years. So when we started Continuum, we had really ambitious goals, not only around package management, really that kind of came as a 12:32 of our other goals. Our goals are to make it really easy for data analysts, for scientists, for the people who change the world, to do it easily.
12:43 And not be loaded down by DevOps issues. They need to be able to create visualizations that show up in the web and they need to be able to take advantage of modern hardware easily. So they need to run in parallel, they need to be able to take advantage of GPUs, and they need to be able to translate their great ideas into code that does that with a nice web front end. And then, before we had really pulled that off, the first problem was how do we ship this great stack of tools to people. And so Anaconda kind of came as this really immediate 13:17 from our overall goals, and we really worked on the packaging problem.
13:22 Right, so you are thinking data science, visualization, scientific computing, ok, Python now.
13:28 Right, data science at scale, visualization in a web browser, how do I get that installed for everybody? So we've got to work on the distribution problem.
13:38 Right. So vanilla Python is a fine base for that, but if you've tried to install Matplotlib on Windows, it is no treat. Like what is that...
14:00 What's cool about Python is how easy it is to integrate with so much other stuff. It's a glue that really- it's why it was adopted in the scientific domains and why it continues to be powerful, but the other end of the stick of that glue is you've got a lot of other systems you are interacting with. You know, installing something, SciPy for example, is notoriously difficult because it has Fortran code. So you are compiling Fortran code and then you are integrating it. And you know, I was 14:30 any decent science use Fortran, well today people are like, I don't want to have a Fortran compiler on my system, so...
14:38 Yeah, absolutely.
14:39 It becomes a challenge. You know, the visualization compounds that with the additional C libraries and additional configuration issues that are slightly different on different platforms, and it's how you bring all that together. It's a hard problem and so we've taken it in strides, and part of those strides is creating the Conda package manager, you know, really having a cross platform package management solution that goes beyond just Python, lets you install Node and R and anything else, Java, Ruby, Scala, you can install all of it with Conda. And then on top of that, being the definitive source for freely available binaries on multiple platforms for these primary fundamental packages that make up the PyData stack.
15:26 Ok, so that's interesting. Both of them. But, the fact that you say, all right, well, you know, how different is everybody's Mac? We could probably compile that once and then nobody has a problem of getting this obscure compiler to work on their environment, right?
15:41 Right, that's an important question actually, because what is the platform, we have to look at this question really hard. I mean a platform fundamentally is actually the tree below you, it's all the software dependencies that you are not caring about, that you depend on. That's actually a platform. Right?
16:00 That's right.
16:00 Now, when we talk about an OS platform, you can kind of cut that tree at a particular plane and say, well, we use the OS for libc and this 16:07 libraries, and everything else above that is managed by the package manager. And the way we manage that: on Windows, they do a lot of work to manage that, and basically if you compile for a particular Windows box it is going to work pretty well on every release of the software above that. So usually it is about picking the baseline. On Windows we used Windows XP, I think we just moved to Windows 7, as kind of our baseline, and that is where we compile all the software so that it works across Windows. Then it is the compiler that you pick: Python 3.5 just moved to Visual Studio 2015, and for Python 2.7 it was Visual Studio 2008, and so we manage that. And then on Mac OS X, it's version 10.7 of the OS and this version of Xcode, that's our base. And that actually becomes a little bit harder, but there it's just the compiler and the libc, and the number of distributions it works on depends on your libc.
17:18 Ok, that's-
17:20 That's how we manage this, but it is a lot of questions.
17:23 Yeah, and it's hard to generalize that, but I think that's a really hugely valuable service because just, I think it was this morning, I was trying to install uWSGI on Ubuntu 15.10 and for some reason it would not compile correctly; it would not install in the main environment but it would in a virtual environment, right. And just these little headaches you keep bumping into, and when you are experienced you are like, "ok fine, we'll just do it this way or that way," but when you are new or programming is not your full time thing, you are a scientist and you just- it's a show stopper, right?
17:59 Exactly, right. That's actually something, you know, a lot of people have. Conda has emerged at the same time that Python packaging has been improving; pip and virtualenv have gotten a lot better over the past several years than they were when we started, and our story in Conda has some overlap with that, but it really has a different focus. If you use pip and virtualenv you say, "I'm an integrator, I'm going to make this all work for my system" and that's great, that's what some people want to do, but not everybody wants to do that, or really should be trying to do that.
18:30 Conda, and Anaconda, is about: here is an easy to use platform for Open Source analytics that you can just get started with, you don't have to be worrying about configurations before you get started. Just get started. Right? And then you can decide later how you want to potentially build your perfect system and whether you continue to use Anaconda binaries, or you decide you are going to use your own binaries, that's a decision you can make down the road, or maybe get the stack you want and recompile everything the way you want it. All that is still available to you. We are just trying to get you started quickly and get you using this great stack of tools.
19:12 Yeah, that makes a lot of sense. Because once you've got a working system, then you are willing to put in that 4 hours to get that thing set up; but if you are trying to decide if it is even suitable for experimenting with, you don't want to spend that 4 hours, right?
19:25 Right, exactly. And that was the role from the very beginning- I personally think we've helped with the recent explosion of Python in the PyData world, just how many more people- I mean Python is the fastest growing data analytics language. Its rate of adoption has overtaken R. For a long time Python was overshadowed by R, like nobody really talked about Python as an Open Source language, it was always R this, R that, and R is great, it's got some great things about it, but Python is used in a lot of the same contexts and it's only been fairly recently that people have looked around on the commercial side, you know companies have said, "wait a minute", database vendors for example, and said, "we've got to have Python support too, not just R support." That's been really great to see.
20:12 Yeah, that is really great to see, I think you are right about you guys helping with the explosion there, I mean, to me, it's NumPy SciPy and then it's IPython, and maybe a few other things.
20:26 This has made a really big difference over the past several years, a lot of people are using pandas now-
20:30 Yeah, that is right, yeah.
20:32 For data frame kind of operations.
20:34 Yeah and a lot of those are not easy to set up and install and get going with. And so having this distribution is really great. So maybe we could- you talked about a couple of things, you talked about Conda, and you've talked about Anaconda.
20:34 [music]
20:34 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
20:34 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company. Typically, candidates receive 5 or more offers in just the first week and there are no obligations, ever.
20:34 Sounds pretty awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $4,000!
20:34 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.
20:34 [music]
21:55 Let's maybe focus on Anaconda for a minute like, so you've got sort of 4 interesting I don't know what do you call them maybe pillars or concepts behind this distribution, and I thought they were pretty cool, so maybe we could talk about them a bit, so one is you are committed to Open Source now and forever, what's the story there?
22:18 Yeah, so one of the things we want to make sure people realize is our roots are in Open Source. I spent years building SciPy and years building NumPy, and when we started the company we worked on Bokeh and Conda, and they are all Open Source. I believe infrastructure should be Open Source, and one of the goals I had when I started Continuum was to create an organization that would allow a sustainable production of Open Source by also selling things to enterprises; revenue produced by selling solutions to enterprises can actually support growing more Open Source software. I really believe it helps everybody, and helps in places you are not even able to target. So we are real believers in Open Source, we believe that is important, we also believe in building on top of it and having things we sell on top of it, but we want to give back to the community at every opportunity we have. So, Anaconda as a distribution of the Open Source ecosystem is Open Source and will stay Open Source.
23:23 It's really interesting to me how successful people have been with Open Source on the large scale and on the small scale lately, and I think this is a new trend, I mean you probably know better than I do, but I'm thinking of companies like MongoDB that are over a billion dollars in valuation and you can go to GitHub and get the thing they are selling for free. It's so paradoxical, but it's really interesting.
23:54 Well some of that still has to be played out I think in the market place, some of these large scale valuations are a bit mark-to-vision, mark-to-dream...
24:01 Yeah yeah, we'll see.
24:04 They are not quite proven out, whether or not there is going to be a sustainable, repeatable sale that is going to drive that. But I think the reality is there are known sales conversations that can drive Open Source, and those are identified and I think there are more to come, and people are recognizing that in order to play in the ecosystem of tomorrow you are going to have to be Open Source because the developers demand it. You have huge advantages in terms of what people know already, so if you don't, you are going to have to train people a lot more than you do if you kind of go with what Open Source is driving. There are a lot of factors that drive it, but we are basically a group of folks that drank the Kool-Aid long ago, so we are just continuing that, it's sort of part of our core, it's not something where we just said, "hey, let's try this, jump on this bandwagon," it's who we are.
25:02 Yeah, you guys have been there from the beginning, that's awesome. But yeah, I definitely think that not only is it becoming- there is a lot of examples of businesses doing it successfully but like you said, it's been accepted by purchasers, people who are going to say, "I'm going to buy a service, of course I'm going to go get this Open Source project and I will get the commercial support or whatever" whereas it used to be kind of this edge thing, "well, maybe I can save some money," but you might pay for it later when it doesn't work quite well- that discussion has sort of sailed, right?
25:35 Yeah, I think there are still some questions around. I personally believe there are a lot of people benefiting greatly from Python that need to figure out how they are going to help support and sustain it. Either, you know, you buy a commercial distribution, or support folks; there has got to be some dialogue around how we are going to make this work long term. So I would definitely love to encourage people to take a hard look at that, and if they are relying on this day to day, think about how this is going to be maintained.
26:07 Right, don't shoot yourself in the foot, right?
26:10 Right. Don't let today's price be the enemy of tomorrow's success.
26:15 Yeah, for sure. So that was sort of pillar 1. Number 2 was tested and certified packages to cover your back.
26:22 Yeah, that's right. So that kind of speaks to the fact that we do a lot of work to make sure the Python packages work together, and so we have a process where we have a download and we have a repository. When we make a release of Anaconda, we basically are saying this group of software works together and we've done some integration testing. When we make a release of a package in the Open Source repository, basically we have done the testing on this package that it works, and a little bit of integration testing, but we haven't done full integration testing.
26:51 So it kind of depends on where you want to sit: do you want to sit and take only Anaconda released versions, because you want to make sure it all works together, or maybe you are willing to deal with the occasional, you know, "hey this new release of software X doesn't quite work with this other piece." It's easy to fix, but you might have to get the configuration just right. But either way, we've done testing at various levels and we basically promise that it will work. As a commercial thing, we do offer indemnification as well, so that is one of the features we provide to people who buy a subscription, and we actually indemnify against the concerns of using and deploying Open Source in your organization. We'll make sure that you are protected from copyright infringement and patent infringement concerns that the company may have.
27:40 That's a big deal. That's really cool.
27:42 Yeah.
27:42 So with the sort of consistent packages, how many different packages are in the Anaconda distribution?
27:49 So the Anaconda download, we've actually been paring that down trying to make the download manageable, you know manageable to the 100K or 700MB, because it could grow to be like 3 or 4 GB but we've been paring it down. I think it has about 70-75 packages that are downloadable, and then in the repository that is a quick conda install away there are 330 currently and growing. And of course you have access to everything else through pip, or you have Anaconda cloud which provides additional community packages: anybody can make a Conda package and put it in the Anaconda cloud, and then you can install it from there too.
28:29 Ok, excellent. Yeah, one of the challenges with sort of this mix and match Open Source feel is it's nobody's job or responsibility to make sure that all of these pieces fit together right, there is a whole host of developers, more than 70 working on those 70 packages. How much coordination are they doing, right, I mean-
28:53 None. A little bit.
28:55 Yeah, I'm sure it's a little bit.
28:58 They each use some corner of the stack right, and so for the things they care about they make sure that works together.
29:06 But it can definitely fall apart, like I've periodically decided, Oh I want the new stuff, so I went to- I just did a pip, upgrade everything in my installation-- that usually is ok, except for you know, a few where it's not anymore, right, so, it's-
29:24 Yeah, if you are willing to deal with that, it's kind of fun sometimes to be on the cutting edge, it works ok. And it sort of depends on which cut you are making through that stack, but especially when it comes down to "hey, I'm tied to this vendor specific library, I've got this visualization stack that requires this version of that GPU driver", I mean, it's really hard to get that working together without somebody doing the testing.
29:57 Yeah, I can imagine. So you guys have got that covered, that's cool. So the third one is explore and visualize complex data easily.
30:05 Yeah, so here we are basically advertising the capability of this PyData stack. We are helping people understand what you can do with it. And the fact that you have such great visualization tools- we spend a lot of time on Bokeh but we also use Matplotlib and we use 30:21 and Mayavi and the other tools that let you basically bring it to life quickly. And inside of Jupyter or on the command line or inside of Spyder, depending on exactly what you want to use, all of it is available and easy and kind of at your fingertips.
30:38 Yeah, that's excellent. I installed Anaconda and went into the environment, IPython notebook- boom, everything works, it's ready to roll, there's nothing to configure, the same thing for Spyder, you can pull up the IDE, which is like a scientific computing IDE type thing and- it's all right there, right, so that's cool.
30:59 Right, so basically, you know, plotting works with Matplotlib, and Bokeh provides charting APIs, you can do complex histograms, 31:13 works out of the gate, pandas has its histograms and plot tools, you can just bring up a data frame then type .plot on it, and bring up this interesting plot, just lots of things available. You just basically have to do a Google search on the kind of analysis you want to do and it's all ready to go. Almost all of it will work.
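[Editor's sketch] The "bring up a data frame then type .plot on it" workflow looks roughly like the following. The column names and values are made up for illustration, and the Agg backend is selected only so the snippet runs without a display; in a notebook or Spyder session you would normally leave that line out.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a screen

import pandas as pd

# Hypothetical data; any numeric DataFrame works the same way.
df = pd.DataFrame(
    {
        "signal": [0.0, 1.0, 4.0, 9.0, 16.0],
        "baseline": [1.0, 1.0, 1.0, 1.0, 1.0],
    }
)

# One call draws every numeric column as a line on shared axes
# and returns the Matplotlib Axes for further tweaking.
ax = df.plot()
ax.set_xlabel("sample")
```

The returned object is an ordinary Matplotlib Axes, so anything Matplotlib can do (labels, limits, saving to a file) is available afterwards.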
31:28 Yeah. That is really cool. Definitely, if somebody out there is learning data science, start with Anaconda. Because you can focus on the actual thing you are learning and not on fighting with compilers.
31:42 Yeah, exactly. Not becoming distribution, not becoming a self integrator. The official terms that 31:50 uses for people who are using pip.
31:51 That's great. So you talk about Bokeh, I don't really know much about it, what is it?
31:57 Yeah, great, so Bokeh is a, I say Bokeh, people can say Bokeh, Bokeh is a plotting library for people that don't want to have to learn Javascript, but do want visualizations on the web. It's kind of like D3 for the rest of us, kind of for the data scientist who knows Python or R or another high level language, so you can write complex visualizations, high level plotting, charting, histograms, cross plots, and have them show up on the web.
32:29 So that's Bokeh. Bokeh has a Javascript side; it is Javascript, actually CoffeeScript is the current implementation detail, but it's a Javascript library that's in your browser, and you communicate with it via a protocol, so JSON objects are communicated back and forth with your server, or as a one time static publish embedded in my HTML as Javascript that talks to Bokeh.js and produces this visualization. So we have a static interactive plot where you can zoom into the data, you can drag and drop, you can slide, you can do selections of points and have the visualization update-
33:11 So the data is static and it comes down all at once, but then you can explore however you want, sort of visually.
33:16 So that's one use case, right, where you want the data static and you want it all in your browser at one time and you are exploring it all at the same time. But the other use case, and it's important, is where you have too much data, you don't want it all in your browser, you have to interact with a large data set and you communicate via maybe a web socket API or kind of an interactive API; the browser is the client and you are interacting, sending the information back to the server, and there is a Bokeh server that's then bringing the information and changing what is actually shown in the browser directly.
33:51 So, this bi directional communication happens between the server and the client. And that also works, and it's pretty straightforward you don't have to know all those details, you just kind of set up your visualization and a very simple API and then that kind of comes for free, that interaction kind of comes for free. So there is a lot of possibilities there, and so there is an API, there is a lot of great APIs, there is a plotting API, kind of relatively low level, it's not really that low level, it's more like the plot level with you know, I'm going to change this glyph, I'm going to change that glyph, put these axes here and there is a charting API that might take a data frame, it might take a high level object that you can then quickly build histogram or a nice chart.
34:38 The other thing Bokeh provides is novel graphics like you can just draw with, I'm going to draw rectangles, I'm going to draw circles, and because we come from the scientific Python background it's vectorized interface so with the single segment command I can draw a hundred lines and I give all of those n points in one big vector. Does that make sense? I can kind of give a single command to generate a bunch of circles.
35:07 Yeah, sure. I was looking at some of the graphics on Bokeh.pydata.org and
35:18 Yeah exactly, a good place to go-
35:18 And there is amazing stuff over there. Yeah, it's just, like, contour plots, all the different kinds of plots you can think of. That's really nice.
35:27 Exactly. There are some examples of the [35:29 inaudible], and novel graphics is one of the key pieces. So the key things are: novel graphics; it's a library; it's a fast library. If you have a visualization you want to do and you have data you know how to access in Python or R (there is an R Bokeh interface, check out R Bokeh), it may help you get your visualization done quickly, and it can produce static HTML or it can produce an application, basically, a visualization application.
35:57 Yeah, that's really cool. Because it runs on Javascript you sort of automatically get like distributed computing right, you off load a lot of the computation to the viewers right?
36:09 Yeah. Exactly.
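The two ideas described above, a vectorized glyph call and a JSON hand-off to a Javascript library in the browser, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `segment_glyph`, `to_wire`, and the layout of the JSON document are hypothetical names and formats invented for this sketch, not Bokeh's actual API or wire protocol.

```python
import json


def segment_glyph(x0, y0, x1, y1, color="navy"):
    """Describe many line segments with one call (hypothetical spec format)."""
    lengths = {len(x0), len(y0), len(x1), len(y1)}
    if len(lengths) != 1:
        raise ValueError("all endpoint vectors must have the same length")
    return {
        "type": "segment",
        "x0": list(x0), "y0": list(y0),
        "x1": list(x1), "y1": list(y1),
        "color": color,
    }


def to_wire(glyphs):
    """Serialize glyph descriptions to JSON text a browser-side library could read."""
    return json.dumps({"glyphs": glyphs})


# One call describes a hundred segments: each argument is a whole vector
# of endpoints rather than a single coordinate.
n = 100
spec = segment_glyph(
    x0=list(range(n)),
    y0=[0] * n,
    x1=[i + 1 for i in range(n)],
    y1=[i % 10 for i in range(n)],
)
payload = to_wire([spec])
```

The point of the sketch is the shape of the call: the Python side never loops over individual lines, and everything the browser needs travels as one JSON document.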
36:09 [music]
36:09 This episode is brought to you by Digital Ocean.
36:09 DigitalOcean offers simple cloud infrastructure, built for developers.
36:09 Over half a million developers deploy to DigitalOcean because it's easy to get started, flexible for scale, and just plain awesome.
36:09 In fact, Digital Ocean provides key infrastructure for delivering Talk Python episodes every day. When you (or your podcast client) download an episode, it comes straight out of a custom Flask app on Digital Ocean and it's been bulletproof.
36:09 On release days, the measured bandwidth on the single $10 a month server jumps to 900 Mbit/sec for sustained periods with no trouble. That's because they provide great servers on great hardware at a great price.
36:09 Head on over to digitalocean.com today and use the promo code TALKPYTHON to get started with a $10 credit.
36:09 [music]
37:24 Yeah, if you are trying to generate jpegs and send them down, it's all in your server.
37:30 Yeah, exactly. It's kind of the direction of modern web applications. It's always been possible to publish images and static plots to the web, but with just traditional standard HTML serving you embed Javascript that has a bit of interactivity, and the next step of course is to have that bi-directional communication with the server, using web sockets to do that, and Bokeh makes it easy to take advantage of all of that.
37:55 Yeah, and it's already beautiful so you don't have to be a designer, very nice.
37:59 Right, you don't have to be a designer, you don't have to learn Javascript; it makes it accessible for data scientists or scientists to use. The thing with data scientists is they may learn Python because it's accessible, and not necessarily become expert Python programmers, but they can learn enough Python to do their workflow and accomplish their goals. And they don't want to learn a ton of languages in order to get all the way there, and quite often getting farther along means building a graph, building a visualization. So our mission is to support that group of people and make it really possible for them to translate their ideas and their thinking into real-world interactive visualizations they can communicate to someone else, and have that just happen seamlessly and transparently. Part of how we do that is open source libraries that form a core part of it, part is integration pieces that we maybe sell at the high level, and some of it is just doing services for somebody [38:57 inaudible] to build that solution for them and ship it on premise.
39:02 Yeah, really nice. Did you guys create Bokeh?
39:05 Yes, yes. Bokeh is the project we started.
39:09 So did you look around at other things that were out there and say, "you know what, all these are nice but they just don't quite fit the story on getting on the web"
39:17 We did. Yeah, exactly, we spent a lot of time. Some people had done a really good job of putting Python interfaces on D3, and certainly you can also integrate D3 and Bokeh together; it's the web, you can build kind of mash-ups. But we wanted to use the canvas, and we wanted to make sure we can scale up to millions of points. We can't actually get billions of points into the browser, but you can deal with billions of points and then push the ones you need to the browser.
39:50 We have a technology called data shading. It's a really exciting technology associated with the Bokeh project that is up and coming, the only way to visualize billions of points easily, and we wanted the control over that. So we knew kind of where we were headed, we knew we had to do that, and it's been really good. There [40:08 inaudible] interface on top and then convert it and answer all the questions that's given us the control. One aspect of other Javascript libraries is they weren't really built with other languages in mind. There are a lot of libraries that basically say, yes, you have to be a Javascript developer and then you can use it. Actually, Bokeh is a Javascript library that is easier to use from Python and R than it is from Javascript.
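The idea of dealing with a very large point set on the server and pushing only what the browser needs can be sketched as a simple aggregation step: bin the points into a grid with one cell per pixel and send the counts instead of the points. This is a minimal sketch under that assumption, not the actual data shading implementation; `aggregate_counts` and the grid size are hypothetical choices made for illustration.

```python
import random


def aggregate_counts(xs, ys, x_range, y_range, width, height):
    """Bin points into a height x width grid of counts (one cell per pixel)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # point falls outside the visible window
        # Clamp so points exactly on the upper edge land in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_points = 200_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
ys = [rng.gauss(0.0, 1.0) for _ in range(n_points)]

grid = aggregate_counts(xs, ys, (-4.0, 4.0), (-4.0, 4.0), width=80, height=60)
cells_sent = sum(len(row) for row in grid)  # 80 * 60 cells, regardless of n_points
```

However many points go in, the amount of data that has to reach the browser is fixed by the size of the grid, and zooming just means re-running the aggregation over a smaller window.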
40:35 Yeah, yeah, that makes a lot of sense. I mean, a lot of those great libraries are like, ok, you can do this on the client side, and then you connect to your own custom services that you dream up, and you write the server side, you do all this stuff. And again, if you are a web developer, fine. But if you are a data scientist and you are trying to solve a problem... it's easy to forget as developers that not everybody in the world is a developer. They have their own special skills, right, and so they can then focus on their skills. That's cool.
41:09 And the cool thing about Python, honestly, is it bridges those worlds. And I think it's because it was a teaching language to start with, and so it is meant to fit in your brain and not take up too much space there. Leveraging your English language center is another thing it does, and the white space consistency is part of that. And so because of that, with lots of capability, lots of scientists and lots of data scientists come to Python and can work there together, and they start to want things like a developer would want, but they want it a little differently. They don't want to become developers, they just want the capability.
41:45 That's right. It makes perfect sense, yeah. So, what does a typical Anaconda user do? Like, who is your typical user, and what are some notable things people have created, things like that?
42:01 Oh yeah, I mean, it's a great question. They span the spectrum. Something I love about the work we do, and what I have been doing for the past 15 years honestly, is just the breadth of smart people that we interact with. So it might be a geophysicist trying to understand where to find oil, it might be someone managing a reservoir, maybe somebody on Wall Street; maybe they are managing a client's portfolio, or maybe they are trying to figure out a new derivative, a way to manage risk and trade risk. Or they are trying to figure out how to avoid the 2008 crash, to help with people's risk exposure.
42:34 It could be, we have people who actually are really great users, who have a story around finding rare diseases. They have a way to use microscopy to study the kind of single-gene changes that cause certain rare diseases. These are diseases that are not well funded, that nobody looks at, that are hard and expensive to do anything with, but they have a technique where they can study them, because the single-gene change can be reproduced. And so they use Anaconda and Bokeh and visualizations and Jupyter notebooks for workflows where they image a bunch of cells with different knockouts and try to find a drug treatment. You know, they find pharmaceutical companies that have lots of drugs looking for a use case, and they have a platform for testing these use cases against single-knockout models of genetic diseases. So people who help in the search for cures for rare diseases also use these tools; it's awesome to see.
43:43 Yeah, that's really great, to see people actually making lives better with software that you create or at least help support, right?
43:51 Exactly, that really makes my day. It really helps motivate me and helps motivate the company, so what we try to do is really make it a mantra: we make the world better by helping people change the world and do their job.
44:04 Yeah, very cool. What is the most surprising sort of use you've seen people try to do something with?
44:13 That's interesting. I think some of the embedding, you know, people embedding it in really small places. I think that is what- I didn't expect people to, like, get Anaconda running on a tiny phone or tab- you know, they are getting bigger these days, so I'm less surprised now, but I think early on I was kind of surprised.
44:36 Yeah, what people do with like mobile stuff these days is crazy.
44:39 Yeah, like the Raspberry Pi is an example. We had one [44:48 inaudible] April Fools' a few years ago: we did a Python 1.0 package, to kind of install Python 1.0.
44:58 How nice. Funny.
44:59 This is a proof of concept; you can manage all kinds of [45:03 inaudible] in the Python. There are a lot of great use cases, but…
45:09 Yeah, it's hard to think of the most surprising, right, they are all surprising I'm sure. We talked a little bit about running a business on Open Source and I went and I got Anaconda and I downloaded it, you didn't even make me give you my email address, thank you for that, but you have Anaconda free, and then you have some other things that are kind of products. So you've got like a pro, a workgroup and an enterprise version and you also talked about the Conda cloud, is that right?
45:39 Yeah, Anaconda cloud, yeah.
45:41 Ok, and what are all these?
45:44 Yeah. So maybe I can address the first question kind of generically. We do have to sell things to be a business, right? Businesses, at the end of the day, sell stuff. And what we sell is services, training, and software. For what we describe as our software offerings, we have subscriptions to Anaconda, and their target is really enterprise usage. We want hobbyists and academics and even people in enterprises to use this, but if you become dependent on it, you ought to think about looking at subscriptions so that we can support you in your use cases.
46:18 And make sure that your new versions don't go down, and you can be well supported. So the first level is basically just support of Anaconda, and that gives you indemnification and then priority support, so you can call us and get what you need fixed on your timescale instead of ours. So that is kind of the first thing. I personally believe that that is not enough. I think from a business perspective, just the way we are as people, we need additional stuff in order to get others to open the checkbook and send the money; we need to give more stuff.
47:00 So we added additional things into the workgroup and enterprise subscriptions, and those are in the direction of repository management. One thing workgroup provides you is the ability to manage exactly what Anaconda users internally are getting. If you install Anaconda, you can point to our repositories and get this open source set of repositories, but a lot of companies want more control over that. And so Anaconda workgroup gives you a chance to have your own private repository: you can control what goes there, and you can also build packages and upload them in there and manage deployment of Python, and applications built around Python, throughout your organization. So it's kind of a repository server.
47:41 Yeah, there is certain places where that matters, a lot. And either it matters a lot or it doesn't matter at all, I think. I did some work with the guys at NASA and I was showing them all sorts of- so I was, "ok here is what we do and you've got to install this and this and this and you just"- and they are like "wow, wait a minute, we can't just download that stuff and install it here, like there's rules and restrictions, and it's got to be approved"
48:13 Yeah, exactly, so workgroup gives you kind of a single point: we know it's approved here, and then we can control it. It helps the IT organization, and it's growing: it manages environments and packages and notebooks. You can actually see what's deployed for somebody within Anaconda workgroup by going to Anaconda cloud. So Anaconda cloud is at anaconda.org, then Anaconda cloud stores what gets installed behind your firewall if you get Anaconda workgroup. Anaconda cloud is really about- it just gives everybody the ability to say "I want to publish my Anaconda packages and have other people see them", and it gives you an easy way to do that.
48:52 So you can say, "hey, here is my environment, it's got these package dependencies, and I want to be able to point somebody to it so they can get exactly that- not just that they have to rebuild it and then hopefully get the same environment I had, but here are the [49:02 inaudible] packages that actually work for me". You can put those in individually, and we also have an environment specification, an environment concept, and you can publish that, and somebody can just point there to Anaconda cloud and then get exactly the reproducible result they are looking for.
49:24 It can be a big problem in science, when I was talking to the guys at the LHC it was a really big deal.
49:31 Yes.
49:32 The reproducibility, making sure they have exactly the same version of everything. And if you are working in science, reproducibility is kind of the key.
49:40 It is key, exactly. Science, but also business, has the same problem. A lot of time is spent on just, "oh, I have this bug", "well ok, in what version of what software, and what is your data environment?" Just getting that reproduced so you can actually figure out what is going on; people spend all kinds of money and time doing that. And our technology is part of that solution; it really streamlines that story. I would say it's not as well appreciated, I think, as other- we have not had the resources to market quite as big as other people have, and we are trying to change that. But people who use it really say, "oh, Conda environments, this is really nice, I can do this really quickly". The rest of the IT world is pushing other stories, and some of those stories aren't good; they are sort of overkill sometimes, and our simple Conda environment will give you that reproducible environment, very easily and very quickly.
50:32 Right like one of the alternatives may be hey go use Docker and get the series of images but maybe-
50:41 That's great, Docker has- it's great, but I think it's overkill for some of these use cases. And this is the lightweight way.
50:47 Again, not everybody is a devops engineer.
50:50 Correct. Exactly. So we are working closely with the Jupyter community and others to try to- I think it's really getting to the point where people can quickly do this, and we are constantly making improvements to the UI and how easy this is to do, to make sure that we don't require another devops person. We have some of that that might be a little harder than we would like today, but some of it is really easy and people are able to do it successfully. So that's kind of Anaconda cloud, and workgroup provides that capability on premise, so you own it, control it on your servers, and decide how your people use it. There is also a component where we manage a build queue, so that you can actually submit jobs and get continuous integration of your packages and have things up and running all the time. So that's an aspect of the product too; things like that are what we are selling on that side. Then for enterprise, the next level is where we offer kind of an enhanced Jupyter experience. It's really about integration with your single sign-on capability: people have enterprise single sign-on they've got to integrate with, and we've taken the Jupyter and JupyterHub capabilities and enhanced those to integrate with people's enterprise stories.
52:12 Right, so like if they are logged in on their Windows machine on their active directory and they can have like a private Jupyter notebook running on the web that only they can get to, I see.
52:25 That's exactly right. So that's what the collaborative notebook capability- we've added a couple of little things. Right now we have a great project going on with Bloomberg and the Jupyter team, and actually we have kind of the next generation of the Jupyter notebook, kind of as a foster project, and we are working closely on how we improve that. A lot of stuff, again: we love to contribute to open source, we love to make the foundations even better, and then typically we sell things that really help the enterprise and [52:57 inaudible] into their deployment conversation.
53:01 It seems to me like running a successful open source business is about having different channels for the different things you can offer to different people. There is a guy who blogs and podcasts a lot about tech business stuff, and he had a really interesting saying: it's harder to go from 0 to 1 cent than from 1 cent to 10 dollars. Do you feel like it's kind of like that, that getting somebody to open up their pocketbook at all is the really big step, and then you have this relationship?
53:39 Yes, agreed, agreed. It's a big jump to go from not paying you to paying you something.
53:47 Yeah, even if that something is super small, right.
53:49 Exactly. And when we first started the company, that was actually why we had these add-ons. We've just switched our promotional model, what we are selling on the product side, so we are kind of in the middle of that transition; there are still a few customers from when we used to sell these add-ons. And part of the reason we did that was precisely to have a conversation about selling something with those who would engage in that direction. So I knew when we promoted, I knew we would be driving a lot of stuff for free, and I was excited about that, but I knew we also needed to have "I'm going to sell you something", so we could be in the market appropriately, and people can understand that if they only want to engage with our free stuff, they can do that. We are certainly trying to encourage adoption, and hopefully some of those people might buy our stuff later. But if you decide just to use the free stuff, use the free stuff, and then let's collaborate in open source, let's collaborate around how we move this thing forward together.
54:55 Yeah, absolutely, that's great. So Travis, we are getting kind of short on time here. Do you have any calls to action for the listeners out there?
55:03 Yes, you bet, absolutely. I would say definitely go and download Anaconda if you haven't tried it. If you are a new user, absolutely download Anaconda, but if you are an old Python user, you might find that we've actually solved some things that you are having trouble with, and you might try it. It doesn't interfere with your Python installation: you are going to install it in your user account and use it separately. A lot of people used to tell me they were afraid to install anything new because it would mess up their work. Anaconda doesn't mess anything up. It's just completely separate.
55:36 Yeah, and you've got Python 2 and Python 3 version, right?
55:38 We have Python 2 and Python 3 versions. And then if you want to sign up for Anaconda cloud, it's a free account and you can post packages there, and if you are an academic you get access to our proprietary libraries through that; you get a cloud account as well. And set up an Anaconda community somewhere locally, and attend a PyData conference. There is a PyData conference in New York coming up, so attend one, and if that one is not near you, there are ten next year coming near you. Then set up an Anaconda community and participate.
56:11 Yeah, that's excellent. Are you guys going to be at PyCon?
56:15 Yes, we go to PyCon, we will be there again in Portland this year.
56:24 Yeah, excellent, that's my home town, I already got my tickets so, it's going to be good.
56:27 Yeah, I'll see you there again.
56:27 Yeah, I hope so. Cool, and then two other questions I always ask my guests before I let them go. If you are going to write some Python code, what editor do you open up?
56:36 So it's either vi or Sublime Text these days.
56:42 Ok. Good, and of all the PyPI packages out there in the world, 60000+ of them, do you have some favorites, or maybe things that people don't know about where you are like, "oh I wish you knew about this, I should tell everyone"?
56:58 There are so many. I would tell you about Distributed. Distributed is one that we are working on; Matthew Rocklin is the one who made me know about it, because he is really cool. There is a lot of cool stuff: Distributed is a way to do parallel computing in a modern Pythonic approach, and it's going to be the way Dask runs on multiple machines. So it's a separate library, and it will be the foundation for how Dask- so there are two, Distributed and Dask. I think those are worth your attention. A lot of people already know about them, but for those who don't, they are worth looking at, because they are basically going to give Python a Spark equivalent, for at least medium scale.
57:46 Ok, that's awesome. Travis, it's been great, thanks for being on the show, I really appreciate it.
57:59 I really appreciate it Michael thank you so much and good luck, and good luck with future shows.
58:03 Yeah, thanks a lot, bye bye.
58:03 This has been another episode of Talk Python To Me.
58:03 Today's guest was Travis Oliphant and this episode has been sponsored by Hired and Digital Ocean. Thank you guys for supporting the show!
58:03 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $4,000 USD.
58:03 Digital Ocean is amazing hosting blended with simplicity and crazy affordability. Create an account and then within 60 seconds, you can have Linux server with a 20 GB SSD at your command. Seriously, I do it all the time. Remember the discount code - TALKPYTHON
58:03 Did you know you can personally support the show too? Just visit patreon.com/mkennedy and join the over 100 listeners who contribute between 1-2 dollars per episode.
58:03 You can find the links from the show at talkpython.fm/episodes/show/34
58:03 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.
58:03 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song on talkpython.fm.
58:03 This is your host, Michael Kennedy. Thanks for listening!
58:03 Smixx, take us out of here.