#34: Continuum: Scientific Python and The Business of Open Source Transcript

Recorded on Monday, Oct 26, 2015.

00:00 What if you built a product that dramatically improved how hundreds of free open-source Python

00:05 libraries work together and gave it away to the world for free, and then built a thriving business

00:11 on top of that? It's the open-source dream, really, isn't it? This week, we talk with Travis

00:16 Oliphant from Continuum, who did exactly that. This is Talk Python to Me, show number 34,

00:23 recorded October 26, 2015.

00:26 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries,

00:56 the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter,

01:00 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm,

01:05 and follow the show on Twitter via @talkpython. This episode is brought to you by Hired and

01:11 DigitalOcean. Thank them for supporting the show via Twitter, where they're @hired_hq

01:17 and @digitalocean. That's right, DigitalOcean has joined Talk Python to Me as a sponsor.

01:22 Thank you guys for supporting the show. You rock.

01:24 Hi, everyone. Thanks for listening today. Let me introduce Travis so we can get right to the

01:30 interview. Travis Oliphant has a PhD from the Mayo Clinic and a BS and master's degree in mathematics

01:36 and electrical engineering from Brigham Young University. Since 1997, he's worked extensively

01:42 with Python for numeric and scientific programming, most notably as the primary developer of the NumPy

01:49 package and as a founding contributor to the SciPy package. He is also the author of the definitive

01:55 Guide to NumPy. As the CEO of Continuum Analytics, Travis engages customers in all industries,

02:01 develops business strategy, and helps guide technical direction for the company. He actively

02:07 contributes to software development and engages with the wider open source community in the Python

02:11 ecosystem. He has served as a director for the Python Software Foundation and as a director for

02:16 NumFOCUS. Travis, welcome to the show.

02:19 Hey, thanks. Appreciate it, Michael.

02:21 Yeah, I'm really glad to have you here on the show. I know you guys are doing some amazing stuff with

02:26 Python and really leading the way with the whole scientific computing angle of it. So I'm really

02:31 excited to talk to you about that today. But before we get into it, yeah, absolutely. Before we get into

02:35 the details, though, what's your story? How did you get started in programming?

02:38 Well, programming, I was a young child. I was in fourth grade, I believe. I remember my first

02:44 Atari BASIC class. It was an after-school program. I just got pretty excited about it. My dad was a

02:50 programmer as well. And so, you know, part of it was, oh, I can kind of do what he's doing.

02:54 But the computers back then were very accessible to a hobbyist mind. And the expectations were low,

03:00 too. We didn't have, you know, interactive video games to try to reach. An interactive text

03:06 game was actually interesting. And so I just got started there and had a Timex Sinclair, a TI-99/4A,

03:13 just hobbyist computers. And I was kind of doing it from the very beginning and kind of did,

03:18 I enjoyed that as well as I enjoyed a lot of math. So my first, I wrote Pascal in high school.

03:24 I was part of an AP computer science class and then went to college, where I was studying

03:28 engineering and physics and I learned C. And that's kind of how I, and then I started to

03:33 use, you know, go to MATLAB and a bunch of stuff up there.

03:35 Okay. Interesting. So you kind of built up into the more serious languages like C and so on.

03:42 And then I'm guessing you're doing MATLAB because you were trying to do some kind of scientific

03:46 visualization stuff, right?

03:49 Yeah. Yeah. It's high-level array programming for electrical and computer engineering controls and

03:56 signal processing, image processing. It was much easier to think at a high level than have to worry

04:03 about all the details of programming. I could program in C at that point, but it was much nicer to think at

04:07 a high level and not have to worry about those details while I was thinking about the math problem I was solving.

04:12 Yeah. That's for sure. No matter how good you are at C, you're not going to be that efficient in it,

04:17 right? Just there's, you're really, really way down there, right? And so MATLAB is really great for

04:22 building these simulations and answering these questions. But one of the areas where those types

04:28 of systems fall down is like building actual apps that run, right? That you can run in production.

04:35 And so is that kind of what led you towards the Python story?

04:37 Yeah. A little before that, because I was still a grad student when I found Python, it was more, it was two things.

04:44 One is I, in the back of my mind, I was bothered that I was sort of writing in MATLAB and anybody who wanted to use my

04:50 code had to also have this, you know, buy this expensive package before they could benefit from what I'd done.

04:56 And so that always sort of bothered me. And then the proximal reason I had to switch to Python was I ran out of memory.

05:06 I was doing very large simulations of, actually, five-dimensional data: the three dimensions of space, plus time, plus an axis of polarization.

05:17 And it just wasn't fitting in memory. So if I switched to float32, it would fit for the kinds of problems I was solving.

05:26 And they didn't have those data types in MATLAB. So I was looking around for another way to do this.

05:30 And that's when I found Python and the Numeric package and kind of got hooked.

05:35 I had done some Perl scripting in the past to kind of steer scientific computing, like Fortran libraries.

05:40 And so I understood that use case, but also having the Numeric package available let me do high-level simulations quickly.

05:49 And then just this wealth of other things got me kind of started.

05:52 And I just loved the language and the syntax and it was close enough.

05:56 I didn't have to think about computer science.

05:59 I could think about my problem at a higher level.

06:01 So that's what hooked me.

06:03 And then it was readable later.

06:04 That's what I really appreciated about it.

06:07 So that was in grad school about 15, 16 years ago, 17 years ago now, that I got started with Python.
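The memory point above (moving to 32-bit floats so a five-dimensional simulation fits) can be illustrated with a little arithmetic. This is only a sketch: the grid sizes are invented for illustration and are not the dimensions Travis used, and `grid_bytes` is a hypothetical helper, not part of Numeric or NumPy.

```python
import struct

def grid_bytes(shape, fmt):
    """Bytes needed for a dense array of the given shape and struct format."""
    count = 1
    for n in shape:
        count *= n
    return count * struct.calcsize(fmt)

# Hypothetical 5-D grid: x, y, z, time, polarization (sizes are made up).
shape = (128, 128, 128, 64, 2)

double_bytes = grid_bytes(shape, "=d")  # 64-bit float, 8 bytes per element
single_bytes = grid_bytes(shape, "=f")  # 32-bit float, 4 bytes per element

print(f"float64: {double_bytes / 2**30:.1f} GiB")
print(f"float32: {single_bytes / 2**30:.1f} GiB")
```

Halving the element size halves the footprint, which is the whole difference between a problem that fits in memory and one that does not.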

06:13 Okay, that's cool.

06:15 Yeah, you and I were in grad school, not too far apart, actually.

06:18 In time, maybe, maybe in distance, but in time.

06:22 So the scientific Python looked really different 15 years ago than it does today.

06:28 I would say scientific Python looked really different two years ago than it does today.

06:31 But, you know, that far back is really interesting.

06:34 So maybe you could tell us what it was like then and how you've seen it evolve over time.

06:38 Yeah, so back then, a lot of scientific Python was steering other codes.

06:44 A lot of it was that you'd write a script to call either machine codes or

06:49 other codes written in C or Fortran.

06:54 And you'd be just steering, and you're doing a lot of data, kind of data munging,

06:57 maybe file reading, reading input data and writing it out, kind of using the string processing facilities.

07:02 And then because Python could be extended, then it was not too difficult, you know,

07:07 that the SWIG package was available at the time.

07:09 It was really instrumental early on in kind of making it easy to wrap large code bases

07:12 and make them accessible to Python.

07:14 And so people were able to then kind of have objects in Python that mirrored their low-level objects.

07:22 And so they could start steering their program from a high level.

07:25 And as Numeric emerged as an array-processing library, and then NumPy came a little bit later after Numarray,

07:33 that whole process led to more people actually at a high level just building their array-oriented constructs

07:39 just in Python itself in the very beginning, instead of having it kind of be an object layer on top of low-level machine codes.

07:46 Okay.

07:47 So in the beginning, it was just about sort of an easier API to what people were already doing in other places.

07:55 But then...

07:56 And helping to test those, too, it kind of became like it was really easy to test your C libraries,

08:01 your Fortran libraries.

08:02 Right.

08:02 Absolutely.

08:03 But then it sort of...

08:05 The more it caught on, it was more like, well, we'll just stay here, right?

08:08 Yes.

08:09 Right.

08:09 I can just use this.

08:10 Like, the spelling here is nicer and easier, and I don't ever have to drop to the C level.

08:15 And there were some real important things that happened that conveyed that.

08:19 And that process is still happening.

08:20 I mean, there are still people that write Fortran, still people that write C.

08:24 And the fact that Python is this great glue continues and continues to make it a strong use case.

08:29 You were one of the people that was involved with NumPy and SciPy and all that,

08:34 actually getting that off the ground, right?

08:36 Yeah.

08:37 Yeah.

08:37 I was a grad student who was really...

08:40 Like I said, I fell in love with Python as a grad student.

08:43 And it was a weird reaction to have for a language.

08:46 But what I loved is just the ability to do things quickly.

08:50 You know, I could sort of iterate quickly.

08:53 I could think at a high level.

08:54 The concept of Python fits your brain and gets out of your way is a common meme.

08:59 And I felt...

09:00 I just totally felt that.

09:01 I could think about my problem.

09:02 And then...

09:04 So I got excited about it.

09:06 And then I looked around and said, well, but I'm missing some libraries here.

09:08 I need integration.

09:09 I need optimization.

09:09 I need statistics.

09:11 I need these linear algebra libraries.

09:14 And a few of them were in Numeric.

09:16 And there were a few scattered around.

09:17 But I kind of got a bee in my bonnet around 1998, 99.

09:21 And started just...

09:22 I learned how to write an extension module.

09:24 You know, Guido and something called Table.io from Mike Miller showed me how to write an extension

09:29 module.

09:29 And then I just got hooked.

09:30 And I started writing a bunch of them.

09:31 That's great.

09:32 So I wrapped a whole bunch of old Fortran codes, kind of made them more accessible.

09:37 And that's kind of the beginnings of the SciPy project.

09:40 Yeah, very cool.

09:42 Take that C knowledge you had and put it to use there, right?

09:44 That's right.

09:45 That's right.

09:46 And this was back when, you know, you had to hand wrap and track reference counting.

09:49 And it was...

09:51 The fact that I knew C was really critical to making sure that would happen.

09:54 It's certainly a lot easier now.

09:56 You can do it with Cython.

09:58 Numba.

09:58 It's really easy now, actually.

10:00 Yeah.

10:00 Comparatively.

10:01 These kids, they don't know how good they have it.

10:03 Right.

10:04 So somewhere along the way, you started a company called Continuum, yeah?

10:11 Yeah.

10:13 Yeah.

10:13 That's fairly recent, actually, in the whole history of things since I've been doing this

10:16 for 16 years.

10:18 And, you know, I was a professor for a while.

10:20 And that's where I wrote NumPy, kind of growing from the work of the Space Telescope Science Institute,

10:25 the Numarray work, and then, you know, the previous work of the Numeric community.

10:29 NumPy brought it together in 2007.

10:31 2006, 2007.

10:32 And that's been very successful.

10:34 It's really thrilling to see how many projects have been building around that.

10:38 And then also, it's been thrilling to see the community rally around and keep supporting NumPy.

10:41 There's a lot of great people contributing to that project today.

10:44 And same is true with SciPy.

10:47 But I left my academic career behind around 2007.

10:50 And I had six kids.

10:53 I needed to support them.

10:54 So I needed to figure out a way to make my way in the world.

10:57 And I started doing consulting around Python for Enthought in 2007.

11:02 And then in 2011, Peter and I were kind of talking about some gaps and really trying to, you know, realizing that Python needed to make a bigger story in the big data world and have a stronger play in the web visualization and big data world.

11:19 And so we started a company in 2012 to really make that happen.

11:23 Yeah.

11:23 And your company is called Continuum.

11:25 And you guys have a really ambitious goal.

11:28 A really lofty goal.

11:30 And I love browsing your website because, you know, it's kind of like there's all these great scientific projects and thinkers and stuff out there.

11:40 But when they go to solve their problem, they end up, you know, stuck at a command line saying cannot compile some random C thing because they're trying to install, you know, a Python module that's got C extensions.

11:52 And they just, you know, like they're stuck.

11:55 Right.

11:55 And so if you could build an environment where everything was sort of import anti-gravity, right, it's like all super easy.

12:03 But even for the science-y stuff that's hard, right, that's kind of your goal?

12:07 That's definitely a part of our goal.

12:09 I would say that at Continuum we've had no shortage of lofty goals.

12:13 Peter and I are both ambitious thinkers and kind of full of ideas and thoughts.

12:21 And I've had things building up over the years.

12:24 So when we started Continuum, we had really, really ambitious goals, not only around package management.

12:28 Really, that kind of came as a sub, as a corollary of our other goals.

12:32 Our goals were to make it really easy for data analysts and scientists, you know, people that change the world is the way we talk about them today.

12:39 People that change the world need a way to do it easily and not to be loaded down by DevOps and issues.

12:47 They need to be able to create visualizations that show up on the web and they need to be able to take advantage of modern hardware easily.

12:53 So they need to run in parallel.

12:55 They need to be able to take advantage of GPUs.

12:57 And they need to be able to translate their great ideas into code that does that with a nice web front end.

13:04 Right.

13:04 So, and then if we're going to really pull that off, we've actually got to, the first problem is how do I ship this, this great stack of tools to people that I built?

13:13 And so Anaconda came out, kind of came as a, this really immediate corollary from our overall goals.

13:18 We need to really work on the packaging problem.

13:22 Right.

13:22 So you're thinking data science, visualization, scientific computing.

13:26 Okay.

13:27 Python.

13:27 Right.

13:28 Now.

13:28 Data science at scale in a, in a web browser visualization.

13:31 How do I get that installed to everybody?

13:34 And we got to, we got to work on the distribution problem.

13:38 Right.

13:38 The starting point.

13:39 Yeah.

13:39 So vanilla Python is a fine base for that.

13:42 But, you know, if you tried to install, you know, matplotlib on Windows.

13:48 Yeah, exactly.

13:50 It is no treat, right?

13:51 You're like, what?

13:52 No treat.

13:52 Right.

13:53 What is up with this VCVars.bat?

13:56 I didn't want this.

13:57 Well, what's, what's, what's cool about Python is how easy it is to integrate with so much other stuff.

14:04 Right.

14:04 It's a glue that really, you know, it's why it was adopted in the scientific domains and why it continues to be powerful.

14:10 But the other end of the stick of that glue means you've got a lot of other systems you're interacting with.

14:16 And so installing something, you know, SciPy, for example, has notoriously been difficult to install because it relies on Fortran code.

14:23 So you're compiling Fortran code and then you're integrating it.

14:27 And, you know, I use Fortran.

14:30 Any decent scientist can use Fortran.

14:33 Well, today people are like, I don't want to have a Fortran compiler managed on my system.

14:36 Yeah, absolutely.

14:37 It becomes a challenge.

14:40 So, and then, you know, the visualization compounds that with additional C libraries and additional configuration issues that are slightly different on different platforms.

14:49 And just how do you bring all that together?

14:50 It's a real, it's a hard problem.

14:53 And so we've taken strides.

14:55 And part of that, part of those strides is creating the Conda package manager.

14:59 You know, really having a cross-platform package management solution that goes beyond just Python and lets you install Node and R and anything else.

15:09 Java, Ruby, Scala.

15:11 You can install all of it with Conda.

15:13 And then on top of that, being a definitive source for freely available binaries on multiple platforms for these primary fundamental packages that make up the PyData stack.
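As a rough illustration of the Conda workflow described above, the sketch below builds (but does not run) a `conda create` command line for an isolated environment. The environment name `analysis` and the helper `conda_create_command` are hypothetical; `--yes` and `--name` are standard `conda create` options, and the package list is just an example of the PyData stack mentioned in the conversation.

```python
import shlex

def conda_create_command(env_name, packages):
    """Build (but do not run) a `conda create` command line."""
    return ["conda", "create", "--yes", "--name", env_name, *packages]

# Hypothetical environment name and an example package selection.
cmd = conda_create_command(
    "analysis", ["python=3.5", "numpy", "scipy", "matplotlib"]
)

# Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list rather than a single string avoids shell-quoting mistakes if it is later handed to `subprocess.run`.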

15:25 Okay, so that's interesting, both of them.

15:28 But the fact that you say, all right, well, you know, how different is everybody's Mac?

15:34 We could probably compile that once and then nobody has a problem of getting this obscure compiler to work on their environment, right?

15:41 Right, right.

15:43 That is a, and that's an important question, actually, because what is it, what is a platform?

15:47 We've asked this, we have to look at this question really hard.

15:49 I mean, a platform fundamentally is actually the tree below you.

15:53 It's all the software dependencies that you're not caring about, that you depend on.

15:57 That's actually a platform, right?

15:59 That's right.

16:00 Now, when we talk about OS platform, people kind of cut that tree at a particular plane and usually go, well, okay, here's, we're going to use the OS for the libc and this set of libraries.

16:08 Everything else above that is managed from the package manager.

16:11 And for us, the way we manage that is, like on Windows, they do a lot of work to manage that.

16:21 You know, and basically if you compile for a particular Windows box, it's going to be pretty well assured to work on every release of the software above that.

16:31 So usually it's about picking the baseline.

16:34 It's about picking the baseline.

16:36 So with Windows, we used to, you know, we used to do Windows XP.

16:38 I think we just moved to Windows 7.

16:40 That's kind of our, you know, that's where we compile all the software so that it works on Windows across that.

16:46 Which compiler do we pick?

16:48 Well, you know, in Python 3.5, we just moved to Visual Studio 2015.

16:51 In Python 2.7, it was Visual Studio 2008.

16:55 And so, you know, we manage that.

16:57 And then on Mac OS X, okay, it's version 10.7 of the OS.

17:01 That's where we, in this version of Xcode, that's our base.

17:04 On Linux, it becomes a little bit harder, but there we just, it's the compiler and the libc.

17:10 And then it doesn't matter which distribution it is, it works on yours; the only thing we rely on is the libc.
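The "baseline" idea above (OS, compiler, and libc as the plane where the dependency tree is cut) can be inspected from Python itself. This is a minimal sketch using only the standard-library `platform` module; `describe_platform` is a hypothetical helper name, and the libc entry is simply empty on systems where it cannot be detected (for example Windows).

```python
import platform

def describe_platform():
    """Collect the baseline facts a binary build would be tied to."""
    libc_name, libc_version = platform.libc_ver()  # empty strings if unknown
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "libc": f"{libc_name} {libc_version}".strip(),
        "compiler": platform.python_compiler(),
        "python": platform.python_version(),
    }

for key, value in describe_platform().items():
    print(f"{key:>10}: {value}")
```

On Linux the `libc` line is the one that matters most for the point made here: a binary built against an old enough libc should load on any distribution that ships a newer one.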

17:18 Okay, yeah.

17:19 That's how we manage it.

17:20 These are hard questions.

17:22 Yeah, and it's hard to generalize that.

17:25 But that's, I think that's a really hugely valuable service.

17:29 It's because, you know, just, I think it was this morning, I was trying to install uWSGI on Ubuntu 15.10.

17:38 And for some reason, it would not compile correctly, trying to install into the main environment, but it would go into a virtual environment, right?

17:44 And just these little headaches you keep bumping into.

17:47 And when you're experienced, you're like, okay, fine, we'll just do it this way or that way.

17:50 But when you're new or programming is not your full-time thing, you're a scientist.

17:55 Right.

17:55 And you just need to do it.

17:56 It's a showstopper.

17:57 It's a showstopper, right?

17:59 That's exactly right.

17:59 And that's actually something, you know, a lot of people have, because Conda has emerged at the same time that the Python packaging has been improving.

18:07 You know, pip and virtualenv have gotten a lot better over the past several years than they were when we started.

18:11 And our story in Conda has some overlap with that, but it really has a different focus.

18:17 If you use pip and virtualenv, you're saying, yes, I'm an integrator.

18:21 I'm going to make this all work for my system.

18:23 And that's great.

18:24 And that's something people want to do, but not everybody wants to do that or really should be trying to do that.

18:28 Conda is about, and Anaconda is about, here's an easy-to-use platform for open-source analytics that you can just get started with out of the gate.

18:38 You don't have to be worrying about configurations right before you get started.

18:42 Just get started, right?

18:44 And then you can decide later how you want to potentially, you know, build your perfect system.

18:50 And whether you continue to use Anaconda or binaries or you decide you're going to use your own, you're going to use Conda or use your own binaries, that's a decision you can make down the road.

19:00 Or maybe you just want to recompile, you know, get the stack you want and recompile everything the way you want it.

19:04 All that's still available to you.

19:06 We're just trying to get you started quickly and get you using this great stack of tools.

19:11 Yeah, that makes a lot of sense because once you've got a working system, well, then you're willing to put in that four hours to get that thing set up.

19:19 Right.

19:20 But if you're trying to decide if it's even suitable for experimenting with, you don't want to spend that four hours, right?

19:26 Exactly.

19:27 And that was our goal from the very beginning.

19:28 I'm really excited.

19:29 I think I personally think we've helped with the recent explosion of Python in the PyData world.

19:35 Just how many more people.

19:36 I mean, it was said that actually Python is the fastest growing data analytics language.

19:41 Its rate of adoption has overtaken R.

19:45 You know, Python and R for a long time, Python was overshadowed by R.

19:49 Like nobody really talked about Python as an open source language.

19:52 There's always R, R, R this, R that.

19:55 And R is great.

19:55 Don't get me wrong.

19:56 It's got some great things about it.

19:57 But Python is used in a lot of the same context.

20:00 And it's only been fairly recently people have looked around on a commercial side.

20:03 Companies have looked around and said, wait a minute, database vendors, for example, and said, we've got to have Python support too, not just R support.

20:09 Yeah.

20:10 That's been really great to see.

20:11 That is really great to see.

20:12 And I think you're right about you guys helping with the explosion there.

20:15 I mean, to me, it's NumPy, SciPy, and then it's IPython.

20:23 And, you know, maybe a few other things.

20:25 Pandas has made a real big difference over the past several years.

20:28 A lot of people are using Pandas now.

20:29 Yeah, that's right.

20:30 Yeah, yeah.

20:30 For data frame kind of operations.

20:32 Yeah, and a lot of those are not easy to set up and install and get going with.

20:37 And so having this distribution is really great.

20:39 To me, you talked about a couple of things.

20:41 You talked about Conda, and you've talked about Anaconda.

20:55 This episode is brought to you by Hired.

20:57 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

21:04 Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.

21:14 Typically, candidates receive five or more offers in just the first week, and there are no obligations.

21:19 Ever.

21:19 Sounds pretty awesome, doesn't it?

21:22 Well, did I mention there's a signing bonus?

21:24 Everyone who accepts a job from Hired gets a $2,000 signing bonus, and as Talk Python listeners, it gets way sweeter.

21:32 Use the link Hired.com slash Talk Python to me, and Hired will double the signing bonus to $4,000.

21:40 Opportunity's knocking.

21:41 Visit Hired.com slash Talk Python to me and answer the call.

21:45 Let's maybe focus on Anaconda for a minute.

21:57 So you've got sort of four interesting, I don't know what you call them, maybe pillars or concepts behind this distribution.

22:06 And I thought they were pretty cool, so I'll sort of go and maybe we could talk about them a bit.

22:11 So one is you're committed to open source now and forever.

22:15 What's the story there?

22:16 Yeah, so one of the things we want to make sure people realize is our roots are in open source, right?

22:23 I mean, I spent years building SciPy and years building NumPy.

22:28 And when we started the company, we wrote Bokeh and Numba and Conda, and they're all open source.

22:33 I believe infrastructure should be open source.

22:36 And one of the goals I had in starting Continuum was to create an organization that would allow a sustainable production of open source

22:45 by also selling things for enterprises that drives that.

22:49 Any revenue produced by selling solutions to enterprises can actually support growing more open source software.

22:55 So I really believe it helps everybody, and it helps places you're not even able to target and help.

23:00 So we're real believers in open source.

23:03 We believe in it.

23:03 We think it's important.

23:04 We also believe in building on top of it and having things we sell on top of it.

23:08 But we want to give back to the community at every opportunity we have and can.

23:13 So Anaconda is a distribution of the open source ecosystem. From my perspective,

23:19 it needs to be open source and will stay open source.

23:22 It's really interesting to me how successful people have been with open source on the large scale and on the small scale lately.

23:31 And I think this is a new trend.

23:33 I mean, you probably know better than I do.

23:35 But I'm thinking of companies like MongoDB that are, you know.

23:40 Yes, 10gen, yeah.

23:41 Yeah, MongoDB, that's right.

23:42 Yeah, over a billion dollars valuation.

23:45 And you can go to GitHub and get their thing they're selling for free.

23:48 You know?

23:49 Yeah.

23:49 It's so paradoxical.

23:50 Right.

23:51 But it's really interesting.

23:53 Well, some of that still has to be played out, I think, in the marketplace.

23:56 I mean, some of these large-scale valuations are a bit marked to vision, marked to dream.

24:01 Yeah, yeah.

24:02 They're looking ahead.

24:03 We'll see.

24:03 You know, they're not quite proven out, whether or not there's going to be a sustainable, repeatable sales cycle that's going to drive that.

24:11 There are known.

24:12 I think the reality is there are known sales conversations that can drive open source.

24:18 And those have been identified.

24:20 And I think there are more to come.

24:22 And people are recognizing that in order to play in the ecosystem of tomorrow, you're going to have to be open source because the developers are going to demand it.

24:31 The people who are – you have huge, huge advantages in terms of what people know already.

24:37 So if you don't, you're going to have to train people a lot more than you do if you kind of go with what open source is driving.

24:43 I think there are a lot of factors that drive it.

24:45 But our – you know, we're basically a group of folks that – we drank the Kool-Aid long ago.

24:53 So we're just continuing that.

24:54 It's sort of part of our core.

24:56 It's not something we just sort of, hey, let's try this, jump on this bandwagon.

24:58 It's who we are.

25:01 Yeah, you guys have been there from the beginning.

25:02 That's awesome.

25:03 But yeah, I definitely think that not only is it becoming – so there's lots of examples of businesses doing it successfully.

25:11 But like you said, it's being accepted by purchasers, people who are going to say, I'm going to buy a service.

25:18 Well, of course, I'm going to go get this open source project and I'll get the commercial support or whatever.

25:24 Whereas it used to be kind of this edge thing like, well, maybe you can save some money, but you might pay for it later when it doesn't work quite.

25:30 That discussion has sort of sailed, right?

25:33 Yeah.

25:35 Yeah, I think there's still some questions around – I mean, I personally believe there are a lot of people benefiting greatly from Python and the PyData stack that need to figure out how they're going to help support sustaining it.

25:48 Either there's got to be – you buy a commercial distribution, support NumFOCUS.

25:53 I mean, there's got to be some dialogue around how are we actually going to make this work long term.

25:57 So I would definitely love to encourage people to take a hard look.

26:02 And if they're relying on this for their day-to-day, think about how that's going to maintain.

26:06 Right.

26:07 Don't shoot yourself in the foot, right?

26:09 Right.

26:10 Right.

26:10 Don't let today's price point be the enemy of tomorrow's success.

26:14 Yeah, for sure.

26:15 So that was sort of pillar one.

26:17 Number two was tested and certified packages to cover your back.

26:21 Yeah, that's right.

26:22 So that kind of speaks to the – we do a lot of work to make sure the Python works together.

26:27 And so you have a process where we release into – we have a download and we have a repository.

26:34 When we make a release of Anaconda, we basically are saying this group of software works together and we've done some integration testing.

26:42 When we make a release of a package into the open source repository, basically we've done a testing on this package that works and a little bit of integration testing.

26:49 But we haven't done like full integration testing.

26:51 So you kind of – depending on where you want to fit, do you want to sit and take only Anaconda released versions?

26:56 Because you want to make sure it all works together.

26:58 Or maybe you're willing to deal with the occasional, you know, hey, this new release of SoftwareX doesn't, you know, doesn't quite work with this other piece.

27:07 It's easy to fix, but you might have to, you know, get the configuration that works just right.

27:12 But either way, we've done testing at various levels and we, you know, we basically promised it'll work.

27:18 We – as a commercial thing, we do offer indemnification as well.

27:21 So that's one of the features we provide to people that will – that buy a subscription is we will actually indemnify you against the concerns about open source using and deploying open source in your organization.

27:33 We'll make sure that you're protected from, you know, copyright infringements and patent infringement concerns that the company may have.

27:39 That's a big deal.

27:41 That's really cool.

27:41 Yeah.

27:42 Yeah.

27:42 So with the sort of consistent packages, how many different packages are in the Anaconda distribution?

27:48 So the Anaconda download, we've actually been tearing it down, trying to make that download a manageable, you know, manageable 700K or 700 megabytes.

27:56 It could grow like to be three, four gigabytes.

27:59 But we've been paring it down.

28:00 I think it has about 70 to 75 packages that are in the downloadable.

28:04 And in the repository that's a quick conda install away, there are 330 currently.

28:10 Okay.

28:11 And growing.

28:11 And then, of course, you have access to everything else through pip or you have Anaconda Cloud, which provides additional sort of community provided.

28:20 Anybody can make a Conda package and put it in Anaconda Cloud.

28:24 Anaconda Cloud, excuse me.

28:25 And then they can go and you can install it from there, too.

28:29 Okay.

28:29 Okay.

28:29 Excellent.

28:29 Yeah.

28:30 One of the challenges with sort of this mix and match open source feel is there's nobody's job or responsibility to make sure that all of these pieces fit together.

28:43 Right?

28:43 There's a whole host of developers, more than 70, working on those 70 packages.

28:48 That's right.

28:49 But are they, how much coordination are they doing, right?

28:52 I mean...

28:53 None.

28:53 Well, a little bit.

28:55 Yeah, I'm sure a little bit.

28:56 They each have their own...

28:57 They each use some corner of the stack, right?

28:59 And so they'll...

29:00 For the piece...

29:01 For their itch, for the things they care about, they make sure that works together.

29:05 But it can definitely fall apart.

29:07 Like, I've periodically decided, oh, I want the new stuff.

29:11 And so I went to my...

29:13 I just did a pip upgrade.

29:14 Everything in my installation...

29:17 Yeah, that's right.

29:17 That usually is okay, except for, you know, a few where it's not anymore, right?

29:21 So...

29:23 Right.

29:23 Right.

29:23 And, you know, if you're willing to deal with that, and you've been...

29:27 Some R&A, it's kind of fun sometimes, right?

29:28 It's a bit kind of cutting edge.

29:29 It works okay.

29:31 Yeah, and it just sort of depends on which cut you're making through that stack.

29:35 That's right.

29:35 But especially when it comes down to, hey, I've got...

29:39 You know, I'm tied to this vendor-specific library, this BLAS or LAPACK linear algebra library.

29:45 I've got this, you know, visualization stack that requires this version of that GPU driver.

29:50 I mean, those are really...

29:51 It's really hard to get that working together without somebody doing that testing.

29:56 Yeah, yeah, I can imagine.

29:57 So you guys got that covered.

29:59 That's cool.

30:00 So the third one is explore and visualize complex data easily.

30:05 Yeah.

30:05 Yeah.

30:05 So here we're basically advertising the capability of this PyData stack and helping people understand

30:10 what you can do with it.

30:12 And, you know, the fact that you've got such great visualization tools...

30:16 You know, we spend a lot of time on Bokeh, but we also use Matplotlib and we use, you know,

30:21 VisPy and Mayavi and the other tools that let you basically bring your data to life quickly

30:29 and inside of Jupyter or on the command line or inside of Spyder, depending on exactly what

30:33 you want to use.

30:35 All of it's available and easy and kind of at your fingertips.

30:37 Yeah, that's excellent.

30:39 I install Anaconda, go into the environment, type ipython, space, notebook, and boom, everything works.

30:46 It's ready to roll.

30:47 It's not...

30:48 Nothing to configure, right?

30:49 Same thing for Spyder.

30:51 You can pull up the IDE, which is like a scientific computing IDE type thing.

30:55 And it's all right there, right?

30:57 So that's cool.

30:58 Right.

30:59 So that's basically, you know, it's a plot works and for Matplotlib.

31:03 And then if you're...

31:06 Bokeh provides a charting API, so you can do complex histograms.

31:10 The Seaborn interface to Matplotlib works out of the gate.

31:13 You know, Pandas has its histograms and plot tools.

31:16 You can just bring up a data frame and type .plot on it, and it'll bring up this interesting

31:20 plot.

31:21 Just lots of things available.

31:23 You just basically have to do a Google search on the kinds of analysis you want to do,

31:26 and it's all ready to go.

31:27 Yeah.

31:28 And almost all of it will work out of the box.

31:30 Yeah, that's really cool.

31:31 I definitely, if somebody out there is learning data science, definitely start with Anaconda,

31:36 right?

31:37 Because then you've got to focus on the actual thing you're learning and not fighting with

31:40 compilers.

31:41 Exactly.

31:41 Not becoming a distribution.

31:43 Not becoming a self-integrator.

31:44 It's the official term that Nick Coghlan uses for people who are using pip.

31:49 That's great.

31:51 So you talk about Bokeh a lot, and I don't really know much about it.

31:56 What is it?

31:56 Yeah, great.

31:58 So Bokeh is...

31:59 I say Bokeh, but people can say Bokeh.

32:01 I think Bokeh is a plotting library for people that don't want to have to learn JavaScript

32:09 but want visualizations in the web.

32:11 Okay.

32:12 So it's kind of like D3 for the rest of us, kind of for the data scientist who knows Python

32:17 or R or another high-level language.

32:20 But you can write complex visualizations, kind of high-level plotting, charting, histograms,

32:25 cross plots, but have it show up in the web.

32:30 So that's Bokeh.

32:31 Bokeh has a JavaScript side.

32:33 There's a JavaScript...

32:34 Actually, I think CoffeeScript is the current implementation detail, but it's a JavaScript library that

32:39 you then embed in your browser.

32:40 And you communicate with it via a protocol.

32:44 So JSON objects.

32:45 You communicate back and forth with your server.

32:48 Either as a one-time static publish, you know, embedded in my HTML is this JavaScript that talks to bokeh.js and produces this visualization.

32:57 So you have a static interactive plot where you...

33:00 Interactive, I mean, you can zoom into the data.

33:01 You can drag and drop.

33:03 You can, you know, slide...

33:05 You can do some selections of points and have the visualization update as you're doing that selection.

33:10 So the data is static and it comes down all at once, but then you can explore it however you want, sort of visually.

33:16 So that's one use case, right, where you want the data static and you want it all in your browser at one time and you're exploring it all at the same time.

33:22 But the other use case, it's an important and growing use case, is where you have too much data.

33:26 You don't want it all in your browser.

33:27 You have to interact with a large data set and you're communicating via maybe a WebSocket API or kind of an interactive API.

33:35 The browser is talking to a client and, you know, you're interacting, sending viewport information back to the server.

33:42 And there's a Bokeh server that's then bringing the information and changing what's actually shown up in the browser directly.

33:51 So there's this bidirectional communication happening between the server and the client.
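
The server-backed use case described above can be sketched in plain Python. This is only an illustration of the idea (the full data set stays on the server, and only what the current viewport needs goes to the browser), not Bokeh server's actual implementation. The function name `points_for_viewport` and the `max_points` cap are hypothetical.

```python
def points_for_viewport(xs, ys, x_min, x_max, max_points=1000):
    """Return only the points visible in [x_min, x_max], thinned to at most max_points."""
    visible = [(x, y) for x, y in zip(xs, ys) if x_min <= x <= x_max]
    if len(visible) <= max_points:
        return visible
    # Keep every k-th point; a real server would likely aggregate more carefully.
    step = -(-len(visible) // max_points)  # ceiling division
    return visible[::step]


# The "large" data set lives on the server side.
xs = list(range(100_000))
ys = [x % 97 for x in xs]

# Zoomed all the way out: the browser receives a thinned overview.
wide = points_for_viewport(xs, ys, 0, 99_999)

# Zoomed in: the browser receives every point in the narrow range.
zoomed = points_for_viewport(xs, ys, 500, 520)
```

Each time the user pans or zooms, the browser would send the new viewport bounds and receive a fresh, small batch of points in return.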

33:56 And that also works.

33:57 And it's pretty straightforward.

34:00 You don't have to know all those details.

34:01 You just kind of set up your visualization in a very simple API.

34:05 And then that kind of comes for free.

34:08 That interaction kind of comes for free.

34:10 And there's a lot – so there's – you know, there's a lot of possibility there.

34:13 And so there's an API.

34:14 There's a – there's a – both a – there's a lot of great APIs.

34:18 There's a plotting API, kind of relatively low level.

34:21 It's not really that low level.

34:22 It's more like – it's kind of like the plot level with, you know, I'm going to change this glyph.

34:27 I'm going to change that glyph, put these axes here.

34:28 Then there's a charting API that might take a data frame, might take a high-level object that you can then quickly build a histogram or a nice chart.

34:39 Then the other thing Bokeh provides is novel graphics.

34:42 Like you can just – I'm going to draw a glyph.

34:45 I'm going to draw the rectangles.

34:46 I'm going to draw circles.

34:47 And kind of a – and because we come from the scientific Python background, it's a vectorized interface.

34:53 So with a single segment command, I can draw 100 lines, you know, and I give all of those endpoints in one big vector.

35:01 Does that make sense?

35:03 Like a single command will generate a bunch of circles.

35:07 Yeah, sure.

35:07 Or a bunch of glyphs.
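
A minimal sketch of the vectorized idea, in plain Python rather than Bokeh itself: one call describes 100 segments as four coordinate columns, and the description is serialized to JSON and embedded in a static HTML page for a browser-side library to draw. The function names (`build_segment_spec`, `embed_in_html`) and the JSON layout are hypothetical, chosen for illustration; they are not Bokeh's API or wire format.

```python
import json


def build_segment_spec(x0, y0, x1, y1):
    """Describe many line segments as one vectorized glyph (hypothetical schema)."""
    x0, y0, x1, y1 = list(x0), list(y0), list(x1), list(y1)
    if not (len(x0) == len(y0) == len(x1) == len(y1)):
        raise ValueError("all coordinate columns must have the same length")
    return {
        "glyph": "segment",
        "data": {"x0": x0, "y0": y0, "x1": x1, "y1": y1},
    }


def embed_in_html(spec):
    """Return a static HTML page carrying the plot description as JSON."""
    payload = json.dumps(spec)
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        '<div id="plot"></div>\n'
        f'<script type="application/json" id="plot-spec">{payload}</script>\n'
        "</body></html>\n"
    )


# One call, 100 segments: each column holds one coordinate for every segment.
n = 100
spec = build_segment_spec(
    x0=range(n), y0=[0] * n,
    x1=range(n), y1=[i * i for i in range(n)],
)
page = embed_in_html(spec)
```

The point of the column layout is that the Python side never loops over individual segments when drawing; it hands over whole vectors, which matches how NumPy users already think about data.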

35:09 Yeah, I was looking at some of the graphics on bokeh.pydata.org.

35:15 Yeah, exactly.

35:17 It's a great place to go.

35:18 Yeah, there's amazing stuff over there.

35:19 Yeah, it's just like a contour plot, all the different kinds of plots you can think of.

35:25 And just – that's really nice.

35:27 Exactly.

35:27 There's some examples of choropleths and kind of novel graphics is one of the key pieces.

35:33 So, you know, the key things are novel graphics.

35:36 It's a library.

35:36 It's a fast library.

35:37 Kind of – if you have – if you have visualization you want to do and you have data you know how to access in Python or R, actually.

35:45 There's an rbokeh interface.

35:46 Check out Bokeh because it may help you get your visualization done quickly.

35:51 And it can produce a static HTML or it can produce an application, basically, a visualization application.

35:56 Yeah, that's really cool.

35:57 For the server-side component.

35:58 Yeah, and because it runs on JavaScript, sort of you automatically get like distributed computing, right?

36:04 You offload a lot of the computation to the viewers, right?

36:08 Exactly.

36:10 This episode is brought to you by DigitalOcean.

36:27 DigitalOcean offers simple cloud infrastructure built for developers.

36:31 Over half a million developers deploy to DigitalOcean because it's easy to get started, flexible for scale, and just plain awesome.

36:38 In fact, DigitalOcean provides key infrastructure for delivering Talk Python episodes every day.

36:45 When you, or your podcast client, download an episode, it comes straight out of a custom Flask app built on DigitalOcean, and it's been bulletproof.

36:53 On release days, the measured bandwidth on my single $10 a month server jumps to over 900 megabit per second for sustained periods, and there's no trouble.

37:02 That's because they provide great servers on great hardware at a great price.

37:06 Head on over to DigitalOcean.com today and use the promo code TALKPYTHON, all caps, no spaces, to get started with a $10 credit.

37:14 Yeah, if you're trying to generate JPEGs and send them down, it's all in your server.

37:30 Yeah, exactly.

37:31 It's kind of in the direction of modern web applications.

37:35 You know, it's always been possible to publish images and static plots to the web.

37:39 It's sort of just traditional standard HTML serving.

37:42 You embed JavaScript, and there's a bit of interactivity.

37:45 And the next step, of course, is to have that bidirectional communication with the server, that they use WebSockets to do that.

37:51 And Bokeh makes it easy to take advantage of all of that.

37:55 Yeah, and it's already beautiful, so you don't have to be a designer.

37:58 Very nice.

37:58 Right.

37:59 You don't have to be a designer.

38:00 Right.

38:01 You don't have to learn JavaScript.

38:02 You know, we wrote the JavaScript, so you don't have to.

38:04 It's one of the, kind of make it accessible to the data scientist or the scientist who's, you know, the issue with the data scientist, they may learn Python because it's accessible.

38:14 Not necessarily become an expert Python programmer, but they do learn enough Python to do their workflow and to accomplish their goals.

38:21 And they don't want to learn a ton of languages in order to get all the way there.

38:25 And quite often, getting farther along means building a publication or building a graph, building a visualization.

38:30 And so, you know, our mission is to support that group of people and make it really possible for them to translate their ideas and their thinking to real-world interactive visualizations they can communicate to somebody else about and have that just happen seamlessly and transparently.

38:46 And we do that, you know, part of it's open source libraries that form a core, part of it's, you know, integration pieces that we maybe sell at a high level.

38:54 And, you know, some of it's just doing services for somebody to build that solution for them and ship it on premises.

39:02 Yeah, really nice.

39:02 Did you guys create Bokeh?

39:04 Yes.

39:05 Yes.

39:06 Bokeh is a project we started.

39:07 So did you look around at the other things that were out there and say, you know what, all of these are nice, but they just don't quite fit the story of getting on the web?

39:16 We did.

39:17 Yeah.

39:18 Exactly.

39:19 We spent a lot of time.

39:20 We looked at D3 pretty hard.

39:21 Like some people have done a really good job of putting Python interfaces to D3, you know, and certainly Bokeh, you can also integrate D3 and Bokeh together.

39:30 They're not sort of, it's in the web, so you can build kind of mashups.

39:34 But, you know, we wanted to use the canvas.

39:37 We wanted to make sure we could scale out to millions of points, and we actually can get to billions of points. Not in the browser per se, but you can deal with billions of points and then push the ones you need to the browser.

39:49 We have a technology called data shading that's a really exciting technology associated with the Bokeh project that's up and coming, kind of one of the only ways to visualize billions of points easily.

40:00 And we wanted control over that.

40:03 So we knew kind of where we were headed.

40:04 We knew we had to do that.

40:05 And, you know, it's been really good.

40:08 It's hard because you have to build a JavaScript library and build the Python interface on top and then promote it, answer all the questions.

40:16 But it's given us the control.

40:18 One aspect about other JavaScript libraries is they weren't really built with other languages in mind.

40:24 There's a lot of libraries that say, basically, yeah, you should be a JavaScript developer, and then you can use it.

40:29 Actually, Bokeh is a JavaScript library that's easier to use from Python and R than it is from JavaScript.

40:34 Yeah, that makes a lot of sense.

40:37 I mean, a lot of those great libraries, they're like, okay, well, you can do this in the client side, and then you connect to your own custom services that you dream up.

40:47 Yes.

40:47 And you write the server side, and you do all this stuff.

40:49 And, again, if you're a web developer, fine.

40:52 If you're a data scientist and you're trying to solve a problem, it's easy to forget as developers that not everybody in the world is a developer.

41:01 They have their own special skill, right?

41:03 And so let them focus on their skill.

41:06 That's cool.

41:07 Right.

41:08 And the cool thing about Python, honestly, is it bridges that world, those worlds.

41:13 And it's because of this.

41:14 I think it's because it was a teaching language to start with.

41:16 And so it's meant to fit in your brain, kind of, and not take too much space there.

41:19 And so leverage your English language centers and other things you do.

41:23 And the white space, the visualization, the white space consistency is part of that.

41:28 And so because of that, lots of capability, lots of scientists, lots of data scientists come to Python and can work there together.

41:36 And they start to want things like a developer would want, but they want it a little differently.

41:41 They don't want to become developers.

41:43 They just want the capability.

41:44 That's right.

41:45 Kind of presented to them.

41:47 That makes perfect sense.

41:48 Yeah, it makes perfect sense.

41:49 So what does a typical Anaconda user do?

41:53 Like, who is your typical user?

41:56 Like, what are some notable things people have created?

41:59 Things like that.

42:00 Oh, yeah.

42:01 I mean, that's a great question.

42:02 They span the spectrum.

42:03 It's what I love about the work we do and what I've been doing the past 15 years, honestly, is just the breadth of smart people that we interact with.

42:12 You know, so it might be a geophysicist trying to understand where to find oil.

42:16 It might be someone managing a reservoir.

42:18 It might be somebody on Wall Street.

42:21 Maybe they're managing a portfolio for their clients.

42:25 Or maybe they're trying to figure out a new derivative, a way to manage risk and trade risk.

42:28 Or they're trying to figure out how to avoid a 2008 crash.

42:32 You know, figure out what their risk of exposure is to people.

42:36 It could be, I mean, we actually have this really great user story around finding rare diseases.

42:44 They have a way to use microscopy to kind of look at single-gene changes that cause certain rare diseases.

42:54 So they're diseases that are not well funded.

42:56 Nobody looks at them; they're hard and expensive to do anything with.

42:59 But they have a technique where they can study it and understand it because it can be reproduced with a single gene knockout or a single gene change.

43:07 And so they're using Anaconda and Bokeh and visualization, and Jupyter notebooks, to look at workflows where they image a bunch of cells with different knockouts and try to find drug treatments.

43:21 There's pharmaceutical companies that have lots of drugs that they're looking for use cases, right?

43:26 And they have a platform for essentially testing these use cases against these single knockout models of genetic diseases.

43:33 So, you know, people who are helping essentially find cures for rare diseases, basically using these tools, which is awesome to see.

43:42 Yeah, that's really great to see people actually making lives better with software that you create or at least help support, right?

43:50 Exactly.

43:51 You know, that really makes my day when I, I mean, it really helps motivate.

43:54 It helps motivate me.

43:55 It helps motivate the company.

43:56 It's what we try to do is really make, like our mantra is, you know, we make the world better by helping people that change the world do their job.

44:03 Yeah, yeah.

44:04 Very cool.

44:05 What is the most surprising sort of use you've seen people try to do something with?

44:12 That's interesting.

44:15 I think, I think some of the embedding, you know, people embedding it in really small footprint places.

44:21 You know, I think that's what, you know, like I didn't expect people to like try to get Anaconda running on a tiny phone or tablet or, you know, if it was getting bigger these days.

44:30 But like, I'm less surprised now, but I think early on I was kind of surprised.

44:35 Yeah, what people do with like mobile stuff these days is, you know.

44:38 Yeah.

44:38 Like the Raspberry Pi.

44:39 Like the Raspberry Pi was an example.

44:42 But, you know, we had, you know, one cute one: Ilan, for April Fool's a few years ago, did a Python 1.0 conda package.

44:54 Just to kind of install Python 1.0.

44:56 How nice.

44:57 Funny.

44:58 Just a proof of concept.

44:59 Of course.

45:00 You can manage all kinds of prior versions of Python.

45:03 That's a hard one.

45:05 I'll have to think about that.

45:06 There's a lot of great use cases, but.

45:08 Yeah, it's hard to pick the most surprising, right?

45:10 They're all surprising, I'm sure.

45:13 We talked a little bit about running a business on open source.

45:18 And I went and I got Anaconda and I downloaded it.

45:22 And you didn't even make me give you my email address, which thank you.

45:25 Thank you for that.

45:26 But you have Anaconda free.

45:29 And then you have some other things that are kind of products.

45:32 So you've got like a pro, a work group, and an enterprise version.

45:36 And you also talked about the Conda cloud.

45:38 Is that right?

45:39 Yeah, Anaconda cloud.

45:40 Yeah.

45:41 Okay.

45:41 What are all these?

45:42 Yeah.

45:44 So maybe I can address the first question kind of generically.

45:47 We do have to sell things to be a business, right?

45:50 Like businesses, at the end of the day, they sell stuff, right?

45:54 And what we sell is we sell services, we sell training, we sell software.

45:58 And what we described is our software offerings.

46:01 And we have subscriptions to Anaconda.

46:04 And their targets are really the enterprise usage.

46:07 You know, if somebody, we want hobbyists and academics and even people in enterprises to

46:12 use this.

46:13 But if you start to become dependent on it, you'll want to think about looking at subscriptions

46:16 so that we can support you in your use cases and make sure that your new versions don't go

46:22 awry and you can be well supported.

46:23 So the first tranche is basically just supported Anaconda.

46:27 And that gives you the indemnification and then the priority support so you can call us and

46:31 kind of get what you need fixed on your timescale instead of ours or your sort of open source

46:37 timescales.

46:37 So that's kind of the first thing.

46:39 Kind of the...

46:41 I personally believe that that's not enough.

46:44 I think people, from a business perspective, I think just the way we are as people, we kind

46:50 of...

46:51 We need kind of additional stuff in order to get us to open the checkbook and send out money.

46:57 We need to kind of get more stuff.

46:59 So we've added additional things into the workgroup and enterprise subscriptions.

47:03 And those are in the direction of repository management.

47:07 One of the things workgroup provides you is the ability to manage exactly what Anaconda users

47:13 internally are getting.

47:14 You know, if you download and install Anaconda, you can point to our repositories and get

47:19 kind of this open source set of repositories.

47:21 But a lot of companies want more control over that.

47:22 And so Anaconda workgroup gives you a chance to have your own private mirrored repository.

47:28 You can control what goes there.

47:29 And you can also, you know, you can build packages and upload them to there and manage kind of the

47:34 deployment of Python and applications built around Python throughout your organization.

47:38 So it's kind of a repository server.

47:40 Yeah, there's certain places where that matters a lot.

47:43 Right.

47:44 Right.

47:44 Either it matters a lot or it doesn't matter at all, I think.

47:48 Correct.

47:49 That's exactly right.

47:49 Exactly right.

47:50 You know, like I did some work.

47:51 Exactly right.

47:52 I did some work with the guys at NASA.

47:54 And I was showing them all sorts of stuff like, okay, here's what you do.

47:58 And you've got to install this and this and this.

48:00 And you just, you know, they're like, whoa, wait a minute.

48:02 We can't just, you know, download that stuff and install it here.

48:06 Like there's rules and there's restrictions.

48:07 There's rules.

48:08 It's got to be approved.

48:09 And, you know, so.

48:10 Right.

48:10 Exactly.

48:10 To address that problem, right?

48:12 Yeah, exactly.

48:13 So Anaconda Workgroup becomes a single point of, you know, we know it's approved here and then we can control it.

48:18 And it kind of helps the IT organization understand it.

48:20 And, you know, it's growing things.

48:22 So it manages work.

48:24 It manages environments and packages and notebooks.

48:26 You can actually see what's deployed for somebody with Anaconda Workgroup by going to Anaconda Cloud.

48:31 So Anaconda Cloud is, it's at anaconda.org.

48:34 It's a URL.

48:34 The Anaconda Cloud story is kind of what gets installed behind your firewall if you get Anaconda Workgroup.

48:41 Anaconda Cloud really just gives everybody the ability to say, I want to publish my conda packages and have other people see them.

48:50 And it gives you an easy way to do that.

48:51 So you can say, hey, here's my, here's my environment.

48:54 It's got these package dependencies.

48:55 And I want to be able to point somebody to it so they can get it exactly.

48:59 Not just they have to rebuild it and then hopefully get the same environment I had.

49:02 But here's the binary packages that actually work for me.

49:06 You can put those in a, either individually and also we have an environment specification, an environment concept.

49:14 And you can publish that and somebody can just point there to that, to Anaconda Cloud and then they can, they can get exactly the reproducible result they're looking for.
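
As a rough illustration of the environment specification mentioned above, here is a sketch that renders a set of pinned packages in the `environment.yml` layout that conda reads (`name`, `channels`, `dependencies` with `package=version` pins). The helper name `render_environment`, the environment name, and the version numbers are placeholders for illustration, not a recommended set.

```python
def render_environment(name, pins, channels=("defaults",)):
    """Render pinned packages as text in conda's environment.yml layout."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Exact pins are what make the environment reproducible for someone else.
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(pins.items())]
    return "\n".join(lines) + "\n"


# Placeholder pins; substitute whatever actually works on your machine.
spec = render_environment(
    "analysis",
    {"python": "3.5.0", "numpy": "1.10.1", "bokeh": "0.10.0"},
)
print(spec)
```

In practice conda can write such a file for an existing environment itself, so a hand-rolled renderer like this is only useful for seeing what the file contains.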

49:23 It can be a big problem in science.

49:25 When I was talking to the guys at the LHC, like it was a really big deal.

49:30 Yes, exactly.

49:31 The reproducibility, making sure you have exactly the same version of everything.

49:35 And if you're working in science, reproducibility is kind of key.

49:38 It is key.

49:40 Exactly.

49:40 Science, but also businesses have the same problem.

49:43 They want to, a lot of time is spent just rehab.

49:46 Like, oh, I had this bug.

49:47 Well, okay.

49:47 And what's your version of the software and what's your data environment?

49:50 Like just getting that reproduced to where you can actually figure out what's going on.
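
The "what's your version?" back-and-forth described here amounts to comparing two sets of pinned packages. A small sketch, assuming each environment is available as a plain dictionary of package name to version string; the function name `diff_environments` and the sample versions are hypothetical.

```python
def diff_environments(mine, theirs):
    """Map package name -> (my version, their version) for every mismatch."""
    names = sorted(set(mine) | set(theirs))
    return {
        name: (mine.get(name), theirs.get(name))
        for name in names
        if mine.get(name) != theirs.get(name)
    }


# Placeholder environments: the bug reporter has an older numpy and no bokeh.
mine = {"python": "3.5.0", "numpy": "1.10.1", "bokeh": "0.10.0"}
theirs = {"python": "3.5.0", "numpy": "1.9.3"}

mismatches = diff_environments(mine, theirs)
for name, (my_version, their_version) in mismatches.items():
    print(f"{name}: mine={my_version} theirs={their_version}")
```

A missing package shows up as `None` on one side, which is often the whole explanation for a bug that only one person can reproduce.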

49:54 People spend all kinds of money and time doing that.

49:57 And this, basically, we have a part, you know, our technology helps in that solution to really streamline that story.

50:04 I would say it's not as well appreciated, I think, as other, we have not had the resources to market quite as big as other people have.

50:11 And we're trying to change that.

50:12 But the ability to take people who use it and really find out, oh, conda environments, this is really nice.

50:19 I can do this very quickly.

50:21 Because, you know, the rest of the IT world is pushing other stories.

50:23 And it's not those stories aren't good.

50:25 They're just sort of overkill sometimes.

50:27 And oftentimes a simple conda environment will give you that reproducible environment very easily, very quickly.

50:32 Right.

50:33 Like one of the alternatives might be, hey, go use Docker and get this series of images.

50:37 Correct.

50:38 I think people, and that's great.

50:40 Docker has, it's great.

50:42 I think it's overkill for some of these cases.

50:43 And this is sort of lightweight.

50:46 Again, not everybody's a DevOps engineer.

50:49 Correct.

50:50 Correct.

50:51 Also.

50:51 Exactly.

50:53 So we're working closely with the Jupyter community and others to try to, you know, I think there's a really, it's getting to the point where people can quickly do this.

51:01 And, you know, we're constantly making improvements to the UI and how easy it is to do and make sure it's, you know, make sure that we don't require, you know, turning into another DevOps person.

51:10 I mean, there's some of that that's probably a little harder than we would like today.

51:13 But some of it's really easy and people are able to do it successfully.

51:16 So that's kind of Anaconda Cloud.

51:18 And the work group provides that capability on premise.

51:21 So you don't, you know, you can own it, control it on your servers, and you can decide how your people use it.

51:26 So for testing and reproducibility, there's also a component that will, we manage a build.

51:30 There's kind of a build queue so that you can actually submit jobs and get kind of continuous integration of your packages and have things up and running all the time.

51:37 So that's an aspect of the product too.

51:40 So things like that that we're selling on that side.

51:45 And we also, for enterprise, that next level is where we offer the kind of enhanced Jupyter experience.

51:52 You know, really about integration with your single sign-on capability, LDAP, PKI, Kerberos.

52:00 People have enterprise single sign-on they've got to integrate with.

52:03 And we've taken the Jupyter and Jupyter Hub capabilities and we enhanced those to integrate with people's enterprise stories.

52:11 Right.

52:12 So like if they're logged in on their Windows machine on their Active Directory and...

52:17 Exactly.

52:18 Active Directory interface.

52:19 You can have like a private Jupyter notebook running on the web that only they can get to.

52:24 I see.

52:24 That's exactly right.

52:25 Exactly right.

52:26 So that's the collaborative notebook capability.

52:28 And we've added a couple of little things.

52:30 We're working right now.

52:32 We've got a great project going on in conjunction with Bloomberg and the Jupyter team to actually build kind of the next generation of the Jupyter notebook.

52:40 There's a Phosphor project and we're working closely with kind of how do we improve that.

52:46 A lot of stuff.

52:47 Again, we love to contribute to the open source.

52:49 We love to make the foundations even better.

52:52 And then typically we sell things that really help the enterprise and tie into their deployment conversation.

52:59 It seems to me like running a successful open source business is about having like different channels for different ways, different things you can offer to different people.

53:10 There's a guy that...

53:13 Very insightful.

53:13 Yes, indeed.

53:14 Yeah, thanks.

53:15 A guy that blogs and podcasts a lot about sort of tech business stuff.

53:21 And he has a really interesting saying that it's harder to go from zero to one cent than one cent to ten dollars.

53:29 Do you feel like it's kind of that...

53:30 Yeah.

53:31 Like get somebody to open up their pocketbook at all is like really a big step and then you kind of have this relationship?

53:37 Yes, agreed.

53:39 Agreed.

53:40 It's a big, big jump to go from not paying you anything to paying you something.

53:46 Yeah, even if there's something that's super small, right?

53:48 Exactly.

53:49 And, you know, when we first started the company, that's actually why we had these add-ons.

53:53 And we've just switched our promotional model, like how we're shipping, what we're selling on the product side.

53:59 So we're in the middle of that transition still a little bit with some customers where we used to sell these add-ons.

54:04 And part of that, the reason we did that was precisely to just have a conversation about selling something with those who would engage in that direction.

54:14 Because I knew we would be promoting open source.

54:16 I knew we would be driving a lot of stuff for free.

54:18 And I was excited about that.

54:20 But I knew we also needed to have the I'm going to sell you something conversation so we could segment the market appropriately.

54:25 Yeah.

54:26 And then kind of set expectations that this is what we look to do is.

54:29 We do both.

54:30 And so people can understand if they only want to engage with us and our free stuff, they can do that and understand that we're not – it's not a bait and switch.

54:38 It's not a – we're certainly trying to encourage adoption, and hopefully some of those might buy our stuff later.

54:44 But if you've decided just use the free stuff, yeah, you use the free stuff.

54:48 Then let's collaborate on open source.

54:50 Let's collaborate in the community.

54:51 Let's collaborate around how we move this thing forward together.

54:54 Yeah, absolutely.

54:55 That's great.

54:56 So, Travis, we're getting kind of short on our time here.

55:00 Do you have any calls to action for the listeners out there?

55:03 Yeah, you bet.

55:04 Absolutely.

55:04 I would say definitely go and download Anaconda.

55:07 If you haven't tried it, I think if you're a new user, absolutely download Anaconda.

55:12 But if you're an old Python user, you might find that we've actually solved some things that you're having trouble with.

55:16 And you might try it.

55:18 It doesn't interfere with your Python installation.

55:19 It's a new – you can install it in your user account and kind of use it separately.

55:23 A lot of people used to tell me they were afraid to install anything new because it would mess up their work.

55:30 Anaconda doesn't mess anything up.

55:31 It's just completely separate.

55:34 Yep.

55:34 And you've got a Python 2 and a Python 3 version, right?

55:37 We have Python 2, Python 3 version.

55:39 And then, you know, see if you want to sign up for an Anaconda Cloud account.

55:42 It's a free account, and you can post packages there.

55:44 And then if you're an academic, you can get access to our proprietary libraries through that Anaconda Cloud account as well.

55:53 And, you know, set up an Anaconda community somewhere, you know, locally and attend the PyData conference.

55:57 There's a PyData conference in New York coming.

55:59 Attend one.

56:01 And if there's not one coming near you, and there are 10 next year, then set up an Anaconda community and participate.

56:10 Yeah, that's excellent.

56:11 Are you guys going to be at PyCon?

56:14 Yes.

56:15 Yes.

56:16 We go to PyCon.

56:17 We'll be there again.

56:18 I'm quite sure.

56:20 In Portland this year.

56:21 Right?

56:22 Yeah.

56:22 Excellent.

56:22 That's my hometown.

56:23 I already got my ticket.

56:24 So it's going to be great.

56:25 Hey.

56:25 All right.

56:25 We'll see you there again.

56:26 Yeah.

56:26 I hope so.

56:27 Cool.

56:28 And then two other questions I always ask my guests before I let you go.

56:31 If you're going to go write some Python code, what editor do you open up?

56:35 So it's either VI or Sublime Text these days.

56:42 Okay.

56:42 Yeah, they're both good.

56:43 Good, good.

56:44 And of all the PyPI packages out there in the world, 60 plus thousand of them, do you have some favorites or maybe things that people don't know about that you're like, oh, I wish you knew about this.

56:56 I should tell everyone.

56:57 Oh, so many.

57:00 I will tell you about Distributed.

57:03 Distributed is one that we are working on.

57:07 Matt Rocklin, who many may know about because he's really cool and does a lot of cool stuff.

57:13 Distributed is a way to do parallel computing in a modern Pythonic approach.

57:22 And it's going to be the way Dask runs on multiple machines.

57:29 Okay.

57:30 So it's a separate library and it'll be the foundation for how Dask.

57:33 There's two, Distributed and Dask.

57:35 I think those are worth your attention.

57:37 And a lot of people already know about them.

57:39 But those who don't, they're worth looking at because they're going to basically give Python a Spark equivalent.

57:46 Okay.

57:46 That's awesome.

57:47 Without Spark.

57:49 At least for the medium scale, you know, tens of nodes, tens and twenties of nodes.
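
Editor's sketch: the "modern Pythonic approach" to parallel computing described above can be illustrated with a submit-and-collect pattern. This is a minimal sketch that uses the standard library's concurrent.futures as a stand-in. It is not Distributed or Dask code, it runs on one machine only, and the function names square and parallel_sum_of_squares are hypothetical. Distributed's documentation describes a similar futures-style interface, but nothing below should be read as its actual API.

```python
# Stand-in sketch: standard library concurrent.futures, NOT Distributed or Dask.
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Hypothetical unit of work; any independent function would do."""
    return x * x


def parallel_sum_of_squares(values, max_workers=4):
    """Submit one task per value, then combine the results as they finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() returns a future, a handle to a result not yet computed.
        futures = [pool.submit(square, v) for v in values]
        # result() blocks until that particular task is done.
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(parallel_sum_of_squares(range(10)))
```

The point of the pattern is that the caller describes independent tasks and lets a scheduler decide where they run; a multi-machine system would replace the local thread pool with workers on other nodes.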

57:55 All right, Travis, it's been great.

57:56 Thanks for being on the show.

57:57 I really appreciate it.

57:58 I really appreciate it, Michael.

57:59 Thank you so much.

58:00 And good luck.

58:01 And good luck with future shows.

58:02 Yeah.

58:02 Thanks a lot.

58:03 Bye-bye.

58:04 This has been another episode of Talk Python to Me.

58:07 Today's guest was Travis Oliphant.

58:09 And this episode has been sponsored by Hired and DigitalOcean.

58:12 Thank you guys for supporting the show.

58:14 Hired wants to help you find your next big thing.

58:17 Visit Hired.com slash Talk Python To Me to get five or more offers with salary and equity presented right up front and a special listener signing bonus of $4,000.

58:25 DigitalOcean is amazing hosting blended with simplicity and crazy affordability.

58:32 Create an account and within 60 seconds, you can have a Linux server with a 20 gig SSD at your command.

58:38 Seriously, I do this all the time.

58:40 And don't forget the discount code.

58:42 It's Talk Python.

58:43 All caps.

58:43 No spaces.

58:44 Did you know you can personally support the show as well?

58:48 Just visit Patreon.com slash mkennedy and join over 100 listeners who contribute between $1 and $2 per episode.

58:54 It makes a big difference and I really appreciate it.

58:57 You can find the links from today's show at talkpython.fm/episodes/show/34.

59:04 And be sure to subscribe to the show.

59:06 Open your favorite podcatcher and search for Python.

59:08 We should be right at the top.

59:10 You can find the iTunes and direct RSS feeds in the footer of the website.

59:14 Our theme music is Developers, Developers, Developers by Cory Smith, who goes by Smixx.

59:19 And you can hear the entire song on talkpython.fm.

59:21 This is your host, Michael Kennedy.

59:24 Thank you so much for listening today.

59:26 Smixx, take us out of here.

59:29 Staying with my voice.

59:30 There's no norm that I can feel within.

59:32 Haven't been sleeping.

59:33 I've been using lots of rest.

59:35 I'll pass the mic back to who rocked it best.

59:38 I'll pass the mic back to who rocked it best.

59:50 Thank you.
