#198: Catching up with the Anaconda distribution Transcript

Recorded on Wednesday, Jan 16, 2019.

00:00 It's time to catch up with the Anaconda crew and see what's new in the Anaconda distribution.

00:04 This edition of Python was created to solve some of the stickier problems around deployment,

00:08 especially in the data science space. Their usage gives them deep insight into how Python is being

00:13 used in the enterprise space as well. And that turns out to be a very interesting part of the

00:17 conversation. Join me and Peter Wang, CTO at Anaconda Inc., on this episode of Talk Python

00:22 to Me, number 198, recorded January 16th, 2019. Welcome to Talk Python to Me, a weekly podcast

00:42 on Python, the language, the libraries, the ecosystem, and the personalities. This is your

00:47 host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show

00:51 and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

00:56 This episode is sponsored by Linode and Rollbar. Please check out what they're offering during

01:02 their segments. It really helps support the show. Peter, welcome to Talk Python.

01:06 Thank you very much. I'm very happy to be here.

01:07 I'm happy to have you here. It's been a while since we've talked about Anaconda. I had Travis

01:12 Oliphant on the show way back when, but it seems like it's time for a catch up on what you all

01:17 have been up to. Yeah, well, there's been a lot going on. Definitely. One of the employees

01:21 has commented that every six months, it feels like a different company. And we do,

01:25 yeah, the space is evolving very quickly. We're trying to just keep up with it.

01:28 So you would say this data science thing is not a fad. It's probably going to be around

01:31 for a while?

01:31 At this point, I think I'm going to go out on a limb and say it's probably going to be around

01:34 for a little while.

01:35 Right on. All right, before we get into all that though, let's start with your story.

01:38 How did you get into programming in Python?

01:40 I actually got into programming when I was a young kid and I've been always programming.

01:44 I've actually been programming for almost as long as I've been speaking English.

01:46 I got a PC when I first came here to the United States, so I was very lucky.

01:50 But I actually majored in physics and out of college, I started going into computer programming

01:56 as a profession. And I did a bunch of C++, but I discovered this thing called Python on Slashdot.

02:03 And I think they announced version 1.5.2. And I was like, fine, I'll go take a look at it.

02:08 And I started playing with it and I just fell in love. And so my day job was like getting beat

02:14 up by C++ templates and out-of-compliance compilers. And at night, I'd just hack on Python.

02:19 So finally, after a few years of this, I ended up moving to Austin. I got a job doing Python

02:23 as my day job, which was awesome. In like 2004, I started at Enthought. And I did a lot of work

02:29 in the scientific community and doing consulting with Python because I knew the science given my

02:33 math and science background in physics. But I also knew the software principles and software

02:37 engineering. So it was a really fantastic time. And that's basically the long and short of it.

02:40 Yeah, that sounds like a great fit. You know, things just came together, right? You have this math and science

02:44 background and you love Python. You found this job and it all just, like all of those things came together to really put you in the right place.

02:51 They really did. I feel very, very blessed in that way. Now, it was a lot of hard work too,

02:55 but I got very comfortable. And, you know, there's this great quote from Bruce Lee that you must never,

03:00 like not, you must never get comfortable, but there will be plateaus and you can't stay there.

03:04 And so I think towards the end of the 20, the aughts, the 2000s, around 2010, I was starting to see big

03:11 data happening. And I started realizing that Python was getting used for business data analysis more

03:16 than just science and engineering. And that our little cozy SciPy community could actually be

03:20 something much bigger. And so I started doing some exploration, exploratory work. I really wanted to

03:26 do like D3 for Python. I had a few other little itches I wanted to scratch.

03:30 And so I started Continuum with Travis in order to address some of the technical gaps that we had in

03:36 the community and the technology stack. And then also to really push a narrative in the technology

03:41 market that yes, Python is good for business use. Yes, it's production ready. Yes, you should use it.

03:47 And it can handle big data just fine. And so we really started pushing that narrative in 2012,

03:53 you know, created NumFOCUS, created PyData, did all these things. And I think that the results have

03:57 spoken for themselves. I definitely think that they have. That's great. In 2012, I do think

04:03 there was a little bit more of a debate of, well, is it safe to use Python for our business critical

04:07 stuff? But I feel like that battle has been really solidly won, especially on the data science front,

04:15 right? There were debates about R, maybe R was the space to be. That's not really where it's at anymore,

04:20 is it?

04:20 No, there was definitely a period of language war sort of stuff going on early on. It's odd, like, you know,

04:25 even then, the discussion about is data science a fad? Is it a fad term? Isn't it just business

04:31 intelligence? Or is this just that big data hype cycle all over again? You know, there's a lot of

04:36 doubters and haters on that term. But as I've talked to more users and managers and stuff,

04:42 at businesses, it's clear that they're thinking about data analysis and data analytics in a very

04:47 different way than they have for like decades. And data science is definitely, definitely here to stay

04:51 because of that.

04:52 Absolutely, absolutely. So maybe give people a sense of what you do day to day so they know where you're

04:57 coming from.

04:58 Well, my day to day consists of... my formal role is CTO. I run the community innovation and open source

05:05 group here at Anaconda. I actually don't run the product engineering teams. And I work with

05:10 everyone. But my general role is working with the community, helping the various community oriented

05:15 and open source devs that we have champion their projects and work better with the broader community.

05:20 I also do a lot of industry facing technical marketing and evangelism. So a lot of customers

05:25 will have me go and speak at internal data science events they do, things like that. There's actually

05:29 remarkably few people in the Python world that really speak to industry on behalf of Python itself,

05:34 relative to the usage of it. I mean, you'll find no shortage of industry analysts talking about how

05:39 great Java is, or how great these like big data projects are, you know, all these like PR type

05:44 things. There's no one doing that for Python. And so that is actually some of my day job. And beyond

05:49 that, it's just trying to keep up with all the things that are happening in data science, machine

05:53 learning, data engineering, data visualization, AI, all of it.

05:58 On top of the advocacy role, it's a pretty much full time learning thing, right? Because there's so

06:04 much change, right?

06:05 There's so much in every area. I mean, there's all the cloud stuff too. There's edge learning,

06:09 there's data privacy, you name it. Every single area that touches data science is undergoing massive

06:13 change right now.

06:14 That's super exciting, but it's also a bit of a challenge. And I think the Anaconda distribution

06:18 does help some with that. Before we get into the distribution story, though, let's just talk about

06:24 Anaconda Inc. So when I had Travis on the show a couple years ago, it was Continuum that was the

06:32 company and Anaconda was the distribution. But now those are not different anymore, right? It's just

06:37 Anaconda, the company and the distribution.

06:40 We renamed ourselves really out of pragmatism, because we would go to places and we'd introduce

06:47 ourselves as Continuum Analytics. And they're like, oh, yes, you guys, like you got some Python stuff.

06:51 We see that here. Like, who are you guys? And then we say, oh, well, we make Anaconda. And they're like, oh,

06:56 I love Anaconda. I use Anaconda all the time and blah, blah, blah. And so we sort of like, after that started

07:01 happening to us all the time, we sort of figured like, well, maybe we should just call ourselves Anaconda.

07:06 And, you know, one of the things that held that up was for a long time, as we were growing the company and

07:12 growing the distribution, we were afraid that changing the company name would actually spook the community.

07:18 And it's a really, it's been one of these interesting things. Like I have, I have lots to say

07:22 about open source. Let's just put it that way. But it's very hard to play the game of open source,

07:27 honestly, and not still get beat up with FUD about it. And so even though we've open sourced our build

07:32 tools, we've open sourced the recipes, we open source everything from the very beginning,

07:35 there are still people in the community who distrust us because we're a company trying to make a

07:40 sustainable, build sustainable funding for this open source effort. So it's a really,

07:45 that was one of the reasons we actually were reticent to do that name change until finally

07:49 just became a no brainer that we basically had to.

07:51 Yeah. If people keep mistaking you for Anaconda Inc, maybe just say, fine, that is our name.

07:56 Yeah. And we'll just deal with the haters, you know, on a one-off basis, I guess. I don't know.

08:00 Yeah, exactly. I mean, it's not unprecedented, right? 37 Signals, who made Basecamp and,

08:06 you know, sort of founded Ruby on Rails, they eventually renamed themselves just to Basecamp.

08:11 They're like, yep, the one major project, fine, we're just called that, right? I guess it's like

08:15 Microsoft renaming themselves Windows, which they're probably very happy they didn't. But,

08:19 you know, in a lot of senses, that makes sense. That's cool. Okay, so there's a broad spectrum

08:25 of folks who listen to the show. Many of them will have experience with data science. Many of them

08:31 will know what the Anaconda distribution is. But maybe just, you know, for the folks who are new or

08:36 have been working somewhere else, tell them, what is this distribution? How is it different

08:40 than the standard CPython? And why did you guys make it?

08:43 I'll try to sum this up for a technical, but not data science necessarily audience, right? The basic

08:49 gist of it is that Anaconda arose out of a failure in the Python ecosystem to address the packaging needs

08:56 for the numerical and computationally like heavyweight packages that are in Python. And so for the same

09:03 reason that Linux distributions exist, very few people build Linux from scratch. For actually

09:08 exactly the same technical reasons, we built the Anaconda distribution, because it's actually really,

09:13 really hard to correctly build all of the underlying components that you need for doing productive data

09:18 science and machine learning. And so the reason it's a distribution is because all of the libraries you

09:25 build and the packages, the modules with extension modules that you load up, they need to be compiled

09:30 together, they need to be compiled in a compatible way. And so you need to agree on compiler definitions,

09:35 you need to agree on code generation targets, optimization levels, things like that. And if you

09:41 only ever use pure Python packages, so packages whose code only consists of .py files, then you basically

09:49 never run into a problem. It's only when you start having extension libraries, things that depend on maybe

09:54 system libraries, God forbid you try to cross platforms between Linux and Mac and Windows across

09:59 architectures between ARM and x86, you're completely hosed. And so we, in service to the scientific Python

10:06 community, we built this distribution that was a set of packages and a way of building packages that are

10:11 compatible with each other. So that's what the Anaconda distribution is. It's a bulk distribution with about a

10:16 couple hundred pre made libraries. And we have a package updater in it called Conda that lets you

10:23 then install thousands more that are built by us and built by a large open community that also uses the

10:29 same standards. So that's what Conda and Anaconda are in a nutshell. And it's really one of these like

10:35 packaging war kind of things or packaging, the confusion of Python packaging. We actually tried to approach

10:42 Guido back in the day to help define some standards around this. And he basically gave us a very helpful

10:49 guidance, which is maybe your packaging needs are so exotic, you need to build your own system. So we took

10:53 him at his word and we did it. And consequently, when people use Conda, in a lot of cases, things just work.

10:59 There's still like corner cases and a lot of like little rough spots, especially in terms of pip interop.

11:04 But we're very proud of the work we've done so far. And it's used in production every day by big,

11:08 big companies that rely on Python for their production workloads. So that's basically Anaconda

11:13 and Conda in a nutshell.
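
To make the pure-Python versus extension-module distinction above concrete, here is a minimal sketch using only the standard library. The helper name `is_extension_module` is made up for this illustration; it checks whether a module is loaded from a compiled binary, which is the kind of artifact that has to match your platform and compiler settings.

```python
import importlib.machinery
import importlib.util


def is_extension_module(module_name: str) -> bool:
    """Return True if the named top-level module is loaded from a compiled binary file."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        # Module not found, or it has no file on disk to inspect.
        return False
    # EXTENSION_SUFFIXES lists the binary file suffixes this interpreter accepts
    # for extension modules; the exact values differ per platform and build.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return spec.origin.endswith(suffixes)


# The standard library's json package is implemented as .py files.
print(is_extension_module("json"))
print(importlib.machinery.EXTENSION_SUFFIXES)
```

The second line of output varies by operating system and Python build, which is a small reminder of why compiled packages cannot simply be copied between machines.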

11:14 Okay, well, that's a really good summary. Yeah, when I think of it, the main value is that you get

11:20 pre compiled binary versions of the packages that would otherwise have to be compiled from source when you

11:27 pip install them, right?

11:29 Yes.

11:29 And the other part is the cross package compatibility, because somebody makes one package, and they have an

11:38 interest in making them as best they can or whatever, but they don't really care about integrating and testing

11:43 against all the other open source projects that you may pull into your project that they don't even care or know about,

11:49 right? So this sort of bigger picture compatibility that you look at is pretty cool as well.

11:55 It's actually become quite critical. And I think this is one of the areas that the Python community,

11:58 in the confounding haze of packaging, and half built packaging solutions, that we've not really

12:05 been good at giving guidance to the user community about is that if all you ever need to do is build

12:10 one package for yourself, and you fully control the deployment environment, and the development

12:14 environment, then maybe you can go and do that, right? But if you actually have to work on a team

12:19 with other people, like for example, with web developers, a lot of times, they control the

12:23 server, they choose the packages they bring, and they write the code, and they can just push it out

12:28 to their server. And they're good, right?

12:30 Yeah, and they're good to go. And you can do any number of things that you want to. You know,

12:33 what I would liken it to is, if you build your own wheel, if you build your

12:38 own native extensions, it's like getting plastic powder or plastic pellets, and making your own

12:44 mold of Legos or Lego-like things and pouring your own little pieces. And so as long as you're the one

12:49 that controls what they have to plug into, and you're the one that controls all the molds, then you

12:53 don't need any standard definitions of studs or holes or lengths or anything like that, you're good to go.

12:58 But if you ever want to work with other people who have their own molds and their own places and

13:03 studs, they want to put these things on, you've got to come up with a standard definition. And so what

13:08 Anaconda is essentially, it's like a Lego system, we've standardized what the studs are and what the

13:13 holes are. So lots of people can build different kinds of Legos, and they all can plug together.

13:16 And that's kind of the long and the short of it.

13:18 Yeah, very interesting. So some other things that are in play there are you talked about Conda and

13:25 installing the packages that you built, right, the couple hundred or whatever that come with the

13:29 distribution. But then you also said installing the others through this thing called Conda Forge.

13:35 What's Conda Forge?

13:35 Well, Conda Forge is a community of people who I would say out of a masochistic charity to the

13:42 community. They take on the job of maintaining build scripts and recipes that take upstream

13:48 software and make it so it's actually buildable in a reproducible way and that it works with other

13:54 things. So it's a community of package builders and they have several hundred contributors and

13:59 they've built thousands of packages. We ourselves build about a thousand, although only 200 are built

14:04 into the big Anaconda installer download. But the Conda Forge community goes even beyond that and

14:09 builds several thousand. And that's what Conda Forge is.

14:11 Yeah. Interesting. So people are like, you know, it's really painful to build this package,

14:16 but only one of us should ever suffer and feel that once. And we'll do that on behalf of the

14:21 community. I'll take that on for this one package.

14:23 Yeah, basically. I mean, you know, the real challenge is it's one of those things in life

14:27 where it's almost worse that it's easy to do a bad job. I don't know that we have a term for this

14:32 in English. Maybe there's a long German word for it. But it's like the same thing with the coding

14:36 principles of like, if something is broken, you want it to break loudly and fail loudly,

14:40 right? You don't want it to make a half effort where it kind of works sometimes. And

14:45 building packages is the same thing. Most people can kind of get a build working for most things,

14:51 but does it work well? Will they ever be able to do it again? Does it work with anything else?

14:57 None of those things, you know, it takes a lot of work to make a good package build. So,

15:01 well, that speaks to the reproducibility side of things. And I know in data science and

15:06 scientists using data science tools, that reproducibility is a super important aspect.

15:11 And I guess the first step is I can run the software, which means I can build the packages

15:16 and install them.

15:17 Right. And that is really what we think: providing pre-built binaries and then having

15:22 good provenance of the build system itself, those are really some of the only ways you can really

15:27 honestly, like not kidding yourself, have reproducibility. I think some people think

15:32 that Docker somehow saves them, but it really doesn't. So it's kind of a struggle right now,

15:38 honestly, because there's so many moving pieces. There's a lot of confusion in that space, but I do.

15:42 Yes, I do agree with you that Conda packages used properly can absolutely be a great way to ensure

15:47 reproducibility for data science.

15:49 Yeah. Well, it's probably better than saying, well, if you want to install this package,

15:54 you're going to need to have the Visual Studio 2008 compiler set up correctly on your machine

15:59 in 2025 or whatever, right? When it's no longer compatible with the Windows or who knows what,

16:05 right?

16:05 Yeah. We're going to have to, like, one of the reasons I think that our team,

16:08 the Conda and Anaconda team are happy to move away from Python 2 is because of the dependency on that

16:13 compiler. Someday when we finally put Python 2 to rest, I'm probably going to try to eBay a bunch of,

16:19 like, boxes of those CDs just so they can break them out of, you know, sort of like a cleansing

16:24 bonfire or something. I don't know. Maybe you shouldn't burn CDs. That's bad, actually.

16:28 Yeah, but you could have some sort of ceremony with them for sure.

16:32 Yeah.

16:33 I think the new Python 3.7, it uses MSBuild. Is that right?

16:38 You know, I'm not sure on the details of that, but I think that there have been significant

16:42 improvements. And, you know, the Python folks who work at Microsoft have worked really hard

16:49 to improve the compiler situation there for Python. I think it's much better now with Python 3 and in

16:53 the later releases of Windows. It's just we have, you know, very old Python, very old Windows that

16:59 still are deployed that we have to keep those users going. So that's where almost all the pain is.

17:04 I can imagine. Yeah, I just had Steve Dower from Microsoft on the show, and he's in charge of the

17:08 installer and stuff there. And he's doing some really, really cool stuff to make it more accessible

17:12 on Windows. And it's easy to go to conferences and forget how important Windows actually is,

17:19 right? You look around, it looks like everyone has a Mac. There's a few people running Linux.

17:23 That's pretty much what you see at the conferences, right? But that's not what the actual consumption

17:28 out in the world is, is it?

17:30 No, that's not at all reflective of even the United States. And then you go to the broader

17:35 world. It's a lot of Windows. It's a lot of Windows, a lot of Linux, too. But yeah, I think this is one of

17:42 the structural problems that faces the open source community is that when you're small, it's easy to do

17:47 product management, because it's like you and your buddies. But once you get bigger, you have to actually

17:52 intentionally go and try to pull in information from your users. And I think that's the Python, that's

17:57 actually, I think, a structural challenge for the Python community at this point in time.

18:00 When we're talking about Conda Forge and things like that, something I had not heard of before,

18:05 but I saw that you're running is something called BioConda. Now, it sounds like it might have to do with

18:11 biology and data science around biology, but that's all I can discern from it. Tell us about that.

18:16 That's new to me.

18:17 So BioConda is actually not one of our projects. And oh, I should have said this earlier with Conda Forge.

18:22 BioConda, Conda Forge, and various other sort of groups, they use our Anaconda Cloud package hosting

18:29 infrastructure to support their community. Because with the Conda package installer, it's easy to give

18:35 it a namespace flag, basically a channel name, and then it will go and download packages only from that

18:40 channel on Anaconda Cloud. So these represent, Conda Forge and BioConda represent different communities

18:46 that are using the Conda packaging tool, but they may have set slightly different standards or included

18:50 certain other standards in their build system protocols and standards. So all these packages

18:55 work together. So yes, BioConda is for the biology, genomics sort of community.
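
The channel idea described above can be sketched as a toy lookup. This is only an in-memory illustration, not how Anaconda Cloud or the Conda client is implemented; the channel names, package names, and the `find_package` helper are all hypothetical. On the command line, the corresponding choice is made with the `-c`/`--channel` option to `conda install`.

```python
# Hypothetical stand-in for a package host: each channel is its own namespace.
CHANNELS = {
    "defaults": {"numpy": "1.15.4"},
    "community-a": {"numpy": "1.15.4", "rare-tool": "0.3.0"},
    "community-b": {"genome-tool": "2.1.0"},
}


def find_package(name, channels):
    """Search only the named channels, in order, and return (channel, version)."""
    for channel in channels:
        version = CHANNELS.get(channel, {}).get(name)
        if version is not None:
            return channel, version
    raise LookupError(f"{name!r} not found in channels {list(channels)}")


# Only the requested channel is consulted, so there is no single global namespace.
print(find_package("genome-tool", ["community-b"]))  # ('community-b', '2.1.0')
```

Asking for `genome-tool` while listing only `defaults` raises `LookupError` in this sketch, which mirrors the point that a package lives in the channel of the community that built it.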

19:00 Yeah.

19:01 They have very specialized, well, specialized is maybe a euphemism, but there's a lot of specialized

19:05 software needs in the biology community. It's very R-centric. There's a lot of, depending on what

19:11 you're doing in that domain, there's a lot of Perl sometimes.

19:13 So...

19:14 Yeah, interesting. We'll leave that there.

19:17 Are there other ones? Is there like a ChemConda or things like that?

19:20 No. So there's actually... Yeah. So I think Bio... I'm going to kick myself later,

19:24 probably, as I forget some. But there are major research disciplines and communities that do use

19:29 Conda quite a bit. So I think the astronomy research community has taken on Python and embraced Python

19:34 a lot. They use Conda as a way to get nightly builds and dev builds and just really get easy

19:39 deployments, right, of their complex software. One of the things that Conda does well,

19:43 I should have said this earlier, it's not just a Python packaging tool. It's a sort of a userland

19:48 software packaging tool. So we package up R, Perl, Python, C, C++, Fortran, Java, Scala,

19:54 Ruby, Node, you name it. We really are almost like a portable userland RPM kind of thing.

20:01 And so that allows for these communities that have a lot of scientific engineering code written in not

20:07 Python, sometimes not even C or C++. We can package all those things up together, move

20:12 these collections of packages around.

20:13 Yeah, that's pretty interesting. That takes the challenge of packaging and sort of

20:18 magnifies it extremely, right? Multiplies it combinatorially.

20:22 Oh, yeah. Oh, yeah. It definitely gets pretty complex.

20:28 This portion of Talk Python to Me is brought to you by Linode. Are you looking for hosting that's fast,

20:33 simple, and incredibly affordable? Well, look past that bookstore and check out Linode at

20:38 talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server

20:45 with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or where your

20:50 users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server,

20:55 or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network,

21:02 24-7 friendly support, even on holidays, and a seven-day money-back guarantee. Need a little help

21:07 with your infrastructure? They even offer professional services to help you with architecture, migrations,

21:12 and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/Linode.

21:18 So another thing that looks like it's doing really well is Anaconda Cloud. And so

21:26 this is a place where like data scientists can share their work and their packages and things like that.

21:31 Is that right? Yes. So right now, Anaconda Cloud is primarily, I think, used as a package hosting

21:35 environment. And a lot of developers in the data science ecosystem use it as a way to publish

21:40 nightlies or dev builds. Many of the projects, the key projects, they give us a heads up when they're

21:45 about to cut a new release so that they can push, make sure that they can announce the Conda package

21:50 at the same time they announce the release of the, you know, cutting new version of the software. So

21:54 it's very nice of them. Yeah. So how does that work alongside, and how is it different from, just

22:00 putting it on PyPI? It gets pretty complex. So number one, there's channel support. So we basically have

22:07 individual developers can have their own channel and those packages, you know, their users can just

22:11 download packages from just that channel and not sort of a single global namespace, right?

22:17 Another really important thing is that there's not just one build. So Conda as a packaging system

22:22 has much deeper and richer metadata about the build environment and what it expects of the runtime

22:28 environment. So I can build a package that the same upstream software, I can build different versions

22:33 that are optimized for different levels of your hardware, like whether or not you have GPUs,

22:37 whether or not you want, you have an advanced Intel chip or a relatively basic chip, I can push all of

22:42 that stuff in. And maybe using this version of a compiler or that version of a compiler, like Clang

22:48 versus GNU GCC, you know, these things actually make material difference in whether or not the package

22:53 will work. That level of resolution and that ability to feature flag and select is not available on PyPI

22:59 as far as I'm aware. And again, it's just, you know, even if one package is available, if you use

23:04 pip to install from PyPI, pip aggressively goes and tries to build other things from source, right? And

23:09 it doesn't do an a priori solve of what you need; it sort of grabs things

23:14 as it goes. And so you can end up with very much the incorrect packages coming down, you can end up

23:19 trying to build something from source that maybe builds successfully. But again, that's not what you

23:22 wanted. You want the pre-built one, right?

23:24 Right, with different settings, different compiler. Okay, that's the primary difference.
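
As a rough sketch of the "more than one build per release" point above, the following toy picks a build by matching build metadata against the target machine. The `Build` fields, the labels, and `pick_build` are assumptions invented for this example; real Conda package metadata is considerably richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """One build of the same upstream release; all fields are hypothetical."""
    label: str
    needs_gpu: bool
    compiler: str


BUILDS = [
    Build("gpu_gcc", needs_gpu=True, compiler="gcc"),
    Build("cpu_gcc", needs_gpu=False, compiler="gcc"),
    Build("cpu_clang", needs_gpu=False, compiler="clang"),
]


def pick_build(builds, has_gpu, compiler):
    """Return the first build whose requirements the target machine satisfies."""
    for build in builds:
        if build.needs_gpu and not has_gpu:
            continue  # this build needs hardware the machine lacks
        if build.compiler != compiler:
            continue  # built with a different compiler toolchain
        return build
    raise LookupError("no compatible build")


print(pick_build(BUILDS, has_gpu=False, compiler="clang").label)  # cpu_clang
```

The design point is that selection happens on declared metadata before anything is downloaded, so nothing falls back to building from source.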

23:27 It is frustrating periodically that you can say, here's a bunch of things I need to install

23:32 on pip, you know, pip install these things. And one of them will have a requirement that the version

23:38 of one part is no larger than such and such. And yet it'll go grab, you know, depending on the order

23:44 you specify them in, it may grab the wrong one, you know, and just install that. And then the other

23:50 package is incompatible. Like there's weird little cases like that you can get into all the time,

23:55 right? Because it's actually, this is one of those areas of software development that for most people,

24:00 it's not a fun and sexy area to think about. But it's a deeply critical thing when we rely on open

24:05 source software: to actually understand what the dependency matrix looks like. And there's no free

24:10 lunch, you know, if you do it in kind of this relatively naive way, like what pip does, then you

24:15 can easily end up in a corner, and things are incompatible. If you try to do it, what we do,

24:19 which is have very explicit and curated metadata about versions, and you do an a priori solve,

24:24 well, people complain the solve takes a long time, which it can. So there's really no free lunch on

24:30 that. I think one of the challenges that we actually have is that the metadata itself can be wrong. And

24:37 we found that all over the place. So packages think they will declare they're compatible with this

24:42 version or that version, and they're actually not. And so we have to actually patch what the upstream

24:46 declarations are. So again, it gets subtle and detailed. There's just a lot of like muck in this

24:51 area that we have to deal with. Yeah, it sounds a little bit like, these are the problems that you

24:56 can address and then learn about. If your job is to coordinate a whole bunch of packages that don't

25:02 interact intentionally with each other, right? They just want to make their project,

25:06 something that you can ship and install and use. And that's fine, right? But at this,

25:12 this interaction across them is where it gets tricky.
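
The contrast drawn above, between grabbing packages one at a time and solving for the whole set up front, can be illustrated with a deliberately simplified toy. It is not a model of pip's or Conda's actual algorithms; the package names, versions, and the single constraint are all invented for this sketch.

```python
from itertools import product

# Hypothetical index: available versions of each package, newest first.
AVAILABLE = {"app": [1], "lib": [3, 2, 1]}


def compatible(choice):
    """Hypothetical constraint: app 1 only works with lib versions below 3."""
    return not (choice.get("app") == 1 and choice.get("lib", 0) >= 3)


def greedy_install(order):
    """Pick packages one at a time, checking only against what is already chosen."""
    chosen = {}
    for name in order:
        for version in AVAILABLE[name]:
            if compatible({**chosen, name: version}):
                chosen[name] = version
                break
        else:
            # Nothing fits with the earlier picks; take the newest anyway.
            chosen[name] = AVAILABLE[name][0]
    return chosen


def solve(names):
    """Check whole combinations up front and return the first compatible one."""
    for versions in product(*(AVAILABLE[n] for n in names)):
        candidate = dict(zip(names, versions))
        if compatible(candidate):
            return candidate
    raise LookupError("no compatible combination")


print(greedy_install(["app", "lib"]))  # {'app': 1, 'lib': 2}
print(greedy_install(["lib", "app"]))  # {'lib': 3, 'app': 1}, which is incompatible
print(solve(["lib", "app"]))           # {'lib': 2, 'app': 1}
```

The greedy result depends on the order of the request, while the up-front solve does not. The price is that `solve` walks combinations of versions, which grows quickly as packages are added; that is the "no free lunch" trade-off in miniature.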

25:15 There's absolutely a tragedy of the commons. Like with the way I've, the metaphor I've used in the

25:19 past is that every developer, you know, open source maintainers, bless their hearts. They are way,

25:24 they're doing a thankless job a lot of times anyway, and they're way burned out and stressed.

25:28 But they're really solving for it. Does my vehicle work in my driveway? You know, can it get out of my

25:33 driveway and drive into my other maintainers driveway down the street? And if that works, they're good to go

25:38 a lot of times. And when every one of the thousand developers in the ecosystem does this,

25:44 you end up with a bunch of cars squashing all over each other on the highways and

25:49 the freeways, because they're not thinking about that integration problem for their end users.

25:52 And the end users, a lot of times in data science, they're not sophisticated software developers.

25:56 They have no ability to solve this problem for themselves.

25:58 They're at the very edge of struggling to write a 10 line script, not understand the complexity of like

26:05 TensorFlow dependencies or something like that.

26:07 Exactly. Exactly.

26:09 So one thing that you all did recently, that seems to be a trend, is you switched from the major.minor

26:15 versioning scheme to calendar based scheme. And I think this is an interesting thing, especially

26:20 around open source, because we've had, you know, Mahmoud Hashemi created this site called ZeroVer,

26:26 to sort of make fun of all the projects that have been around for 10, 15 years with, you know,

26:32 50 or a hundred releases, but are like 0.1.17, you know, some point, you know,

26:38 like really small versions. And it seems like one of the fixes is to say, well, let's move towards

26:44 something that has more to do with, I can look at the version and I can tell you without deeply

26:50 knowing that software, whether that's a new version, an old version, a medium aged version,

26:55 right? Like if I told you Requests was 2.1.4, is that new? Is that out of date? I don't know.

27:02 Right. But if you use this, this new style, it's pretty obvious. Like, what was the thinking there?
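
To make the contrast concrete, here is a small Python sketch. It assumes a calendar version of the form YYYY.MM, which is an assumption for illustration since the conversation does not spell out the exact scheme, and the helper name is hypothetical. The point is only that a calendar-based version carries its own age, while a major.minor number does not.

```python
from datetime import date


def calver_age_in_months(version: str, today: date) -> int:
    """Return how many months old a YYYY.MM-style version is relative to `today`."""
    year_str, month_str = version.split(".")[:2]
    year, month = int(year_str), int(month_str)
    return (today.year - year) * 12 + (today.month - month)


# Using the recording date of this episode as the reference point.
recorded = date(2019, 1, 16)
print(calver_age_in_months("2018.12", recorded))  # 1
print(calver_age_in_months("2017.03", recorded))  # 22
```

Nothing comparable can be computed from a string like `2.1.4` without looking up the project's release history.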

27:07 It's a community convention. It definitely makes it, it's for that user affordance that you can

27:11 sort of look at it and know. And also, you know, we set this expectation that we will release at a

27:16 regular cadence and it's for our own internal documentation and everything else. Everyone

27:19 just is able to collaborate more easily around that. But I think the ZeroVer thing, I mean,

27:24 I love Mahmoud and I think it was a hilarious thing, you know, in a community here where we have

27:27 SciPy and IPython or, you know, Jupyter and other things, pandas, you know, zero dot, whatever,

27:33 or I guess it's not quite zero dot anymore, but like SciPy for sure. These things, there's actually

27:38 something we can laugh at all we want to, but there's a thing there that the author is trying

27:43 to say, or the maintainer is trying to say, which is, it's not quite ready yet.

27:46 You know, I'll call it 1.0 when I'm good and ready and I'm not ready yet. It might not be for 20 years.

27:53 And so, of course, that's also kind of a silly position to take when literally millions of people

27:57 and their production code depend on your software.

27:59 I think they're not saying that it's ready. I think what they're, they're thinking of to say

28:04 when it goes to 1.0 a lot of times is it's done and software is rarely done.

28:10 Well, software is done the instant it's released, at least that version of it, right?

28:13 I think this is where we as an industry actually have to get, we have to up-level our thinking

28:18 about this. And we got to stop thinking about software as artifacts, hardballs of code that

28:25 are static. And we actually have to start thinking about this from a flow perspective, that we are

28:30 looking at flows of projects. And there's a covenant that is established in a relationship

28:36 between the user of one of these flows and the people who originate those flows.

28:41 And I think, you know, there's a really interesting thing I learned years ago about

28:45 aerodynamics. And basically that when planes move less than the speed of sound, you can reason

28:52 about aerodynamics somewhat similarly to water and water flow, right?

28:55 But once you break the sound barrier, the thing that actually causes you the greatest amount

29:00 of pressure on your airframe and things like that, you actually have to reason about the change

29:04 in cross-sectional area of the airplane as it moves through the air.

29:08 So it's almost more like streams of thick rope and you're shoving rope aside.

29:14 So you move from this particle flow way to looking at actual flows.

29:19 And so similarly with software, I think we've got to stop thinking about this as being just a code drop,

29:25 right? And maintainers as people who go and dump out a bunch of code and actually look at a relationship

29:31 with projects. And this gets to like sustainability. This gets to, you know,

29:35 versioning and what's what, what is the promise in a version number, all of that stuff. It's actually

29:40 deeply involved. I don't know that the software industry has really started to learn how to consume

29:46 like the enterprise consumers of open source. I don't know that their internal practices have

29:50 really caught up with thinking about it that way.

29:52 Yeah. And that's kind of why I was bringing up the versioning a little more deeply because

29:56 I think the folks that spend their time all day in open source, they know that Flask, even though it had

30:04 some small version number recently moved to 1.0, but it had some small version number, but it's really

30:10 used a lot and it's been around a lot. So it's fine. Right. But the corporate groups, the enterprise groups,

30:17 they see that as a flag of like, that's test software. We're not ready to like make our bank

30:24 run on test software. Is that the feeling that you got by interacting with, because you, you touch both

30:29 open source and enterprise groups more than a lot of folks, I would suspect.

30:34 Yes, absolutely. We, we are a B2B software company. That's where the bulk of our revenue comes from.

30:39 And absolutely. We suffered, we suffered mildly for that. You know, we have to basically go and

30:44 talk to procurement and compliance and IT people that are swimming, you know, they're up to their

30:48 ears in software. They look at a spreadsheet. We come in with our software, our enterprise software

30:53 and say, well, you know, here's the open source things that are in the manifest. And they look at

30:57 this thing and they're like, what is this? This is a pile of garbage. It's all zero dot, whatever.

31:01 Right. And it's like, yeah, but that runs Instagram, you know, like that literally runs

31:05 Dropbox. So like, what are you complaining? You don't really want to get into that.

31:09 Once you have that argument with an IT guy, you've already lost.

31:11 Right. You're, you're a small insurance company with a hundred thousand customers.

31:15 You're not running, you know, YouTube with a million requests per second. That's using similar

31:21 software, right? It's, but it's the mentality, right?

31:23 Yeah. And you know, a lot of, a lot of going into any kind of, I would say that over the last,

31:29 you know, five or six years, I've had to do a lot of adulting. And one of the parts of adulting

31:33 up from just being a geek, like, you know, code nerd kind of guy to being able to actually have

31:38 customer conversations is actually having quite a bit of empathy for the customer.

31:41 Right. And from their perspective, yeah, they are just a regional bank with a few hundred

31:45 thousand customers. They don't have the budget of Alphabet to throw at an SRE team

31:51 and a whole dev team and all that stuff. So their approaches to understanding risk and risk

31:55 mitigation from the thousands of vendors that want to sell them software, maybe it's the most

31:59 practical, you know, I'm not, again, I'm not defending it, but I'm just saying one could come

32:03 to a point of empathy, right? With their approach.

32:04 That's a really good point. I do totally agree. It is exactly because they're small,

32:09 they can't hire the fresh new hottest software engineers that would rather be in Silicon Valley

32:15 or Austin or, you know, Portland or wherever, right? Like they just don't even have the ability

32:21 to determine whether or not what you're saying is true in a lot of, a lot of cases, right? It's like,

32:26 they just, you know, exactly.

32:28 We'd just rather use Microsoft. We know that they give us this SLA and this agreement and

32:32 we're just good, right? There's one way to make websites, use ASP.NET. We're good. Just use,

32:37 you know, something else supported like that, right? And it's, it's a challenge that they

32:41 obviously want to use these new tools and powerful tools, especially in data science,

32:45 right? But they've, they've got a different culture and way of describing software being ready.

32:50 You know, and we can laugh all we want to about like these compliance guys, like beating us up

32:54 for our, you know, SciPy 0.whatever. But on the flip side, you know, how many of our,

33:00 our credit card reports and our gas bills come from, yeah, basically some like little ASP app or some,

33:06 you know, Access database, God forbid with a bunch of VBA macros, right? That runs the world. So

33:10 how elite are we really?

33:12 That's an interesting point. Yeah. It's definitely worth thinking about. So in a broader sense though,

33:17 I feel like Python is making its way into this enterprise and major corporation space. I know

33:25 it's increasingly being used for a lot of work, not just data science, but, you know, other types of

33:31 software as well. How do you see it? How do you see the world with the inside view you've got?

33:36 Well, I think that's absolutely right. And I think that the Python community may not survive that

33:41 adoption. Interesting. What do you mean by that?

33:43 Not Python, the language, but the Python community. What I mean by that is that, you know, I've talked

33:48 to quite a few like maintainers of some popular projects and they've all reflected to me that

33:53 in the last couple of years, Python adoption has just shot through the roof. I think some

33:58 of it is our pushes on data science and things like that. Others are, you know, this rapid rise of

34:04 deep learning. You know, many things have contributed to this, but ultimately Python is now one of the

34:09 most popular languages on the planet. People are getting jobs in Python and they're using Python

34:14 to do their jobs. And what we're seeing is this transition in the expectation of like, hey, man,

34:21 this is just my nine to five. Like this is a tool that I'm supposed to use to do my job.

34:25 And this tool sucks right now. So I'm going to get on your GitHub and I'm going to give you a bunch

34:28 of grief about it because this is your freaking tool. You know, my, like my employer, I got to feed

34:34 my family. My employer tells me I have to use this tool. It's a piece of crap. And so that is,

34:39 that's what I said. I think the Python community might not survive that adoption transition unless

34:44 it intentionally really works hard to drive a positive, like to drive some values into the

34:52 newcomers.

34:52 So maybe that person that comes and complains because, well, I used to download my stuff from

34:58 Microsoft.com. Now I get it from Python.org, but this thing sucks. So I'm going to go back and just

35:03 complain about it as if, you know, there's a commercial entity on the other side whose job

35:08 it is to make the SLA legit.

35:11 Right. Right. But more likely, more likely, actually, they picked up, they inherited some

35:15 piece of crap, three-year-old Python code from some guy who didn't know what he was doing.

35:19 Written in Python 2.5 or something. Yeah.

35:21 Oh, absolutely. It'll be, it'll be 2.5. I think there's a couple of 2.4 things running

35:25 around that I'm aware of, but a lot of 2.5, there's a lot of 2.5 out there. And yeah,

35:30 and it's using some old version of Matplotlib or something or some old version of pandas. And

35:34 they're going to complain, you know, on the tracker or on the, you know, on the issue tracker about that.

35:38 And part of the cultural change that I think we should try to encourage sounds like, okay,

35:44 you're doing this for your job. You need, it's not so great. We are the maintainers, but you have

35:50 a company who depends upon this. Can your company contribute some time, a PR, some fit, like it's got

35:57 to be a two-way street. I think it can't just be, well, you know, one of the things I suspect that

36:01 you also feel at Anaconda Inc. is there are so many companies out there making millions and billions

36:10 of dollars a year on top of free. There's like people working in their free time on some open

36:16 source project that company is basically built upon and they make billions of dollars and contribute back

36:22 nearly zero or zero.

36:24 Yes. I've frequently quipped that I can fit probably the core NumPy and pandas maintainers

36:31 in my, no, no, my, okay. So we've gotten a few more now, so they don't all fit my minivan,

36:36 but at one point in time, certainly core NumPy.

36:39 You're going to need one of those longer, like full vans that holds 15 people.

36:43 I may need a 15 person van, but I could, I could probably fit them in the 15 person van.

36:47 You know, Matplotlib, which everybody relies on, is like just a few people, maybe part-time. There's

36:54 not like one whole FTE on it even.

36:55 Yeah.

36:56 There's projects like Jupyter that are very large, but also underfunded. And there's projects

37:00 that are small and underfunded. And it's extreme. Yes. It's exceptionally tragic.

37:05 Right.

37:06 It's exceptionally tragic.

37:07 Well, and you know, I think part of the tragedy to me is like, if it really took

37:11 a thousand people to make Matplotlib, 600 people to make Flask, maybe the community can't contribute

37:18 back enough to pay those thousand engineers full-time. But like you said, it's like a van full of people,

37:26 or it's my small car full of people for Flask, right? And click and all those things. The people

37:33 and the companies that use Flask make so much money and depend so heavily upon it that they could easily

37:39 pay those three, four or five people to be full-time on that and be doing really well. Right. But they don't.

37:46 Right. It's just, it's not even asking very much of them, which is what's crazy.

37:49 I'm of two minds on this or not two minds, but I have like two major views on this.

37:53 One of them is that we should look at this as the triumph of software. I mean,

37:58 to sort of just to sort of restate the point you're making, which is that,

38:00 holy crap, one or two or 10 people can build something that is fundamental to

38:09 billions and billions of dollars of global economic activity. That's something to be celebrated,

38:15 right? Because that should free up. Think of how many more thousands of software developers

38:19 don't have to be working on Flask. They can just go and have free time. Not really,

38:23 but you know, in theory, that's how.

38:24 Build something more interesting than just the framework, right? They could build something with

38:28 this result.

38:29 So that's one way to look at it and that we should celebrate where we can. But on the other hand,

38:34 the thing is like, if we can't even somehow come up with the funding for like 10 FTEs for these

38:39 fundamental projects, what's broken? What's broken, right? Because it can't be, it's not,

38:44 it can't be that hard. And so I think there's two ways to look at this. One is that the open source

38:50 community as the, essentially the field of software, I think it's essentially commoditizing out and the

38:57 labor, what open source represents. And this particular thing happening in the Python ecosystem

39:01 is the very vanguard of this transition. It represents essentially the end of labor economics

39:07 for software. And so that going away, we're at that transition. And so it's very hard to think about it

39:15 for companies because companies will allocate budget for software development in a very like

39:21 headcount oriented way, right? And they know what they're getting when they pay for an FTE dev here

39:26 or there or wherever.

39:27 Sure.

39:27 If they just throw money at some open source, what are they getting for it? You know, they know how to,

39:31 they know how to pay money for software. Companies are very good at paying money for software,

39:34 but paying for stuff that they can already get for free. They literally, that is a null value on a

39:40 spreadsheet. They cannot compute that. It is a NaN, right? So my view on this is actually quite simple,

39:46 which is that if open source developers, the people like me who care about the open source ecosystem,

39:51 if we want to sustain the community innovation and that positive abundance mentality that we have in the

39:59 open source ecology, the human ecology of open source has moved to post scarcity, post labor economics.

40:05 If we want to sustain that, then we need to actually drive a new conversation. We need to actually

40:10 provide the tooling and the infrastructure for the companies to think about how to consume this.

40:17 This portion of Talk Python to Me is brought to you by Rollbar. Got a question for you. Have you been

40:22 outsourcing your bug discovery to your users? Have you been making them send you bug reports? You know,

40:27 there's two problems with that. You can't discover all the bugs this way. And some users don't bother

40:32 reporting bugs at all. They just leave sometimes forever. The best software teams practice proactive

40:38 error monitoring. They detect all the errors in their production apps and services in real time and

40:43 debug important errors in minutes or hours, sometimes before users even notice. Teams from companies like

40:49 Twilio, Instacart and CircleCI use Rollbar to do this. With Rollbar, you get a real time feed of all the errors

40:56 so you know exactly what's broken in production. And Rollbar automatically collects all the relevant data and

41:02 metadata you need to debug the errors so you don't have to sift through logs. If you aren't using Rollbar yet,

41:07 they have a special offer for you. And it's really awesome. Sign up and install Rollbar at

41:12 talkpython.fm/Rollbar. And Rollbar will send you a $100 gift card to use at the Open Collective,

41:18 where you can donate to any of the 900 plus projects listed under the Open Source Collective or to the

41:24 Women Who Code organization. Get notified of errors in real time and make a difference in Open Source.

41:29 Visit talkpython.fm/Rollbar today.

41:34 What are some of the key elements?

41:35 One way to do it is you can look at it almost like treat each new... Number one,

41:39 it's something we have to work on ourselves, which is to not make money be a bad word,

41:44 which is still a mindset that pervades many Open Source communities and developers.

41:48 Any affiliation with any kind of money-managing, money-changing organization is seen as essentially...

41:55 It's seen as corrupting sometimes. Yeah, yeah.

41:57 It's corrupting, exactly. So, I mean, we literally had a SciPy mailing list,

42:02 I think a couple of years ago, someone was arguing that we should only allow steering council members

42:07 to be part of universities or part of academia, because they don't have their own agendas.

42:11 And the other people were just like, are you kidding me? Academics don't have agendas anymore.

42:15 So, people like to kid themselves a lot about this kind of stuff. But anyway,

42:19 so I think that the Open Source community needs to, number one, not be allergic to money and treat it as a corrupting influence, right?

42:25 There's companies and ways, business models that are trying to help Open Source and trying to be

42:33 good participants in it. And then there are the corrupting, evil, taking advantage of type

42:37 companies. So, like, it's not black and white, but there are certainly paths forward where

42:42 companies like you guys and others are putting in lots of effort to try to make things better

42:49 legitimately.

42:50 Yeah. And I appreciate that you recognize that. Like, we really have really tried to be good

42:54 citizens in the Open Source community. But I think companies, for a lot of companies, that

42:59 it's like the mind is willing, but the spreadsheets are weak. You know, like, it's still really hard for

43:04 people and proponents and advocates, even within those companies, to, at the end of the day,

43:08 make the budgetary justifications. Because the companies internally don't know how to,

43:12 they don't know how to reason about it.

43:14 Yeah.

43:14 You know? So, I think that's where the Open Source community can try to help. Like,

43:18 number one, one thing we could do is do almost like a Kickstarter style or like, you know,

43:23 I play Warcraft a little bit. And so, it's like world boss, like, takedown. So, before we can

43:28 release any new versions of Library XYZ next year, we've got to get this much money in, right?

43:34 Yeah.

43:34 And people basically just, but they put the money in. But I think that's actually as fun as that

43:39 would be in the Kickstarter model like that, as cool as that would be and as interesting as that

43:43 would be, I think businesses have a hard time just writing checks for donations. So,

43:48 the other thing that I think the Open Source community needs to do, I think the one that's

43:51 more realistic, is to actually form entities that can have a business-to-business conversation

43:57 with the corporate players and understand how to talk to their procurement, talk to their legal and

44:05 everyone else, and basically act as a crossover facility to do the product management so the

44:10 businesses know what they're getting for their money. It's not a charity. You know,

44:13 some things that people may not be aware of is that for a business to write a $10,000 charity check,

44:18 that comes out of a different part of the business a lot of times.

44:22 Even if everyone wants to, for budgetary and for finance and compliance reasons,

44:27 they literally cannot just write a check to some dude, you know, some Open Source hacker in the

44:32 middle of Europe somewhere. So, these are the things that we need to actually put together.

44:36 I think the allergic to money issue, I think that that can be solved with the right examples of Open Source

44:43 companies and companies entering Open Source in positive ways. But I feel like there's some kind of

44:51 structure or something that has to get between the corporations and the Open Source projects,

44:58 where it's like you say, it's not a charity check. It's you pay into this and there's, you get a little

45:05 bit more of something. And I don't know what that is, but there's something like that. Then the companies

45:10 can justify it. They say, look, we depend upon this thing. We pay, you know, 0.01% of our revenue to the

45:18 people that make it work so that our system doesn't go away. And here's what we get for that 0.01%. I don't know

45:22 what that is. It's actually, we don't have to reinvent the wheel here. It happens all the time

45:26 in every other industry. It's an industry consortium. It's an industry consortium. You pay into it. And

45:31 what happens is you get votes on various technical councils and technical boards, and they do the

45:36 product management and the dev management for what the thing should be. In the Python world, we want that

45:42 to, in all cases for a lot of these projects, we want that to still be subordinate to the vision

45:48 of the open innovation volunteer kind of crew. But there's so much housekeeping. There's so much

45:56 issue tracking stuff. There's so much like documentation, management, cleanup, just keeping

46:01 the lights on and the yak shaving. There's so much that goes into a project that these kinds of

46:06 consortium models can fund. And I think Python itself, and I'll just come out on your podcast and I'll just

46:11 say it. I think Python itself badly needs this. Yeah.

46:14 Badly needs an actual consortium like this to be operated in a way that can accept dollars easily.

46:19 That's easy for people to write checks, right? Like we all know this as entrepreneurs, like make

46:23 yourself easy to do business with. The open source community, I would say, has not made itself easy

46:27 to do business with. You got to either hire a core dev. And if you do, that core dev then has to,

46:32 in their own minds, be like, am I wearing my community hat or my employee hat, which is tough on them,

46:37 right? It's very stressful for them. And the open source community, even when we get the dollars,

46:41 we don't make it clear to the people writing the checks what those dollars are buying for them.

46:45 Like if they have a couple of issues that are easy to solve, that really can make a difference for

46:49 them, we don't necessarily prioritize those issues just because they wrote us a check because we don't

46:53 want to feel like we're that, you know, like it's that quid pro quo. So I think that you really need

46:58 some kind of facility in the middle of that access consortium that is able to help businesses steer

47:02 and guide a lot of these maintenance, pretty basic kinds of maintenance things that need to happen

47:07 for projects that would make their lives easier. And that can then funnel a ton of money into a ton

47:13 of margin on that goes into the innovation work and all the forward looking kind of stuff.

47:16 And everyone's happy.

47:18 Yeah. Do you think the PSF could do it?

47:19 I think the PSF could do it. I think that the PSF would be, I don't know if it operates as a

47:24 nonprofit.

47:25 It does. Yeah.

47:26 Yeah. So if it's a nonprofit, I think it'd be very hard for it to do it. It might need to actually

47:31 create like sort of Mozilla Foundation, Mozilla Corporation. I think it would need to create

47:35 some kind of a traditional C corp or a B Corp, perhaps like a social mission for profit that it

47:42 owns like director seats on and, you know, the chunk of the things. But companies, a lot of times are

47:47 just prohibited from writing checks to 501(c)(3)s unless it comes out of their philanthropy group.

47:52 So again, this is that making it easy to do business with kind of thing.

47:55 Yeah. Interesting.

47:56 Absolutely. I think the PSF should spin up a thing like that. And I've been sort of

48:00 quietly advocating for this behind the scenes a little bit. And maybe I'll be more vocal about

48:04 that here this year.

48:05 All right. Well, we can spread a little word on the podcast as we just have.

48:08 It's really interesting. And I think there's absolutely lots of possibilities for business

48:15 models in open source. But I feel like there's actually a 98% gap, like 2% of that is captured.

48:26 98% of it is not because we have these large, but still not huge, like banks in the Midwest that

48:33 contribute nothing. They do no PRs. They don't do anything to that effect, right? They just,

48:38 it's just not in their culture. And like you said, there's no real mechanism for them to

48:43 pay a little and get more and justify that.

48:45 Yes. Yes. And actually some of the open source business models that are emerging now,

48:49 they present challenges of their own. Again, my overriding thesis is that the world of software

48:54 is actually commoditizing pretty quickly. And so people, like if you look at the things that have

49:01 been happening in the last six months, as I would say open source software component vendors,

49:07 like Mongo and Redis and Timescale and others, as they start getting their business eaten by the cloud

49:13 vendors, they're realizing that open source, you know, sounded great. Open core sounded great.

49:18 And then they start losing any future route to revenue. And they've got to actually aggressively

49:23 go to like dual licensing and like deep viral AGPLv3 kind of stuff. I don't know that open source

49:29 is even the right conversation to have anymore. I think it should be around sustainable community

49:35 innovation and the freedom to experiment, freedom to innovate, freedom to, you know, there's a lot of

49:40 like free as in beer and free as in innovation. But like, the traditional ways we have about talking about the

49:47 source code itself, again, is limited in this paradigm of like code drops. And we're beyond that now.

49:53 Yeah. And you know, you look at the cloud, for example, a lot of these places that they provide you something,

49:58 and you pay on usage, right? You don't buy any software in the cloud, but you have the subscription

50:06 model all over the place, right? And that's, that's starting to really shift the way things are working

50:11 as well. And I feel like the cloud vendors actually have this interesting lock in where they're a little

50:16 bit defended against some of these challenges that are coming up.

50:20 Well, absolutely. There's only like three major cloud vendors of significance in here in the US,

50:25 at least. And all of them are absolutely going for lock in. And they're, you know, ultimately,

50:32 their business model. It's not necessarily I mean, it's a for profit business model, put it that way,

50:36 right? Yeah, the cloud is the new lock in with a lot of those APIs. It's interesting. And like this

50:40 MongoDB AWS thing you talked about, like, that's a little bit of it as well, right? But it's pretty

50:45 interesting. Yeah, I think we could probably talk for hours and hours on this, because we're both

50:50 pretty passionate about it. It's awesome. But let me ask you a few more questions before we run out of

50:55 time. Sure. These are all sort of forward looking type things. And one of them is data science.

51:00 You called out the year 2012 to me, that if you look at the analytics and the graphs and the usage,

51:05 like there's a huge increase in the derivative of a lot of things around Python at 2012, up till now.

51:13 So five years further out, what do you think data science looks like? Is it still deeply working

51:20 with Python? Is it solving different problems? Where is it going?

51:23 We're going to see data science much more integrated. People have a better sense of what it

51:30 can and can't do by itself rather, right? It's a new discipline that's coming into the business. It's a

51:36 new swim lane. Everyone's trying to figure out how they stand in relation to it. There's a lot of

51:40 political, you know, fighting and a lot of experimentation within a lot of businesses that I see. But at the end of

51:44 the day, I think this idea of doing data exploration, doing model development, and revving models that are

51:52 really critical to the business is the new reality for people. So that's not going away. That's a

51:57 fundamental dynamic that's going to be here. And if you need to go and explore data, you need to go and

52:01 do model development, then you're going to be doing data science full stop, right? There's no,

52:06 like, if you need to basically bring in domain expertise, stats, and coding ability to do that

52:12 well, then you're going to need data scientists intersect. You need all three of those skills,

52:16 you need all three of those. But data scientists are going to find themselves needing to have a much

52:21 better, I think the borders between data, the data science world and the others will clarify better.

52:26 So you'll have data scientists interacting with data engineers, and much better, hopefully much better

52:31 established best practices around how that's supposed to go. And then IT people start accepting that,

52:36 yes, Python is here to stay, we're going to need to deploy real Python stuff. And we need to know a

52:40 little more something about it, right? And so a lot of these little intersectional areas right now

52:45 between data science and other concerns, same thing with BI, people right now, there's literally people

52:49 out there selling point and click visualization tools saying that's data science. And it's like,

52:53 that's not really data science. But they're going to figure that out probably in the next couple

52:58 of years. Hopefully, they get the clue. Yeah, I think that's what I think is going to happen.

53:02 Now, the result of that happening is a gigantic, I think that that clue is going to really start

53:06 hitting home in two years or so. Then the immediate next problem that people have is overall workflow

53:13 management across all of these things. Because everyone's got their favorite tools. Everyone is

53:18 producing things that touch and intersect with everyone else's stuff. How do we get all of this

53:23 stuff managed in one place? And I think that's the challenge, and we're going to be square in

53:28 the middle of that conversation still. And five years from now,

53:50 assuming that the Chinese economy hasn't collapsed, we are going to see some really scary stuff coming

53:56 out of China and the AI innovation happening there. Because they have been, they're completely

54:01 unapologetic about using their entire national population of a billion people as a sandbox for

54:07 trying AI surveillance, sort of cybernetic, the computer controls you kind of things.

54:12 Yeah, the whole social ranking, and all that stuff that's...

54:16 So here's the terrifying thing about that. I'm going to be a little bit of a contrarian on this.

54:20 What if it turns out that their Sesame Credit system, Rev 2... no, Rev 1 is scary and crappy.

54:24 Rev 2, what if it turns out that they give social Sesame Credits for their businesses and local

54:29 politicians? Yeah.

54:30 What if they actually start upgrading social Sesame Credits to being this kind of thing where

54:34 it becomes almost like a, again, back to Warcraft, but like a Warcraft honor reputation system,

54:38 right? And becomes multicolored, it becomes vectorized instead of scalar. They might actually

54:44 innovate a scary, awesome approach that has deep problems because it requires a surveillance state.

54:50 And the Western world might look at that and say, huh, you know, that actually works a lot better

54:54 than, you know, Ivanka Trump, you know, running our fast food joints.

54:58 Yeah.

54:58 Sorry, the White House. So that dates this podcast, by the way. For those who are listening months in

55:03 the future, in case you forgot, just two days ago, the President of the United States served Big

55:07 Macs at the White House. That just, that happened. So this is still fresh in our minds.

55:12 To Clemson, who won the national college football championship. Yeah.

55:16 Yes. It's incredible. Anyway. So the point is that the scary thing about the Chinese AI system

55:21 is that it might work and work really, really well.

55:23 Yeah. Not that it's just pure wrong, but actually there's aspects of it that are amazing

55:28 in its sort of Black Mirror, Electric Dreams way.

55:32 Oh yeah. Tell you what, it's going to be pretty amazing. I think the same way that like a lot of the

55:36 Western world is like, oh, well, we already saw where this goes in Orwell, so we're not going to

55:41 go there. Western world has that kind of snottiness about it. I think they're underestimating how good

55:46 it could be and how tempting that goodness can look to technologists, to the capitalists, and to the

55:53 policymakers here. That's really, for me, as someone who fled the communist regime, you know, as a

55:58 child, like that's the scary thing about it.

56:00 That is really an interesting analysis. And certainly I was thinking ethics, data ethics,

56:05 and accountability for data models and AI and ML, right? Like, sorry, you couldn't get the house.

56:12 The AI said no, right? Like, no, no, no. You have to say why the AI said no. Well, we don't know,

56:17 but it's really good. And it said no, you know, like answering that problem is going to be interesting

56:22 too.

56:22 It is. And you know, the, the thing is that already now you get denied, right? And there's already a model

56:28 that tells you why you're denied. And the AI can, this kind of gets back to that same thing with the

56:33 whole Black Mirror thing and the AI in China, like really, really good AI. It doesn't look like that

56:37 AI, you know? So the really, really good systems, quote unquote, good, the really effective systems

56:43 at partitioning people and spot targeting them, they're going to be dressed up in ways that are

56:48 palatable. Our robot overlords will look like Cylons. They're going to look really human-like.

56:52 This is the scary future, man. I'm not trying to like scare you and scare your listeners.

56:56 I'm just telling you though, like, this is what's coming. And as humans (I'm actually a human, I'm

57:01 not a Cylon), as, you know, tribe human, I think we've got to get better at being human.

57:06 And so that's maybe too philosophical and hand-wavy, but anyway.

57:10 Yeah. It's really an interesting thing to ponder for sure. All right. So I guess final comment or topic

57:17 just real quickly is I feel like there's been this Python 2 versus 3 debate, modern Python versus legacy Python,

57:25 as I like to position it. And I feel like the adoption of modern Python in data science is much faster

57:33 than it has been in the general Python space. One, do you think that's true? And then two, why do you think that is?

57:41 One, I think it's true. And two, I think it's because a lot of data science stuff is new and

57:45 legacy data science code tends to age with models. So like a piece of data science code is only as good

57:52 as the model data that it was trained on and models change because the world changes. So there's a built

57:59 in expiration date on any data science model that you've got. So you're not keeping transaction systems

58:04 from 20 years ago live.

58:06 The complexity and the algorithms and the techniques are just not even relevant, right? Like the machine

58:11 learning of five years ago doesn't compete with the machine learning of today. And it's not like

58:15 you're just going to upgrade. It's a totally different thing. You just retrain it on TensorFlow

58:20 or Keras or whatever, right?

58:21 Right. And secondly, this is another sort of important dynamic, which is that the regulatory environment

58:27 around data science hasn't caught up. So it doesn't require you, you know, I was talking to an engineer

58:32 from a software modeling engineer from an airplane company. And he was saying, yeah, the FAA requires

58:39 us to be able to reproduce our computational design models for like decades, for decades.

58:45 Yeah. Wow.

58:46 So, I mean, yeah, because planes actually, if they're well maintained, they fly for a long time,

58:50 right? And if there's a structural failure of a part...

58:53 Right. There's a lot of 737s out there. Yeah.

58:55 Oh, yeah. And so data science just doesn't have that problem yet. And, you know, one of the earliest

58:59 adopters of Python, this is a really interesting dynamic that people may not be aware of, but

59:03 in the mid 2000s, there was a significant uptake of Python in the hedge fund and the finance industry.

59:10 And so that was Python 2, Python 2.5, 2.6 around that time. And so that got into a lot of places.

59:18 And finance is actually a pretty regulated area. And so a lot of that code, especially if it starts

59:23 running production finance systems, people need to keep it running, not only because they're...

59:27 Even if you stop using a particular finance model to like score or to do whatever, to price a trade

59:33 and things like that, oftentimes you'll want to go back and do what's called backtesting.

59:37 So you want to run new data against those old models, and you'll want to race them against the

59:43 new models, right? You'll want to run new models on old data and new data on old models. And so that

59:48 kind of backtesting approach, you need to keep that old code running for that purpose as well,

59:52 just from a risk management perspective. So a lot of the finance industry's, like, running

59:56 ahead and adopting Python 2 has sort of gotten them stuck on Python 2 a little bit.
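
The backtesting idea described above (new data on old models, old data on new models) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two toy models, the datasets, and the names `backtest` and `mean_abs_error` are hypothetical and do not come from the conversation or from any real finance system.

```python
from statistics import mean


# Hypothetical stand-ins for an archived pricing model and its replacement.
def old_model(x: float) -> float:
    return 2.0 * x


def new_model(x: float) -> float:
    return 2.0 * x + 1.0


# Toy (input, observed outcome) pairs; a real backtest would use market history.
OLD_DATA = [(1.0, 2.0), (2.0, 4.0)]
NEW_DATA = [(1.0, 3.0), (2.0, 5.0)]

MODELS = {"old": old_model, "new": new_model}
DATASETS = {"old": OLD_DATA, "new": NEW_DATA}


def mean_abs_error(model, data) -> float:
    """Average absolute gap between the model's prediction and the outcome."""
    return mean(abs(model(x) - y) for x, y in data)


def backtest(models, datasets) -> dict:
    """Score every model on every dataset, old and new alike."""
    return {
        (model_name, data_name): mean_abs_error(model, data)
        for model_name, model in models.items()
        for data_name, data in datasets.items()
    }


if __name__ == "__main__":
    for (model_name, data_name), error in backtest(MODELS, DATASETS).items():
        print(f"{model_name} model on {data_name} data: error {error:.2f}")
```

Filling in the "old model" row of that grid is only possible if the old code still runs, which is the reason given above for keeping it alive.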

01:00:01 Okay. Interesting. Yeah. So almost a victim of its own success in a way, but in some of these

01:00:09 industries. All right. I guess we're going to have to leave it there because we're out of time. But

01:00:13 like I said, a lot of interesting stuff to talk about. I have to just put it to rest. So before we move on,

01:00:19 though, I'm going to ask you the two questions I always ask on the show. If you're going to

01:00:23 write some Python code, what editor would you use?

01:00:25 My old go-to is still Vim. But for large code bases, I tend to use PyCharm so I can, you know,

01:00:30 sort of navigate more easily.

01:00:31 Yeah, sure. Makes sense. And then there's many, many packages on PyPI or available on conda-forge.

01:00:39 What's one that you think people maybe haven't heard of, but they should, or you want to recommend?

01:00:43 Is it bad form to pimp? Is it like to pimp your own stuff?

01:00:46 No, you do it. No, no, go ahead.

01:00:48 So I'm really, really excited about a new project that we created called Intake,

01:00:53 which I would encourage people to take a look at it. It's pretty new. We just launched it last year.

01:00:58 Yeah, it looks interesting. I was going to ask you more about it, but we just

01:01:00 have too many topics already. So tell us about it real quick.

01:01:03 So Intake is a data loading abstraction library. So it's basically just load my data,

01:01:09 and it abstracts your data loading stuff into a declarative syntax so that the beginning of your

01:01:14 data science scripts doesn't have a whole bunch of like embedded and brittle SQL calls or pandas

01:01:19 column transformations or things like that. Intake is a way to make it so that your actual

01:01:23 data science or data transformation code is sort of its own code artifact and your data bits are your

01:01:29 data bits. It's kind of a nerdy thing, but we think that it actually addresses that data,

01:01:34 that model reproducibility and code reproducibility problem that data scientists face.
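
The idea of keeping "data bits" declarative and separate from analysis code can be sketched with the standard library alone. This is not Intake's catalog format or API; the catalog layout, the `DRIVERS` table, and the `load` function below are hypothetical names chosen only to illustrate the pattern described above.

```python
import csv
import json
import tempfile
from pathlib import Path


def make_demo_catalog(directory: Path) -> dict:
    """Write two small demo files and return a declarative catalog for them.

    Each entry names a driver and a path; the analysis code never sees either.
    """
    csv_path = directory / "trades.csv"
    csv_path.write_text("symbol,price\nAAA,10.5\nBBB,20.0\n")
    json_path = directory / "limits.json"
    json_path.write_text(json.dumps({"max_position": 100}))
    return {
        "trades": {"driver": "csv", "path": str(csv_path)},
        "limits": {"driver": "json", "path": str(json_path)},
    }


def _read_csv(path: str) -> list:
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def _read_json(path: str):
    with open(path) as handle:
        return json.load(handle)


# Hypothetical driver registry: maps a driver name to a loader function.
DRIVERS = {"csv": _read_csv, "json": _read_json}


def load(catalog: dict, name: str):
    """Look up a source by name and load it with the driver it declares."""
    entry = catalog[name]
    return DRIVERS[entry["driver"]](entry["path"])


def total_price(rows: list) -> float:
    """Analysis code: depends only on the loaded rows, not on where they live."""
    return sum(float(row["price"]) for row in rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        catalog = make_demo_catalog(Path(tmp))
        print(total_price(load(catalog, "trades")))
```

In this sketch, moving `trades` to a different file or format would mean editing the catalog entry only; `total_price` stays untouched.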

01:01:38 Sounds really useful. Thanks. All right. So final call to action. People are excited about

01:01:42 the Anaconda distribution or maybe getting, making some progress on this open source business model

01:01:48 thing we talked about. What would you say to people?

01:01:50 So I would say that we have AnacondaCon coming up. So if you're actually using Python

01:01:55 in a commercial environment, strongly recommend AnacondaCon. We have a, we try to make a really good

01:02:01 blend of technology and practitioner kind of stuff and workshops there combined with

01:02:07 business perspectives. So it's not like an industry conference like Gartner or Strata.

01:02:12 It's not like a pure one of those things. It's also not a pure like tech community conference,

01:02:16 like PyData or something like that. So it's, we try to make a mix of those things.

01:02:19 We've gotten really good reviews in the past couple of years. It's our third year doing it.

01:02:23 I'm super excited about it. It's here in Austin in April, April 3rd to 5th.

01:02:26 So that's AnacondaCon.io. And secondly, if people are using Anaconda and like it and they're using it in a

01:02:32 business environment. I would recommend they check out Anaconda Enterprise. We are very,

01:02:36 very proud of the product and we have a lot of problems that we solve for people inside business

01:02:40 environments and the business use of Python for deployment, package management.

01:02:44 Yeah. Real quickly, like what, what's the, what do you get from, right? You know,

01:02:47 I talked about the business model should be, you get a little bit more for your money,

01:02:50 not just pure charity, you know, here's a PayPal donate button. What do people get real quick?

01:02:57 So Anaconda Enterprise is, it gives you the ability to have your own managed package repository.

01:03:02 It gives you a way to do secured and governed collaborative notebooks and model deployment.

01:03:07 It works in the cloud. It works on prem. Many of our customers use it across an air gap and in very

01:03:12 strictly governed environments. We basically make it so that data scientists and Python practitioners

01:03:18 in business can be as effective with Anaconda as they are at home nights and weekends on their

01:03:22 own laptops. All right. Yeah. That sounds cool. We just clear all the IT hurdles. Yeah,

01:03:25 that's sweet. All right. Well, thanks for all that you've talked about here, Peter. It's been a

01:03:30 super interesting conversation. Thanks for being on the show. Thank you so much for having me. I

01:03:33 really enjoyed it. You bet. Bye. Bye-bye. This has been another episode of Talk Python to Me.

01:03:38 Our guest on this episode was Peter Wang. It's been brought to you by Linode and Rollbar.

01:03:43 Linode is your go-to hosting for whatever you're building with Python. Get four months free at

01:03:49 talkpython.fm/Linode. That's L-I-N-O-D-E. Rollbar takes the pain out of errors. They give you the

01:03:57 context and insight you need to quickly locate and fix errors that might have gone unnoticed until users

01:04:02 complain, of course. Track a ridiculous number of errors for free as Talk Python to Me listeners at

01:04:07 talkpython.fm/Rollbar. Want to level up your Python? If you're just getting started, try my Python

01:04:14 Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new

01:04:20 async course that digs into all the different types of async programming you can do in Python.

01:04:25 And of course, if you're interested in more than one of these, be sure to check out our everything

01:04:29 bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite

01:04:34 podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed

01:04:39 at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on

01:04:45 talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

01:04:51 Now get out there and write some Python code.

01:04:53 Bye.
