#198: Catching up with the Anaconda distribution Transcript
00:00 It's time to catch up with the Anaconda crew and see what's new in the Anaconda distribution.
00:04 This edition of Python was created to solve some of the stickier problems around deployment,
00:08 especially in the data science space. Their usage gives them deep insight into how Python is being
00:13 used in the enterprise space as well. And that turns out to be a very interesting part of the
00:17 conversation. Join me and Peter Wang, CTO at Anaconda Inc., on this episode of Talk Python
00:22 to Me, number 198, recorded January 16th, 2019. Welcome to Talk Python to Me, a weekly podcast
00:42 on Python, the language, the libraries, the ecosystem, and the personalities. This is your
00:47 host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show
00:51 and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
00:56 This episode is sponsored by Linode and Rollbar. Please check out what they're offering during
01:02 their segments. It really helps support the show. Peter, welcome to Talk Python.
01:06 Thank you very much. I'm very happy to be here.
01:07 I'm happy to have you here. It's been a while since we've talked about Anaconda. I had Travis
01:12 Oliphant on the show way back when, but it seems like it's time for a catch up on what you all
01:17 have been up to. Yeah, well, there's been a lot going on. One of the employees
01:21 has commented that every six months, it feels like a different company. And
01:25 yeah, the space is evolving very quickly. We're trying to just keep up with it.
01:28 So you would say this data science thing is not a fad. It's probably going to be around
01:31 for a while?
01:31 At this point, I think I'm going to go out on a limb and say it's probably going to be around
01:34 for a little while.
01:35 Right on. All right, before we get into all that though, let's start with your story.
01:38 How did you get into programming in Python?
01:40 I actually got into programming when I was a young kid and I've been always programming.
01:44 I've actually been programming for almost as long as I've been speaking English.
01:46 I got a PC when I first came here to the United States, so I was very lucky.
01:50 But I actually majored in physics and out of college, I started going into computer programming
01:56 as a profession. And I did a bunch of C++, but I discovered this thing called Python on Slashdot.
02:03 And I think they announced version 1.5.2. And I was like, fine, I'll go take a look at it.
02:08 And I started playing with it and I just fell in love. And so my day job was like getting beat
02:14 up by C++ templates and out-of-compliance compilers. And at night, I'd just hack on Python.
02:19 So finally, after a few years of this, I ended up moving to Austin. I got a job doing Python
02:23 as my day job, which was awesome. In like 2004, I started at Enthought. And I did a lot of work
02:29 in the scientific community and doing consulting with Python because I knew the science given my
02:33 math and science background in physics. But I also knew the software principles and software
02:37 engineering. So it was a really fantastic time. And that's basically the long and short of it.
02:40 Yeah, that sounds like a great fit. You know, things just came together, right? You have this math and science
02:44 background and you love Python. You found this job and it all just, like all of those things came together to really put you in the right place.
02:51 They really did. I feel very, very blessed in that way. Now, it was a lot of hard work too,
02:55 but I got very comfortable. And, you know, there's this great quote from Bruce Lee that you must never,
03:00 like not, you must never get comfortable, but there will be plateaus and you can't stay there.
03:04 And so I think towards the end of the 20, the aughts, the 2000s, around 2010, I was starting to see big
03:11 data happening. And I started realizing that Python was getting used for business data analysis more
03:16 than just science and engineering. And that our little cozy SciPy community could actually be
03:20 something much bigger. And so I started doing some exploration, exploratory work. I really wanted to
03:26 do like D3 for Python. I had a few other little itches I wanted to scratch.
03:30 And so I started Continuum with Travis in order to address some of the technical gaps that we had in
03:36 the community and the technology stack. And then also to really push a narrative in the technology
03:41 market that yes, Python is good for business use. Yes, it's production ready. Yes, you should use it.
03:47 And it can handle big data just fine. And so we really started pushing that narrative in 2012,
03:53 you know, created NumFOCUS, created PyData, did all these things. And I think that the results have
03:57 spoken for themselves. I definitely think they have. That's great. In 2012, I do think
04:03 there was a little bit more of a debate of, well, is it safe to use Python for our business critical
04:07 stuff? But I feel like that battle has been really solidly won, especially on the data science front,
04:15 right? There were debates about R, maybe R was the space to be. That's not really where it's at anymore,
04:20 is it?
04:20 No, there was definitely a period of language war sort of stuff going on early on. It's odd, like, you know,
04:25 even then, the discussion about is data science a fad? Is it a fad term? Isn't it just business
04:31 intelligence? Or is this just that big data hype cycle all over again? You know, there's a lot of
04:36 doubters and haters on that term. But as I've talked to more users and managers and stuff,
04:42 at businesses, it's clear that they're thinking about data analysis and data analytics in a very
04:47 different way than they have for like decades. And data science is definitely, definitely here to stay
04:51 because of that.
04:52 Absolutely, absolutely. So maybe give people a sense of what you do day to day so they know where you're
04:57 coming from.
04:58 Well, my day to day consists of my formal role as CTO. I run the community innovation and open source
05:05 group here at Anaconda. I actually don't run the product engineering teams. And I work with
05:10 everyone. But my general role is working with the community, helping the various community oriented
05:15 and open source devs that we have champion their projects and work better with the broader community.
05:20 I also do a lot of industry facing technical marketing and evangelism. So a lot of customers
05:25 will have me go and speak at internal data science events they do, things like that. There's actually
05:29 remarkably few people in the Python world that really speak to industry on behalf of Python itself,
05:34 relative to the usage of it. I mean, you'll find no shortage of industry analysts talking about how
05:39 great Java is, or how great these like big data projects are, you know, all these like PR type
05:44 things. There's no one doing that for Python. And so that is actually some of my day job. And beyond
05:49 that, it's just trying to keep up with all the things that are happening in data science, machine
05:53 learning, data engineering, data visualization, AI, all of it.
05:58 On top of the advocacy role, it's a pretty much full time learning thing, right? Because there's so
06:04 much change, right?
06:05 There's so much in every area. I mean, there's all the cloud stuff too. There's edge learning,
06:09 there's data privacy, you name it. Every single area that touches data science is undergoing massive
06:13 change right now.
06:14 That's super exciting, but it's also a bit of a challenge. And I think the Anaconda distribution
06:18 does help some with that. Before we get into the distribution story, though, let's just talk about
06:24 Anaconda Inc. So when I had Travis on the show a couple years ago, it was Continuum that was the
06:32 company and Anaconda was the distribution. But now those are not different anymore, right? It's just
06:37 Anaconda, the company and the distribution.
06:40 We renamed ourselves really out of pragmatism, because we would go to places and we'd introduce
06:47 ourselves as Continuum Analytics. And they're like, oh, yes, you guys, like you got some Python stuff.
06:51 We see that here. Like, who are you guys? And then we say, oh, well, we make Anaconda. And they're like, oh,
06:56 I love Anaconda. I use Anaconda all the time and blah, blah, blah. And so we sort of like, after that started
07:01 happening to us all the time, we sort of figured like, well, maybe we should just call ourselves Anaconda.
07:06 And, you know, one of the things that held that up was for a long time, as we were growing the company and
07:12 growing the distribution, we were afraid that changing the company name would actually spook the community.
07:18 And it's a really, it's been one of these interesting things. Like I have, I have lots to say
07:22 about open source. Let's just put it that way. But it's very hard to play the game of open source,
07:27 honestly, and not still get beat up with FUD about it. And so even though we've open sourced our build
07:32 tools, we've open sourced the recipes, we've open sourced everything from the very beginning,
07:35 there are still people in the community who distrust us because we're a company trying to
07:40 build sustainable funding for this open source effort. So
07:45 that was one of the reasons we actually were reticent to do that name change until finally
07:49 it just became a no-brainer that we basically had to.
07:51 Yeah. If people keep mistaking you for Anaconda Inc, maybe just say, fine, that is our name.
07:56 Yeah. And we'll just deal with the haters, you know, on a one-off basis, I guess. I don't know.
08:00 Yeah, exactly. I mean, it's not unprecedented, right? 37 Signals, who made Basecamp and,
08:06 you know, sort of founded Ruby on Rails, they eventually renamed themselves just to Basecamp.
08:11 They're like, yep, the one major project, fine, we're just called that, right? I guess it's like
08:15 Microsoft renaming themselves Windows, which they're probably very happy they didn't. But,
08:19 you know, in a lot of senses, that makes sense. That's cool. Okay, so there's a broad spectrum
08:25 of folks who listen to the show. Many of them will have experience with data science. Many of them
08:31 will know what the Anaconda distribution is. But maybe just, you know, for the folks who are new or
08:36 have been working somewhere else, tell them, what is this distribution? How is it different
08:40 than the standard CPython? And why did you guys make it?
08:43 I'll try to sum this up for a technical, but not necessarily data science, audience, right? The basic
08:49 gist of it is that Anaconda arose out of a failure in the Python ecosystem to address the packaging needs
08:56 for the numerical and computationally like heavyweight packages that are in Python. And so for the same
09:03 reason that Linux distributions exist, very few people build Linux from scratch. For actually
09:08 exactly the same technical reasons, we built the Anaconda distribution, because it's actually really,
09:13 really hard to correctly build all of the underlying components that you need for doing productive data
09:18 science and machine learning. And so the reason it's a distribution is because all of the libraries you
09:25 build and the packages, the modules with extension modules that you load up, they need to be compiled
09:30 together, they need to be compiled in a compatible way. And so you need to agree on compiler definitions,
09:35 you need to agree on code generation targets, optimization levels, things like that. And if you
09:41 only ever use pure Python packages, so packages whose code only consists of .py files, then you basically
09:49 never run into a problem. It's only when you start having extension libraries, things that depend on maybe
09:54 system libraries, God forbid you try to cross platforms between Linux and Mac and Windows across
09:59 architectures between ARM and x86, you're completely hosed. And so we, in service to the scientific Python
10:06 community, we built this distribution that was a set of packages and a way of building packages that are
10:11 compatible with each other. So that's what the Anaconda distribution is. It's a bulk distribution with about a
10:16 couple hundred pre made libraries. And we have a package updater in it called Conda that lets you
10:23 then install thousands more that are built by us and built by a large open community that also uses the
10:29 same standards. So that's what Conda and Anaconda are in a nutshell. And it's really one of these like
10:35 packaging war kind of things or packaging, the confusion of Python packaging. We actually tried to approach
10:42 Guido back in the day to help define some standards around this. And he basically gave us some very helpful
10:49 guidance, which is maybe your packaging needs are so exotic, you need to build your own system. So we took
10:53 him at his word and we did it. And consequently, when people use Conda, in a lot of cases, things just work.
10:59 There's still like corner cases and a lot of like little rough spots, especially in terms of pip interop.
11:04 But we're very proud of the work we've done so far. And it's used in production every day by big,
11:08 big companies that rely on Python for their production workloads. So that's basically Anaconda
11:13 and Conda in a nutshell.
11:14 Okay, well, that's a really good summary. Yeah, when I think of it, the main value is that you get
11:20 pre compiled binary versions of the packages that would otherwise have to be compiled from source when you
11:27 pip install them, right?
11:29 Yes.
11:29 And the other part is the cross package compatibility, because somebody makes one package, and they have an
11:38 interest in making them as best they can or whatever, but they don't really care about integrating and testing
11:43 against all the other open source projects that you may pull into your project that they don't even care or know about,
11:49 right? So this sort of bigger picture compatibility that you look at is pretty cool as well.
11:55 It's actually become quite critical. And I think this is one of the areas that the Python community,
11:58 in the confounding haze of packaging, and half built packaging solutions, that we've not really
12:05 been good at giving guidance to the user community about is that if all you ever need to do is build
12:10 one package for yourself, and you fully control the deployment environment, and the development
12:14 environment, then maybe you can go and do that, right? But if you actually have to work on a team
12:19 with other people, like, for example, web developers, a lot of times, they control the
12:23 server, they choose the packages they bring, and they write the code, and they can just push it out
12:28 to their server. And they're good, right?
12:30 Yeah, and they're good to go. And you can do any number of things that you want to. You know,
12:33 what I would liken it to is, if you build your own wheel, if you build your
12:38 own native extensions, it's like getting plastic powder or plastic pellets, and making your own mold
12:44 of Legos or Lego-like things and pouring your own little pieces. And so as long as you're the one
12:49 that controls what they have to plug into, and you're the one that controls all the molds, then you
12:53 don't need any standard definitions of studs or holes or lengths or anything like that, you're good to go.
12:58 But if you ever want to work with other people who have their own molds and their own places and
13:03 studs, they want to put these things on, you've got to come up with a standard definition. And so what
13:08 Anaconda is essentially, it's like a Lego system, we've standardized what the studs are and what the
13:13 holes are. So lots of people can build different kinds of Legos, and they all can plug together.
13:16 And that's kind of the long and the short of it.
13:18 Yeah, very interesting. So some other things that are in play there are you talked about Conda and
13:25 installing the packages that you built, right, the couple hundred or whatever that come with the
13:29 distribution. But then you also said installing the others through this thing called Conda Forge.
13:35 What's Conda Forge?
13:35 Well, Conda Forge is a community of people who I would say out of a masochistic charity to the
13:42 community. They take on the job of maintaining build scripts and recipes that take upstream
13:48 software and make it so it's actually buildable in a reproducible way and that it works with other
13:54 things. So it's a community of package builders and they have several hundred contributors and
13:59 they've built thousands of packages. We ourselves build about a thousand, although only 200 are built
14:04 into the big Anaconda installer download. But the Conda Forge community goes even beyond that and
14:09 builds several thousand. And that's what Conda Forge is.
14:11 Yeah. Interesting. So people are like, you know, it's really painful to build this package,
14:16 but only one of us should ever suffer and feel that once. And we'll do that on behalf of the
14:21 community. I'll take that on for this one package.
14:23 Yeah, basically. I mean, you know, the real challenge is it's one of those things in life
14:27 where it's almost worse that it's easy to do a bad job. I don't know that we have a term for this
14:32 in English. Maybe there's a long German word for it. But it's like the same thing with the coding
14:36 principles of like, if something is broken, you want it to break loudly and fail loudly,
14:40 right? You don't want it to make a half effort, where it kind of works sometimes. And so
14:45 building packages is the same thing. Most people can kind of get a build working for most things,
14:51 but does it work well? Will they ever be able to do it again? Like, does it work with anything else?
14:57 None of those things, you know, it takes a lot of work to make a good package build. So,
15:01 well, that speaks to the reproducibility side of things. And I know in data science and
15:06 scientists using data science tools, that reproducibility is a super important aspect.
15:11 And I guess the first step is I can run the software, which means I can build the packages
15:16 and install them.
15:17 Right. And we really think that providing pre-built binaries and then having
15:22 good provenance of the build system itself, that's really some of the only ways you can really
15:27 honestly, like not kidding yourself, have reproducibility. I think some people think
15:32 that Docker somehow saves them, but it really doesn't. So it's kind of a struggle right now,
15:38 honestly, because there's so many moving pieces. There's a lot of confusion in that space, but I do.
15:42 Yes, I do agree with you that Conda packages used properly can absolutely be a great way to ensure
15:47 reproducibility for data science.
15:49 Yeah. Well, it's probably better than saying, well, if you want to install this package,
15:54 you're going to need to have the Visual Studio 2008 compiler set up correctly on your machine
15:59 in 2025 or whatever, right? When it's no longer compatible with Windows or who knows what,
16:05 right?
16:05 Yeah. We're going to have to, like, one of the reasons I think that our team,
16:08 the Conda and Anaconda team are happy to move away from Python 2 is because of the dependency on that
16:13 compiler. Someday when we finally put Python 2 to rest, I'm probably going to try to eBay a bunch of,
16:19 like, boxes of those CDs just so they can break them out of, you know, sort of like a cleansing
16:24 bonfire or something. I don't know. Maybe you shouldn't burn CDs. That's bad, actually.
16:28 Yeah, but you could have some sort of ceremony with them for sure.
16:32 Yeah.
16:33 I think the new Python 3.7, it uses MSBuild. Is that right?
16:38 You know, I'm not sure on the details of that, but I think that there have been significant
16:42 improvements. And, you know, the Python folks who work at Microsoft have worked really hard
16:49 to improve the compiler situation there for Python. I think it's much better now with Python 3 and in
16:53 the later releases of Windows. It's just we have, you know, very old Python, very old Windows that
16:59 still are deployed that we have to keep those users going. So that's where almost all the pain is.
17:04 I can imagine. Yeah, I just had Steve Dower from Microsoft on the show, and he's in charge of the
17:08 installer and stuff there. And he's doing some really, really cool stuff to make it more accessible
17:12 on Windows. And it's easy to go to conferences and forget how important Windows actually is,
17:19 right? You look around, it looks like everyone has a Mac. There's a few people running Linux.
17:23 That's pretty much what you see at the conferences, right? But that's not what the actual consumption
17:28 out in the world is, is it?
17:30 No, that's not at all reflective of even the United States. And then you go to the broader
17:35 world. It's a lot of Windows. It's a lot of Windows, a lot of Linux, too. But yeah, I think this is one of
17:42 the structural problems that faces the open source community is that when you're small, it's easy to do
17:47 product management, because it's like you and your buddies. But once you get bigger, you have to actually
17:52 intentionally go and try to pull in information from your users. And I think that's
17:57 actually a structural challenge for the Python community at this point in time.
18:00 When we're talking about Conda Forge and things like that, something I had not heard of before,
18:05 but I saw that you're running is something called BioConda. Now, it sounds like it might have to do with
18:11 biology and data science around biology, but that's all I can discern from it. Tell us about that.
18:16 That's new to me.
18:17 So BioConda is actually not one of our projects. And oh, I should have said this earlier with Conda Forge.
18:22 BioConda, Conda Forge, and various other sort of groups, they use our Anaconda Cloud package hosting
18:29 infrastructure to support their community. Because with the Conda package installer, it's easy to give
18:35 it a namespace flag, basically a channel name, and then it will go and download packages only from that
18:40 channel on Anaconda Cloud. So these represent, Conda Forge and BioConda represent different communities
18:46 that are using the Conda packaging tool, but they may have set slightly different standards or included
18:50 certain other standards in their build system protocols and standards. So all these packages
18:55 work together. So yes, BioConda is for the biology, genomics sort of community.
19:00 Yeah.
19:01 They have very specialized, well, specialized is maybe a euphemism, but there's a lot of specialized
19:05 software needs in the biology community. It's very R-centric. There's a lot of, depending on what
19:11 you're doing in that domain, there's a lot of Perl sometimes.
19:13 So...
19:14 Yeah, interesting. We'll leave that there.
19:17 Are there other ones? Is there like a ChemConda or things like that?
19:20 No. So there's actually... Yeah. So I think Bio... I'm going to kick myself later,
19:24 probably, as I forget some. But there are major research disciplines and communities that do use
19:29 Conda quite a bit. So I think the astronomy research community has taken on Python and embraced Python
19:34 a lot. They use Conda as a way to get nightly builds and dev builds and just really get easy
19:39 deployments, right, of their complex software. One of the things that Conda does well,
19:43 I should have said this earlier, it's not just a Python packaging tool. It's a sort of a userland
19:48 software packaging tool. So we package up R, Perl, Python, C, C++, Fortran, Java, Scala,
19:54 Ruby, Node, you name it. We really are almost like a portable userland RPM kind of thing.
20:01 And so that allows for these communities that have a lot of scientific engineering code written in not
20:07 Python, sometimes not even C or C++. We can package all those things up together, move
20:12 these collections of packages around.
20:13 Yeah, that's pretty interesting. That takes the challenge of packaging and sort of
20:18 magnifies it extremely, right? Multiplies it combinatorially.
20:22 Oh, yeah. Oh, yeah. It definitely gets pretty complex.
20:28 This portion of Talk Python to Me is brought to you by Linode. Are you looking for hosting that's fast,
20:33 simple, and incredibly affordable? Well, look past that bookstore and check out Linode at
20:38 talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server
20:45 with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or where your
20:50 users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server,
20:55 or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network,
21:02 24-7 friendly support, even on holidays, and a seven-day money-back guarantee. Need a little help
21:07 with your infrastructure? They even offer professional services to help you with architecture, migrations,
21:12 and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/Linode.
21:18 So another thing that looks like it's doing really well is Anaconda Cloud. And so
21:26 this is a place where like data scientists can share their work and their packages and things like that.
21:31 Is that right? Yes. So right now, Anaconda Cloud is primarily, I think, used as a package hosting
21:35 environment. And a lot of developers in the data science ecosystem use it as a way to publish
21:40 nightlies or dev builds. Many of the projects, the key projects, they give us a heads up when they're
21:45 about to cut a new release so that they can push, make sure that they can announce the Conda package
21:50 at the same time they announce the release of the, you know, new version of the software. So
21:54 it's very nice of them. Yeah. So how does that work alongside, or differently than, just
22:00 putting it on PyPI? It gets pretty complex. So number one, there's channel support. So we basically have
22:07 individual developers can have their own channel and those packages, you know, their users can just
22:11 download packages from just that channel and not sort of a single global namespace, right?
22:17 Another really important thing is that there's not just one build. So Conda as a packaging system
22:22 has much deeper and richer metadata about the build environment and what it expects of the runtime
22:28 environment. So for the same upstream software, I can build different versions
22:33 that are optimized for different levels of your hardware, like whether or not you have GPUs,
22:37 whether or not you have an advanced Intel chip or a relatively basic chip, I can push all of
22:42 that stuff in. And maybe using this version of a compiler or that version of a compiler, like Clang
22:48 versus GNU GCC, you know, these things actually make a material difference in whether or not the package
22:53 will work. That level of resolution and that ability to feature flag and select is not available on PyPI
22:59 as far as I'm aware. And again, it's just, you know, even if one package is available, if you use
23:04 pip to install from PyPI, pip aggressively goes and tries to build other things from source, right? And if it
23:09 doesn't, it sort of has a very, it doesn't do an a priori solve of what you need, it sort of grabs things
23:14 as it goes. And so you can end up with very much the incorrect packages coming down, you can end up
23:19 trying to build something from source that maybe builds successfully. But again, that's not what you
23:22 wanted. You want the pre-built, right?
23:24 Right, with different settings, different compiler. Okay, that's the primary difference.
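The idea of publishing several builds of the same upstream release and choosing one by hardware can be illustrated with a small sketch. This is not Conda's actual metadata format or selection logic; the `BUILDS` records, the feature names, and `pick_build` are all hypothetical and exist only to show the shape of the idea.

```python
# Hypothetical build records for one upstream release.
# Field names and feature labels are illustrative, not Conda's real metadata.
BUILDS = [
    {"build": "cpu_basic", "requires": set()},
    {"build": "cpu_avx2", "requires": {"avx2"}},
    {"build": "gpu", "requires": {"avx2", "cuda"}},
]


def pick_build(machine_features):
    """Return the most specialized build whose requirements the machine meets."""
    features = set(machine_features)
    # A build is usable only if every feature it requires is present.
    usable = [b for b in BUILDS if b["requires"] <= features]
    # Prefer the build that takes advantage of the most features.
    return max(usable, key=lambda b: len(b["requires"]))["build"]
```

Because the basic build requires nothing, there is always a fallback, and a machine with a GPU but no AVX2 support still gets a working package rather than one that fails at load time.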
23:27 It is frustrating periodically that you can say, here's a bunch of things I need to install
23:32 with pip, you know, pip install these things. And one of them will have a requirement that the version
23:38 of one part is no larger than such and such. And yet it'll go grab, you know, depending on the order
23:44 in which you specify it, it may grab the wrong one, you know, and just install that. And then the other
23:50 package is incompatible. Like there's weird little cases like that you can get into all the time,
23:55 right? Because it's actually, this is one of those areas of software development that for most people,
24:00 it's not a fun and sexy area to think about. But it's a deeply critical thing when we rely on open
24:05 source software: to actually understand what the dependency matrix looks like. And there's no free
24:10 lunch, you know, if you do it in kind of this relatively naive way, like what pip does, then you
24:15 can easily end up in a corner, and things are incompatible. If you try to do it, what we do,
24:19 which is have very explicit and curated metadata about versions, and you do an a priori solve,
24:24 well, people complain the solve takes a long time, which it can. So there's really no free lunch on
24:30 that. I think one of the challenges that we actually have is that the metadata itself can be wrong. And
24:37 we found that all over the place. So packages will declare they're compatible with this
24:42 version or that version, and they're actually not. And so we have to actually patch what the upstream
24:46 declarations are. So again, it gets subtle and detailed. There's just a lot of like muck in this
24:51 area that we have to deal with. Yeah, it sounds a little bit like, these are the problems that you
24:56 can address and then learn about. If your job is to coordinate a whole bunch of packages that don't
25:02 interact intentionally with each other, right? They just want to make their project,
25:06 something that you can ship and install and use. And that's fine, right? But
25:12 this interaction across them is where it gets tricky.
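The trade-off discussed above, grabbing packages one at a time in the order given versus solving for a compatible set before installing anything, can be sketched with a toy model. This is an illustration only, not pip's or Conda's actual algorithm; the `INDEX` contents, package names, and function names are all hypothetical.

```python
from itertools import product

# Hypothetical package index: name -> {version: {dependency: allowed versions}}
INDEX = {
    "app_a": {1: {"lib": {1, 2}}},
    "app_b": {1: {"lib": {1}}},
    "lib": {1: {}, 2: {}},
}


def greedy_install(requested):
    """Install in the order given, picking the newest version that satisfies
    only the package being processed right now (no look-ahead)."""
    installed = {}
    for name in requested:
        installed.setdefault(name, max(INDEX[name]))
        for dep, allowed in INDEX[name][installed[name]].items():
            # If the dependency is already installed, it is left untouched.
            installed.setdefault(dep, max(allowed))
    return installed


def is_consistent(installed):
    """True if every installed package's dependency constraints hold."""
    return all(
        installed.get(dep) in allowed
        for name, version in installed.items()
        for dep, allowed in INDEX[name][version].items()
    )


def solve_up_front():
    """Brute-force search over every version combination before installing.
    For simplicity it assigns a version to every package in the toy index
    and returns the first consistent assignment, trying newer versions first."""
    names = sorted(INDEX)
    choices = [sorted(INDEX[name], reverse=True) for name in names]
    for combo in product(*choices):
        candidate = dict(zip(names, combo))
        if is_consistent(candidate):
            return candidate
    return None
```

In this toy index, `greedy_install(["app_a", "app_b"])` pulls in `lib` version 2 for `app_a` and leaves `app_b` broken, while reversing the request order happens to work. The up-front search finds the compatible set regardless of order, at the cost of examining combinations, which is the "solve takes a long time" complaint in miniature.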
25:15 There's absolutely a tragedy of the commons. The metaphor I've used in the
25:19 past is that every developer, you know, open source maintainers, bless their hearts. They are way,
25:24 they're doing a thankless job a lot of times anyway, and they're way burned out and stressed.
25:28 But they're really solving for: does my vehicle work in my driveway? You know, can it get out of my
25:33 driveway and drive into my other maintainer's driveway down the street? And if that works, they're good to go
25:38 a lot of times. And when every one of the thousand developers in the ecosystem does this,
25:44 you'll end up with a bunch of cars squashing all over each other on the highways and
25:49 the freeways, because they're not thinking about that integration problem for their end users.
25:52 And the end users, a lot of times in data science, they're not sophisticated software developers.
25:56 They have no ability to solve this problem for themselves.
25:58 They're at the very edge of struggling to write a 10 line script, let alone understand the complexity of, like,
26:05 TensorFlow dependencies or something like that.
26:07 Exactly. Exactly.
26:09 So one thing that you all did recently, that seems to be a trend, is you switched from the major minor
26:15 versioning scheme to a calendar-based scheme. And I think this is an interesting thing, especially
26:20 around open source, because we've had, you know, Mahmoud Hashemi created this site called ZeroVer,
26:26 sort of make fun of all the projects that have been around for 10, 15 years with, you know,
26:32 50 or a hundred releases, but are like 0.1.17, you know, some point, you know,
26:38 like really small versions. And it seems like one of the fixes is to say, well, let's move towards
26:44 something that has more to do with, I can look at the version and I can tell you without deeply
26:50 knowing that software, whether that's a new version, an old version, a medium aged version,
26:55 right? Like if I told you Requests was 2.1.4, is that new? Is that out of date? I don't know.
27:02 Right. But if you use this, this new style, it's pretty obvious. Like, what was the thinking there?
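To illustrate the point about calendar-based versions, here is a minimal Python sketch. It assumes a YYYY.MM scheme (for example 2018.12); the function name calver_age_in_months is hypothetical and is not part of any Anaconda tooling. Given such a version and a reference date, the age of a release is simple arithmetic, which is not possible with a number like 2.1.4 unless you already know the project's release history.

```python
from datetime import date


def calver_age_in_months(version: str, today: date) -> int:
    """Return how many calendar months old a 'YYYY.MM' release is as of `today`.

    Assumption: the version string starts with a four-digit year and a month,
    separated by a dot. Any further components (e.g. a patch number) are ignored.
    """
    year_text, month_text = version.split(".")[:2]
    year, month = int(year_text), int(month_text)
    if not 1 <= month <= 12:
        raise ValueError(f"not a YYYY.MM calendar version: {version!r}")
    # Difference in calendar months between the release month and today's month.
    return (today.year - year) * 12 + (today.month - month)


# A release labelled 2018.12, looked at on the recording date of this episode.
print(calver_age_in_months("2018.12", date(2019, 1, 16)))
```

The sketch deliberately refuses month values outside 1 to 12, since a string such as "2.14" would otherwise be silently read as a date.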
27:07 It's a community convention. It definitely makes it, it's for that user affordance that you can
27:11 sort of look at it and know. And also, you know, we set this expectation that we will release at a
27:16 regular cadence and it's for our own internal documentation and everything else. Everyone
27:19 just is able to collaborate more easily around that. But I think the ZeroVer thing, I mean,
27:24 I love Mahmoud and I think it was a hilarious thing, you know, in a community here where we have
27:27 SciPy and IPython or, you know, Jupyter and other things, pandas, you know, zero-dot-whatever,
27:33 or I guess it's not quite zero-dot anymore, but like SciPy for sure. With these things,
27:38 we can laugh all we want to, but there's a thing there that the author is trying
27:43 to say, or the maintainer is trying to say, which is, it's not quite ready yet.
27:46 You know, I'll call it 1.0 when I'm good and ready and I'm not ready yet. It might not be for 20 years.
27:53 And so, of course, that's also kind of a silly position to take when literally millions of people
27:57 and their production code depend on your software.
27:59 I think they're not saying that it's ready. I think what they're thinking it says
28:04 when it goes to 1.0, a lot of times, is that it's done, and software is rarely done.
28:10 Well, software is done the instant it's released, at least that version of it, right?
28:13 I think this is where we as an industry actually have to up-level our thinking
28:18 about this. And we got to stop thinking about software as artifacts, hardballs of code that
28:25 are static. And we actually have to start thinking about this from a flow perspective, that we are
28:30 looking at flows of projects. And there's a covenant that is established in a relationship
28:36 between the user of one of these flows and the people who originate those flows.
28:41 And I think, you know, there's a really interesting thing I learned years ago about
28:45 aerodynamics. And basically that when planes move less than the speed of sound, you can reason
28:52 about aerodynamics somewhat similarly to water and water flow, right?
28:55 But once you break the sound barrier, the thing that actually causes you the greatest amount
29:00 of pressure on your airframe and things like that, you actually have to reason about the change
29:04 in cross-sectional area of the airplane as it moves through the air.
29:08 So it's almost more like streams of thick rope and you're shoving rope aside.
29:14 So you move from this particle flow way to looking at actual flows.
29:19 And so similarly with software, I think we've got to stop thinking about this as being just a code drop,
29:25 right? And maintainers as people who go and dump out a bunch of code and actually look at a relationship
29:31 with projects. And this gets to, like, sustainability. This gets to, you know,
29:35 versioning and what the promise in a version number is, all of that stuff. It's actually
29:40 deeply involved. I don't know that the software industry has really started to learn how to consume it.
29:46 Like the enterprise consumers of open source, I don't know that their internal practices have
29:50 really caught up with thinking about it that way.
29:52 Yeah. And that's kind of why I was bringing up the versioning a little more deeply because
29:56 I think the folks that spend their time all day in open source, they know that Flask, even though it
30:04 only recently moved to 1.0 and had some small version number before that, is really
30:10 used a lot and it's been around a long time. So it's fine. Right. But the corporate groups, the enterprise groups,
30:17 they see that as a flag of like, that's test software. We're not ready to like make our bank
30:24 run on test software. Is that the feeling that you got by interacting with them? Because you touch both
30:29 open source and enterprise groups more than a lot of folks, I would suspect.
30:34 Yes, absolutely. We are a B2B software company. That's where the bulk of our revenue comes from.
30:39 And absolutely, we suffered mildly for that. You know, we have to basically go and
30:44 talk to procurement and compliance and IT people that are swimming, you know, they're up to their
30:48 ears in software. They look at a spreadsheet. We come in with our software, our enterprise software,
30:53 and say, well, you know, here's the open source things that are in the manifest. And they look at
30:57 this thing and they're like, what is this? This is a pile of garbage. It's all zero-dot-whatever.
31:01 Right. And it's like, yeah, but that runs Instagram, you know, like that literally runs
31:05 Dropbox. So like, what are you complaining about? But you don't really want to get into that.
31:09 Once you have that argument with an IT guy, you've already lost.
31:11 Right. You're, you're a small insurance company with a hundred thousand customers.
31:15 You're not running, you know, YouTube with a million requests per second. That's using similar
31:21 software, right? It's, but it's the mentality, right?
31:23 Yeah. And you know, I would say that over the last,
31:29 you know, five or six years, I've had to do a lot of adulting. And one of the parts of adulting
31:33 up from just being a geek, like, you know, code nerd kind of guy to being able to actually have
31:38 customer conversations is actually having quite a bit of empathy for the customer.
31:41 Right. And from their perspective, yeah, they are just a regional bank with a few hundred
31:45 thousand customers. They don't have the budget of Alphabet to throw at an SRE team
31:51 and a whole dev team and all that stuff. So their approach to understanding risk and risk
31:55 mitigation across the thousands of vendors that want to sell them software, maybe it's the most
31:59 practical one. You know, again, I'm not defending it, but I'm just saying one could come
32:03 to a point of empathy, right? With their approach.
32:04 That's a really good point. I do totally agree. It is exactly because they're small,
32:09 they can't hire the fresh new hottest software engineers that would rather be in Silicon Valley
32:15 or Austin or, you know, Portland or wherever, right? Like they just don't even have the ability
32:21 to determine whether or not what you're saying is true in a lot of cases, right? It's like,
32:26 they just, you know, exactly.
32:28 We'd just rather use Microsoft. We know that they give us this SLA and this agreement and
32:32 we're just good, right? There's one way to make websites, use ASP.NET. We're good. Just use,
32:37 you know, something else supported like that, right? And it's a challenge that they
32:41 obviously want to use these new tools and powerful tools, especially in data science,
32:45 right? But they've, they've got a different culture and way of describing software being ready.
32:50 You know, and we can laugh all we want to about, like, these compliance guys beating us up
32:54 for our, you know, SciPy 0.whatever. But on the flip side, you know, how many of our
33:00 credit card reports and our gas bills come from, yeah, basically some little ASP app or some,
33:06 you know, Access database, God forbid, with a bunch of VBA macros, right? That runs the world. So
33:10 how elite are we really?
33:12 That's an interesting point. Yeah. It's definitely worth thinking about. So in a broader sense though,
33:17 I feel like Python is making its way into this enterprise and major corporation space. I know
33:25 it's increasingly being used for a lot of work, not just data science, but, you know, other types of
33:31 software as well. How do you see it? How do you see the world with the inside view you've got?
33:36 Well, I think that's absolutely right. And I think that the Python community may not survive that
33:41 adoption. Interesting. What do you mean by that?
33:43 Not Python, the language, but the Python community. What I mean by that is that, you know, I've talked
33:48 to quite a few maintainers of some popular projects and they've all reflected to me that in the
33:53 last couple of years, Python adoption has just shot through the roof. I think some
33:58 of it is our pushes on data science and things like that. Others are, you know, this rapid rise of
34:04 deep learning. You know, many things have contributed to this, but ultimately Python is now one of the
34:09 most popular languages on the planet. People are getting jobs in Python and they're using Python
34:14 to do their jobs. And what we're seeing is this transition in the expectation of like, hey, man,
34:21 this is just my nine to five. Like this is a tool that I'm supposed to use to do my job.
34:25 And this tool sucks right now. So I'm going to get on your GitHub and I'm going to give you a bunch
34:28 of grief about it because this is your freaking tool. You know, like, my employer, I got to feed
34:34 my family. My employer tells me I have to use this tool. It's a piece of crap. And so
34:39 that's why I said I think the Python community might not survive that adoption transition unless
34:44 it intentionally really works hard to drive some positive values into the
34:52 newcomers.
34:52 So maybe that person that comes and complains because, well, I used to download my stuff from
34:58 Microsoft.com. Now I get it from Python.org, but this thing sucks. So I'm going to go back and just
35:03 complain about it as if, you know, there's a commercial entity on the other side whose job
35:08 it is to make the SLA legit.
35:11 Right. Right. But more likely, more likely, actually, they picked up, they inherited some
35:15 piece of crap, three-year-old Python code from some guy who didn't know what he was doing.
35:19 Written in Python 2.5 or something. Yeah.
35:21 Oh, absolutely. It'll be, it'll be 2.5. I think there's a couple of 2.4 things running
35:25 around that I'm aware of, but there's a lot of 2.5 out there. And yeah,
35:30 and it's using some old version of Matplotlib or something or some old version of pandas. And
35:34 they're going to complain, you know, on the issue tracker about that.
35:38 And part of the cultural change that I think we should try to encourage sounds like: okay,
35:44 you're doing this for your job. You need it. It's not so great. We are the maintainers, but you have
35:50 a company who depends upon this. Can your company contribute some time, a PR, something? Like, it's got
35:57 to be a two-way street. I think it can't just be, well, you know, one of the things I suspect that
36:01 you also feel at Anaconda Inc. is there are so many companies out there making millions and billions
36:10 of dollars a year on top of free software. There's, like, people working in their free time on some open
36:16 source project that a company is basically built upon, and they make billions of dollars and contribute back
36:22 nearly zero or zero.
36:24 Yes. I've frequently quipped that I can probably fit the core NumPy and pandas maintainers
36:31 in my minivan. Okay, so we've gotten a few more now, so they don't all fit in my minivan,
36:36 but at one point in time, certainly core NumPy.
36:39 You're going to need one of those longer, like full vans that holds 15 people.
36:43 I may need a 15 person van, but I could, I could probably fit them in the 15 person van.
36:47 You know, Matplotlib, which everybody relies on, is like just a few people, maybe part-time. There's
36:54 not like one whole FTE on it even.
36:55 Yeah.
36:56 There's projects like Jupyter that are very large, but also underfunded. And there's projects
37:00 that are small and underfunded. And it's extreme. Yes. It's exceptionally tragic.
37:05 Right.
37:06 It's exceptionally tragic.
37:07 Well, and you know, I think part of the tragedy to me is like, if it really took
37:11 a thousand people to make Matplotlib, 600 people to make Flask, maybe the community couldn't contribute
37:18 back enough to pay those thousand engineers full-time. But like you said, it's like a van full of people,
37:26 or it's my small car full of people for Flask, right? And Click and all those things. The people
37:33 and the companies that use Flask make so much money and depend so heavily upon it that they could easily
37:39 pay those three, four or five people to be full-time on that and be doing really well. Right. But they don't.
37:46 Right. It's just, it's not even asking very much of them, which is what's crazy.
37:49 I'm of two minds on this, or not two minds, but I have, like, two major views on this.
37:53 One of them is that we should look at this as the triumph of software. I mean,
37:58 just to sort of restate the point you're making, which is that,
38:00 holy crap, one or two or 10 people can build something that is fundamental to
38:09 billions and billions of dollars of global economic activity. That's something to be celebrated,
38:15 right? Because that should free up. Think of how many more thousands of software developers
38:19 don't have to be working on Flask. They can just go and have free time. Not really,
38:23 but you know, in theory, that's how.
38:24 Build something more interesting than just the framework, right? They could build something with
38:28 this result.
38:29 So that's one way to look at it and that we should celebrate where we can. But on the other hand,
38:34 the thing is like, if we can't even somehow come up with the funding for like 10 FTEs for these
38:39 fundamental projects, what's broken? What's broken, right? Because it can't be, it's not,
38:44 it can't be that hard. And so I think there's two ways to look at this. One is that with the open source
38:50 community, essentially the field of software, I think, is commoditizing out the
38:57 labor, and that's what open source represents. And this particular thing happening in the Python ecosystem
39:01 is the very vanguard of this transition. It represents essentially the end of labor economics
39:07 for software. And so with that going away, we're at that transition. And so it's very hard to think about it
39:15 for companies because companies will allocate budget for software development in a very like
39:21 headcount oriented way, right? And they know what they're getting when they pay for an FTE dev here
39:26 or there or wherever.
39:27 Sure.
39:27 If they just throw money at some open source, what are they getting for it? You know, they know how to,
39:31 they know how to pay money for software. Companies are very good at paying money for software,
39:34 but paying for stuff that they can already get for free. They literally, that is a null value on a
39:40 spreadsheet. They cannot compute that. It is a NaN, right? So my view on this is actually quite simple,
39:46 which is that if open source developers, the people like me who care about the open source ecosystem,
39:51 if we want to sustain the community innovation and that positive abundance mentality that we have in the
39:59 open source ecology, the human ecology of open source has moved to post scarcity, post labor economics.
40:05 If we want to sustain that, then we need to actually drive a new conversation. We need to actually
40:10 provide the tooling and the infrastructure for the companies to think about how to consume this.
40:17 This portion of Talk Python to Me is brought to you by Rollbar. Got a question for you. Have you been
40:22 outsourcing your bug discovery to your users? Have you been making them send you bug reports? You know,
40:27 there's two problems with that. You can't discover all the bugs this way. And some users don't bother
40:32 reporting bugs at all. They just leave sometimes forever. The best software teams practice proactive
40:38 error monitoring. They detect all the errors in their production apps and services in real time and
40:43 debug important errors in minutes or hours, sometimes before users even notice. Teams from companies like
40:49 Twilio, Instacart and CircleCI use Rollbar to do this. With Rollbar, you get a real time feed of all the errors
40:56 so you know exactly what's broken in production. And Rollbar automatically collects all the relevant data and
41:02 metadata you need to debug the errors so you don't have to sift through logs. If you aren't using Rollbar yet,
41:07 they have a special offer for you. And it's really awesome. Sign up and install Rollbar at
41:12 talkpython.fm/Rollbar. And Rollbar will send you a $100 gift card to use at the Open Collective,
41:18 where you can donate to any of the 900 plus projects listed under the Open Source Collective or to the
41:24 Women Who Code organization. Get notified of errors in real time and make a difference in Open Source.
41:29 Visit talkpython.fm/Rollbar today.
41:34 What are some of the key elements?
41:35 One way to do it is you can look at it almost like treat each new... Number one,
41:39 it's something we have to work on ourselves, which is to not make money be a bad word,
41:44 which is still a mindset that pervades many Open Source communities and developers.
41:48 Any affiliation with any kind of money-managing, money-changing organization is seen as essentially...
41:55 It's seen as corrupting sometimes. Yeah, yeah.
41:57 It's corrupting, exactly. So, I mean, we literally had, on the SciPy mailing list,
42:02 I think a couple of years ago, someone arguing that we should only allow steering council members
42:07 to be part of universities or part of academia, because they don't have their own agendas.
42:11 And the other people were just like, are you kidding me? Academics don't have agendas anymore?
42:15 So, people like to kid themselves a lot about this kind of stuff. But anyway,
42:19 so I think that the Open Source community needs to, number one, not be allergic to money and treat it as a corrupting influence, right?
42:25 There's companies and ways, business models that are trying to help Open Source and trying to be
42:33 good participants in it. And then there are the corrupting, evil, taking advantage of type
42:37 companies. So, like, it's not black and white, but there are certainly paths forward where
42:42 companies like you guys and others are putting in lots of effort to try to make things better
42:49 legitimately.
42:50 Yeah. And I appreciate that you recognize that. Like, we really have tried to be good
42:54 citizens in the Open Source community. But I think for a lot of companies,
42:59 it's like the mind is willing, but the spreadsheets are weak. You know, like, it's still really hard for
43:04 people and proponents and advocates, even within those companies, to, at the end of the day,
43:08 make the budgetary justifications. Because the companies internally
43:12 don't know how to reason about it.
43:14 Yeah.
43:14 You know? So, I think that's where the Open Source community can try to help. Like,
43:18 number one, one thing we could do is do almost like a Kickstarter style or like, you know,
43:23 I play Warcraft a little bit. And so, it's like world boss, like, takedown. So, before we can
43:28 release any new versions of Library XYZ next year, we've got to get this much money in, right?
43:34 Yeah.
43:34 And people basically just put the money in. But I think, as fun as
43:39 a Kickstarter model like that would be, as cool as that would be and as interesting as that
43:43 would be, I think businesses have a hard time just writing checks for donations. So,
43:48 the other thing that I think the Open Source community needs to do, I think the one that's
43:51 more realistic, is to actually form entities that can have a business-to-business conversation
43:57 with the corporate players and understand how to talk to their procurement, talk to their legal and
44:05 everyone else, and basically act as a crossover facility to do the product management so the
44:10 businesses know what they're getting for their money. It's not a charity. You know,
44:13 some things that people may not be aware of is that for a business to write a $10,000 charity check,
44:18 that comes out of a different part of the business a lot of times.
44:22 Even if everyone wants to, for budgetary and for finance and compliance reasons,
44:27 they literally cannot just write a check to some dude, you know, some Open Source hacker in the
44:32 middle of Europe somewhere. So, these are the things that we need to actually put together.
44:36 I think the allergic to money issue, I think that that can be solved with the right examples of Open Source
44:43 companies and companies entering Open Source in positive ways. But I feel like there's some kind of
44:51 structure or something that has to get between the corporations and the Open Source projects,
44:58 where it's like you say, it's not a charity check. It's you pay into this and there's, you get a little
45:05 bit more of something. And I don't know what that is, but there's something like that. Then the companies
45:10 can justify it. They say, look, we depend upon this thing. We pay, you know, 0.01% of our revenue to the
45:18 people that make it work so that our system doesn't go away. And here's what we get for that 0.01%. I don't know
45:22 what that is. It's actually, we don't have to reinvent the wheel here. It happens all the time
45:26 in every other industry. It's an industry consortium. It's an industry consortium. You pay into it. And
45:31 what happens is you get votes on various technical councils and technical boards, and they do the
45:36 product management and the dev management for what the thing should be. In the Python world, we want that
45:42 to, in all cases for a lot of these projects, we want that to still be subordinate to the vision
45:48 of the open innovation volunteer kind of crew. But there's so much housekeeping. There's so much
45:56 issue tracking stuff. There's so much like documentation, management, cleanup, just keeping
46:01 the lights on and the yak shaving. There's so much that goes into a project that these kinds of
46:06 consortium models can fund. And I think Python itself, and I'll just come out on your podcast and I'll just
46:11 say it. I think Python itself badly needs this. Yeah.
46:14 Badly needs an actual consortium like this to be operated in a way that can accept dollars easily.
46:19 That's easy for people to write checks, right? Like we all know this as entrepreneurs, like make
46:23 yourself easy to do business with. The open source community, I would say, has not made itself easy
46:27 to do business with. You got to either hire a core dev. And if you do, that core dev then has to,
46:32 in their own minds, be like, am I wearing my community hat or my employee hat, which is tough on them,
46:37 right? It's very stressful for them. And the open source community, even when we get the dollars,
46:41 we don't make it clear to the people writing the checks what those dollars are buying for them.
46:45 Like if they have a couple of issues that are easy to solve, that really can make a difference for
46:49 them, we don't necessarily prioritize those issues just because they wrote us a check because we don't
46:53 want to feel like we're that, you know, like it's that quid pro quo. So I think that you really need
46:58 some kind of facility in the middle that acts as a consortium, that is able to help businesses steer
47:02 and guide a lot of these maintenance, pretty basic kinds of maintenance things that need to happen
47:07 for projects that would make their lives easier. And that can then funnel a ton of money, with a ton
47:13 of margin on that, into the innovation work and all the forward looking kind of stuff.
47:16 And everyone's happy.
47:18 Yeah. Do you think the PSF could do it?
47:19 I think the PSF could do it. I think that the PSF would be, I don't know if it operates as a
47:24 nonprofit.
47:25 It does. Yeah.
47:26 Yeah. So if it's a nonprofit, I think it'd be very hard for it to do it. It might need to actually
47:31 create something like Mozilla Foundation and Mozilla Corporation. I think it would need to create
47:35 some kind of a traditional C corp or a B Corp, perhaps, like a social mission for-profit that it
47:42 holds director seats on and, you know, owns a chunk of. But companies, a lot of times, are
47:47 just prohibited from writing checks to 501(c)(3)s unless it comes out of their philanthropy group.
47:52 So again, this is that making it easy to do business with kind of thing.
47:55 Yeah. Interesting.
47:56 Absolutely. I think the PSF should spin up a thing like that. And I've been sort of
48:00 quietly advocating for this behind the scenes a little bit. And maybe I'll be more vocal about
48:04 that here this year.
48:05 All right. Well, we can spread a little word on the podcast as we just have.
48:08 It's really interesting. And I think there's absolutely lots of possibilities for business
48:15 models in open source. But I feel like there's actually a 98% gap, like 2% of that is captured.
48:26 98% of it is not because we have these large, but still not huge, like banks in the Midwest that
48:33 contribute nothing. They do no PRs. They don't do anything to that effect, right? They just,
48:38 it's just not in their culture. And like you said, there's no real mechanism for them to
48:43 pay a little and get more and justify that.
48:45 Yes. Yes. And actually some of the open source business models that are emerging now,
48:49 they present challenges of their own. Again, my overriding thesis is that the world of software
48:54 is actually commoditizing pretty quickly. And so, like, if you look at the things that have
49:01 been happening in the last six months with, I would say, open source software component vendors
49:07 like Mongo and Redis and Timescale and others, as they start getting their business eaten by the cloud
49:13 vendors, they're realizing that open source, you know, sounded great. Open core sounded great.
49:18 And then they start losing any future route to revenue. And they've got to actually aggressively
49:23 go to, like, dual licensing and, like, deeply viral AGPLv3 kind of stuff. I don't know that open source
49:29 is even the right conversation to have anymore. I think it should be around sustainable community
49:35 innovation and the freedom to experiment, freedom to innovate, freedom to, you know, there's a lot of
49:40 like free as in beer and free as in innovation. But like, the traditional ways we have about talking about the
49:47 source code itself, again, is limited in this paradigm of like code drops. And we're beyond that now.
49:53 Yeah. And you know, you look at the cloud, for example, a lot of these places that they provide you something,
49:58 and you pay on usage, right? You don't buy any software in the cloud, but you have the subscription
50:06 model all over the place, right? And that's, that's starting to really shift the way things are working
50:11 as well. And I feel like the cloud vendors actually have this interesting lock in where they're a little
50:16 bit defended against some of these challenges that are coming up.
50:20 Well, absolutely. There's only like three major cloud vendors of significance in here in the US,
50:25 at least. And all of them are absolutely going for lock in. And they're, you know, ultimately,
50:32 their business model. It's not necessarily I mean, it's a for profit business model, put it that way,
50:36 right? Yeah, the cloud is the new lock in with a lot of those API's. It's interesting. And like this
50:40 MongoDB AWS thing you talked about, like, that's a little bit of it as well, right? But it's pretty
50:45 interesting. Yeah, I think we could probably talk for hours and hours on this, because we're both
50:50 pretty passionate about it. It's awesome. But let me ask you a few more questions before we run out of
50:55 time. Sure. These are all sort of forward looking type things. And one of them is data science.
51:00 You called out the year 2012 to me, that if you look at the analytics and the graphs and the usage,
51:05 there's a huge increase in the derivative of a lot of things around Python from 2012 up till now.
51:13 So five years further out, what do you think data science looks like? Is it still deeply working
51:20 with Python? Is it solving different problems? Where is it going?
51:23 We're going to see data science much more integrated. People will have a better sense of what it
51:30 can and can't do by itself, right? It's a new discipline that's coming into the business. It's a
51:36 new swim lane. Everyone's trying to figure out how they stand in relation to it. There's a lot of
51:40 political, you know, fighting and a lot of experimentation within a lot of businesses that I see. But at the end of
51:44 the day, I think this idea of doing data exploration, doing model development, and revving models that are
51:52 really critical to the business is the new reality for people. So that's not going away. That's a
51:57 fundamental dynamic that's going to be here. And if you need to go and explore data, you need to go and
52:01 do model development, then you're going to be doing data science, full stop, right?
52:06 If you need to basically bring in domain expertise, stats, and coding ability to do that
52:12 well, then you're going to need data scientists at that intersection. You need all three of those skills,
52:16 you need all three of those. But data scientists are going to find, I think, that
52:21 the borders between the data science world and the others will become clearer.
52:26 So you'll have data scientists interacting with data engineers, and much better, hopefully much better
52:31 established best practices around how that's supposed to go. And then IT people start accepting that,
52:36 yes, Python is here to stay, we're going to need to deploy real Python stuff. And we need to know a
52:40 little more something about it, right? And so a lot of these little intersectional areas right now
52:45 between data science and other concerns, same thing with BI, people right now, there's literally people
52:49 out there selling point and click visualization tools saying that's data science. And it's like,
52:53 that's not really data science. But they're going to figure that out probably in the next couple
52:58 of years. Hopefully, they get the clue. Yeah, that's what I think is going to happen.
53:02 Now, I think that clue is going to really start
53:06 hitting home in two years or so. Then the immediate next problem that people have is overall workflow
53:13 management across all of these things. Because everyone's got their favorite tools. Everyone is
53:18 producing things that touch and intersect with everyone else's stuff. How do we get all of this
53:23 stuff managed in one place? And I think that's the challenge doesn't be fit, we're gonna be square in
53:28 the middle of that conversation still. And five years from now, assuming that the Chinese economy
53:50 hasn't collapsed, we are going to see some really scary stuff coming
53:56 out of China and the AI innovation happening there. Because they have been, they're completely
54:01 unapologetic about using their entire national population of a billion people as a sandbox for
54:07 trying AI surveillance, sort of cybernetic, the computer controls you kind of things.
54:12 Yeah, the whole social ranking, and all that stuff that's...
54:16 So here's the terrifying thing about that. I'm going to be a little bit of a contrarian on this.
54:20 What if it turns out that their Sesame Credit system, Rev2, no, Rev1 is scary and crappy.
54:24 Rev2, what if it turns out that they give social Sesame Credits for their businesses and local
54:29 politicians? Yeah.
54:30 What if they actually start upgrading social Sesame Credits to being this kind of thing where
54:34 it becomes almost like a, again, back to Warcraft, but like a Warcraft honor reputation system,
54:38 right? And becomes multicolored, it becomes vectorized instead of scalar. They might actually
54:44 innovate a scary, awesome approach that has deep problems because it requires a surveillance state.
54:50 And the Western world might look at that and say, huh, you know, that actually works a lot better
54:54 than, you know, Ivanka Trump, you know, running our fast food joints.
54:58 Yeah.
54:58 Sorry, the White House. So that dates this podcast, by the way. For those who are listening months in
55:03 the future, in case you forgot, just two days ago, the President of the United States served Big
55:07 Macs at the White House. That just, that happened. So this is still fresh in our minds.
55:12 To Clemson, who won the national college football championship. Yeah.
55:16 Yes. It's incredible. Anyway. So the point is that the scary thing about the Chinese AI system
55:21 is that it might work and work really, really well.
55:23 Yeah. Not that it's just pure wrong, but actually there's aspects of it that are amazing
55:28 in its sort of Black Mirror, Electric Dreams way.
55:32 Oh yeah. Tell you what, it's going to be pretty amazing. I think the same way that like a lot of the
55:36 Western world is like, oh, well, we already saw where this goes in Orwell, so we're not going to
55:41 go there. Western world has that kind of snottiness about it. I think they're underestimating how good
55:46 it could be and how tempting that goodness can look to technologists, to the capitalists, and to the
55:53 policymakers here. That's really for me as a, as someone who fled the communist regime, you know, as a
55:58 child, like that's the scary thing about it.
56:00 That is really an interesting analysis. And certainly I was thinking ethics, data ethics,
56:05 and accountability for data models and AI and ML, right? Like, sorry, you couldn't get the house.
56:12 The AI said no, right? Like, no, no, no. You have to say why the AI said no. Well, we don't know,
56:17 but it's really good. And it said no, you know, like answering that problem is going to be interesting
56:22 too.
56:22 It is. And you know, the, the thing is that already now you get denied, right? And there's already a model
56:28 that tells you why you're denied. And the AI can, this kind of gets back to that same thing with the
56:33 whole Black Mirror thing and the AI in China, like really, really good AI. It doesn't look like that
56:37 AI, you know? So the really, really good systems, quote unquote, good, the really effective systems
56:43 at partitioning people and spot targeting them, they're going to be dressed up in ways that are
56:48 palatable. Our robot overlords will look like Cylons. They're going to look really human-like.
56:52 This is the scary future, man. I'm not trying to like scare you and scare your listeners.
56:56 I'm just telling you though, like, this is what's coming. And as humans, I'm actually a human. I'm
57:01 not a Cylon. As humans, as, you know, tribe human, I think we've got to get better at being human.
57:06 And so that's maybe too philosophical hand wavy, but anyway.
57:10 Yeah. It's really an interesting thing to ponder for sure. All right. So I guess final comment or topic
57:17 just real quickly is I feel like there's been this Python 2 versus 3 debate, modern Python versus legacy Python,
57:25 as I like to position it. And I feel like the adoption of modern Python in data science is much faster
57:33 than it has been in the general Python space. One, do you think that's true? And then two, why do you think that is?
57:41 One, I think it's true. And two, I think it's because a lot of data science stuff is new and
57:45 legacy data science code tends to age with models. So like a piece of data science code is only as good
57:52 as the model data that it was trained on and models change because the world changes. So there's a built
57:59 in expiration date on any data science model that you've got. So you're not keeping transaction systems
58:04 from 20 years ago live.
58:06 The complexity and the algorithms and the techniques are just not even relevant, right? Like the machine
58:11 learning of five years ago doesn't compete with the machine learning of today. And it's not like
58:15 you're just going to upgrade. It's a totally different thing. You just retrain it on TensorFlow
58:20 or Keras or whatever, right?
58:21 Right. And secondly, this is another sort of important dynamic, which is that the regulatory environment
58:27 around data science hasn't caught up. So it doesn't require you, you know, I was talking to an engineer
58:32 from a software modeling engineer from an airplane company. And he was saying, yeah, the FAA requires
58:39 us to be able to reproduce our computational design models for like decades, for decades.
58:45 Yeah. Wow.
58:46 So, I mean, yeah, because planes actually, if they're well maintained, they fly for a long time,
58:50 right? And if there's a structural failure of a part...
58:53 Right. There's a lot of 737s out there. Yeah.
58:55 Oh, yeah. And so data science just doesn't have that problem yet. And, you know, one of the earliest
58:59 adopters of Python, this is a really interesting dynamic that people may not be aware of, but
59:03 in the mid 2000s, there was a significant uptake of Python in the hedge fund and the finance industry.
59:10 And so that was Python 2, Python 2.5, 2.6 around that time. And so that got into a lot of places.
59:18 And finance is actually a pretty regulated area. And so a lot of that code, especially if it starts
59:23 running production finance systems, people need to keep it running, not only because they're...
59:27 Even if you stop using a particular finance model to like score or to do whatever, to price a trade
59:33 and things like that, oftentimes you'll want to go back and do what's called backtesting.
59:37 So you want to run new data against those old models, and you'll want to race them against the
59:43 new models, right? You'll want to run new models on old data and new data on old models. And so that
59:48 kind of backtesting approach, you need to keep that old code running for that purpose as well,
59:52 just from a risk management perspective. So a lot of the finance industry's, like, running
59:56 ahead and adopting Python 2 has sort of gotten them stuck on Python 2 a little bit.
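[Editor's sketch] The backtesting idea described above, running every model against every dataset so old and new models can be compared, might look roughly like this. The two models, the price histories, and the error measure are all hypothetical placeholders, not anything from a real finance system.

```python
# Hypothetical "old" model: predict that the next price equals the last one seen.
def old_model(price_history):
    return price_history[-1]

# Hypothetical "new" model: predict the mean of the last three prices.
def new_model(price_history):
    window = price_history[-3:]
    return sum(window) / len(window)

MODELS = {"old": old_model, "new": new_model}

# Each dataset is (price history, actual next price). Values are made up.
DATASETS = {
    "old_data": ([10.0, 11.0, 12.0], 13.0),
    "new_data": ([20.0, 18.0, 19.0], 19.0),
}

def backtest(models, datasets):
    """Return the absolute prediction error for every (model, dataset) pair."""
    results = {}
    for model_name, model in models.items():
        for data_name, (history, actual) in datasets.items():
            results[(model_name, data_name)] = abs(model(history) - actual)
    return results

results = backtest(MODELS, DATASETS)
for (model_name, data_name), error in sorted(results.items()):
    print(f"{model_name} model on {data_name}: error {error}")
```

The point of the grid is that the old model's code has to stay runnable for as long as anyone wants the "old model" rows filled in, which is the lock-in effect being described.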
01:00:01 Okay. Interesting. Yeah. So almost a victim of its own success in a way, but in some of these
01:00:09 industries. All right. I guess we're going to have to leave it there because we're out of time. But
01:00:13 like I said, a lot of interesting stuff to talk about. I have to just put it at rest. So before we move on,
01:00:19 though, I'm going to ask you the two questions I always ask on the show. If you're going to
01:00:23 write some Python code, what editor would you use?
01:00:25 My old go-to is still Vim. But for large code bases, I tend to use PyCharm so I can, you know,
01:00:30 sort of navigate more easily.
01:00:31 Yeah, sure. Makes sense. And then there's many, many packages on PyPI or available on conda-forge.
01:00:39 What do you think one that people maybe haven't heard of, but they should, or you want to recommend?
01:00:43 Is it bad form to pimp? Is it like to pimp your own stuff?
01:00:46 No, you do it. No, no, go ahead.
01:00:48 So I'm really, really excited about a new project that we created called Intake,
01:00:53 which I would encourage people to take a look at it. It's pretty new. We just launched it last year.
01:00:58 Yeah, it looks interesting. I was going to ask you more about it, but we just
01:01:00 have too many topics already. So tell us about it real quick.
01:01:03 So Intake is a data loading abstraction library. So it's basically just load my data,
01:01:09 and it abstracts your data loading stuff into a declarative syntax so that the beginning of your
01:01:14 data science scripts doesn't have a whole bunch of like embedded and brittle SQL calls or pandas
01:01:19 column transformations or things like that. Intake is a way to make it so that your actual
01:01:23 data science or data transformation code is sort of its own code artifact and your data bits are your
01:01:29 data bits. It's kind of a nerdy thing, but we think that it actually addresses that data,
01:01:34 that model reproducibility and code reproducibility problem that data scientists face.
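[Editor's sketch] To make the "declarative data loading" idea concrete, here is a minimal standard-library illustration of the pattern: the data sources are described in a catalog, and the analysis code only asks for a source by name. This is not Intake's actual API; the `CATALOG` layout, the `load` function, and the `csv` driver name are assumptions made up for this example, and the CSV text is inlined so the snippet runs on its own.

```python
import csv
import io

# Hypothetical catalog: a declarative description of where data lives and
# how its columns should be typed. In practice this would sit in its own file.
CATALOG = {
    "trades": {
        "driver": "csv",
        "args": {"text": "symbol,price\nABC,10.5\nXYZ,20.0\n"},
        "dtypes": {"price": float},
    },
}

def load_csv(text, dtypes):
    """Parse CSV text into a list of dicts, converting the typed columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({key: dtypes.get(key, str)(value) for key, value in row.items()})
    return rows

# Map each driver name used in the catalog to the function that implements it.
DRIVERS = {"csv": load_csv}

def load(name, catalog=CATALOG):
    """Look up a source by name and load it with the driver the catalog names."""
    entry = catalog[name]
    driver = DRIVERS[entry["driver"]]
    return driver(dtypes=entry.get("dtypes", {}), **entry["args"])

# The analysis code names the data it wants and nothing else.
trades = load("trades")
total = sum(row["price"] for row in trades)
print(total)
```

With this split, changing where the data comes from means editing the catalog entry, while the analysis lines at the bottom stay untouched.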
01:01:38 Sounds really useful. Thanks. All right. So final call to action. People are excited about
01:01:42 the Anaconda distribution or maybe getting, making some progress on this open source business model
01:01:48 thing we talked about. What would you say to people?
01:01:50 So I would say that we have AnacondaCon coming up. So if you're actually using Python
01:01:55 in a commercial environment, strongly recommend AnacondaCon. We have a, we try to make a really good
01:02:01 blend of technology and practitioner kind of stuff and workshops there combined with
01:02:07 business perspectives. So it's not like an industry conference like Gartner or Strata.
01:02:12 It's not like a pure one of those things. It's also not a pure like tech community conference,
01:02:16 like PyData or something like that. So it's, we try to make a mix of those things.
01:02:19 We've gotten really good reviews in the past couple of years. It's our third year doing it.
01:02:23 I'm super excited about it. It's here in Austin in April, April 3rd to 5th.
01:02:26 So that's AnacondaCon.io. And secondly, if people are using Anaconda and they like it and they're using it in a
01:02:32 business environment. I would recommend they check out Anaconda Enterprise. We are very,
01:02:36 very proud of the product and we have a lot of problems that we solve for people inside business
01:02:40 environments and the business use of Python for deployment, package management.
01:02:44 Yeah. Real quickly, like what, what's the, what do you get from, right? You know,
01:02:47 I talked about the business model should be, you get a little bit more for your money,
01:02:50 not just pure charity, you know, here's a PayPal donate button. What do people get real quick?
01:02:57 So Anaconda Enterprise is, it gives you the ability to have your own managed package repository.
01:03:02 It gives you a way to do secured and governed collaborative notebooks and model deployment.
01:03:07 It works in the cloud. It works on prem. Many of our customers use it across an air gap and very
01:03:12 strictly governed environments. We basically make it so that data scientists and Python practitioners
01:03:18 in business can be as effective with Anaconda as they are at home nights and weekends on their
01:03:22 own laptops. All right. Yeah. That sounds cool. We just clear all the IT hurdles. Yeah,
01:03:25 that's sweet. All right. Well, thanks for all that you've talked about here, Peter. It's been a
01:03:30 super interesting conversation. Thanks for being on the show. Thank you so much for having me. I
01:03:33 really enjoyed it. You bet. Bye. Bye-bye. This has been another episode of Talk Python to Me.
01:03:38 Our guest on this episode was Peter Wang. It's been brought to you by Linode and Rollbar.
01:03:43 Linode is your go-to hosting for whatever you're building with Python. Get four months free at
01:03:49 talkpython.fm/Linode. That's L-I-N-O-D-E. Rollbar takes the pain out of errors. They give you the
01:03:57 context insight you need to quickly locate and fix errors that might have gone unnoticed until users
01:04:02 complain, of course. Track a ridiculous number of errors for free as Talk Python to Me listeners at
01:04:07 talkpython.fm/Rollbar. Want to level up your Python? If you're just getting started, try my Python
01:04:14 Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new
01:04:20 async course that digs into all the different types of async programming you can do in Python.
01:04:25 And of course, if you're interested in more than one of these, be sure to check out our everything
01:04:29 bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite
01:04:34 podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed
01:04:39 at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on
01:04:45 talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.
01:04:51 Now get out there and write some Python code.
01:04:53 Bye.