Catching up with the Anaconda distribution
Episode Deep Dive
Guest Introduction and Background
Peter Wang is the CTO and co-founder of Anaconda Inc. He has been a long-time Python developer, starting with physics and science-focused projects before moving into commercial and large-scale data science solutions. Peter and his team have been deeply involved in creating and maintaining the Anaconda Distribution and have championed Python’s adoption in business and enterprise data science settings. In this episode, he shares insights into Anaconda’s evolution, Python packaging, open-source sustainability, and the future of data science in business.
What to Know If You're New to Python
Before diving into packaging, Conda, and enterprise data science considerations, here are a few essentials:
- Python has a broad standard library but also depends on community packages for specialized features (e.g., data science).
- Using tools like conda (from the Anaconda ecosystem) or pip (from the standard Python ecosystem) to install packages can simplify or complicate your workflow depending on your goals; see the short sketch below.
- Virtual environments (or Conda environments) help keep project dependencies isolated and compatible.
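If you have never used these tools, here is a minimal sketch of the typical workflow; the environment name and package choices are only illustrative examples, not anything prescribed in the episode:

```bash
# Create an isolated Conda environment with a specific Python version
conda create --name analysis python=3.7

# Activate it (syntax for conda 4.4 and later)
conda activate analysis

# Install pre-built data science packages from the default channels
conda install numpy pandas matplotlib

# pip still works inside the environment for pure-Python packages
# that have no Conda build (placeholder package name below)
pip install some-pure-python-package
```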
Key Points and Takeaways
- Anaconda Distribution: A Purpose-Built Python Distribution. The Anaconda Distribution arose to solve key pain points around installing and managing scientific and data-focused Python packages on all major platforms. It ships with common data science libraries pre-compiled for consistency and easy installation. Because it standardizes compilers and dependencies, it eliminates many “it works on my machine” issues.
- Why Conda vs. Pip? Conda provides robust dependency resolution and the ability to install non-Python libraries (e.g., R, C libraries) in a cross-platform way. Pip focuses on Python packages only, so complex builds involving C/C++ dependencies can be harder to manage. Conda attempts to ensure everything is packaged consistently, giving you a more “batteries-included” approach for data science.
- Conda-Forge and BioConda: Community-Driven Repositories. Conda-Forge is a large, community-managed collection of Conda recipes, with thousands of packages maintained by contributors who ensure compatibility across the ecosystem. BioConda extends that concept to biology and genomics, providing specialized builds for the life sciences (see the channel sketch after this list).
- Enterprise Adoption of Python. Peter observes Python’s deepening roots in businesses of all sizes. Many companies once hesitant about open-source tech have embraced Python for data science, machine learning, and enterprise software systems. This shift challenges organizations to adopt better governance, packaging strategies, and consistent deployment solutions (e.g., Anaconda Enterprise).
- Maintaining Open Source Projects at Scale. Many core data science tools (NumPy, pandas, Matplotlib, etc.) have relatively small teams behind them. Meanwhile, millions of users and entire companies depend on them. Peter highlights the need for sustainable funding and collaboration models so essential libraries remain well-maintained and aligned with growing enterprise adoption.
- Packaging Challenges and Reproducibility. Scientific computing often relies on compiled extensions, system libraries, and specific compiler choices. Anaconda’s approach is to unify these configurations, ensuring that once packages are built, they’ll run reliably across target systems. This reproducibility is critical when projects must re-run old models, especially in regulated industries like finance or aerospace (see the environment sketch after this list).
- Transition from Python 2 to Python 3. Data science projects were quicker to move to Python 3 thanks to shorter model lifespans: outdated models get replaced, so code refreshes happen more frequently. However, legacy financial or engineering models under strict regulation may lag behind. Peter notes that the community is steadily marching forward and that 3.x has become the standard.
- Data Science’s Future: Integration, Model Management, and AI Ethics. Peter believes data science is moving into a “post-hype” integration phase, where best practices for collaborating with engineering and IT teams will standardize. Attention will shift from building single models to managing the entire model lifecycle, ethical usage, and bridging data engineering, data privacy, and cloud workflows.
- Open Source Business Models: Finding Sustainable Funding. A central topic was the mismatch between huge commercial dependence on free software and insufficient resources for maintainers. New forms of consortiums, enterprise-friendly services (like dedicated “pay-for-support” channels), and improved packaging of paid features are some paths that might make open source sustainable at scale.
- Links and Tools:
- NumFOCUS (supports open-source scientific projects)
- Intake: New Data Loading Abstraction. An upcoming tool from the Anaconda ecosystem is Intake, designed to simplify data loading by removing brittle data connections from core analysis scripts. It places data catalog and ingestion logic into a straightforward, declarative layer, helping reproducibility across different environments.
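To make the Conda-Forge and BioConda points above concrete, here is a hedged sketch of installing from community channels; the channel names are real, but the package picks are only examples:

```bash
# Install a package maintained by the conda-forge community
conda install -c conda-forge geopandas

# BioConda builds on conda-forge, so the two channels are typically combined
conda install -c bioconda -c conda-forge samtools

# Conda can also install non-Python software, such as R, into the same environment
conda install -c conda-forge r-base
```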
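For the reproducibility point above, a common pattern is to capture the full environment specification and recreate it later or on another machine; a minimal sketch:

```bash
# Record the packages (and exact builds) in the currently active environment
conda env export > environment.yml

# Recreate the same environment elsewhere from that file
conda env create --file environment.yml

# For a fully pinned, single-platform snapshot, an explicit spec file also works
conda list --explicit > spec-file.txt
conda create --name analysis-copy --file spec-file.txt
```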
Interesting Quotes and Stories
"I discovered Python on Slashdot around version 1.52, and I just fell in love." -- Peter Wang
"If you need to build a complex data science workflow and not worry about compilers or system libraries, Conda is your friend." -- Peter Wang
"If we can’t come up with funding for ten FTEs for these fundamental Python projects, we have to ask what’s broken in our approach." -- Peter Wang
Key Definitions and Terms
- Conda: A cross-platform, language-agnostic package manager that handles Python, R, C libraries, and more.
- Conda-Forge: A community-driven resource of Conda recipes and packages.
- BioConda: A community-maintained Conda channel providing bioinformatics and genomics packages, built to work alongside Conda-Forge.
- Anaconda Distribution: A popular Python distribution focused on data science and scientific computing, bundling many key libraries.
- Packaging: The process of bundling and distributing software and its dependencies so it can be installed consistently.
Learning Resources
Here are a few courses that can help you solidify your Python foundation and explore data science topics further:
- Python for Absolute Beginners: If you need a step-by-step introduction to Python with no previous programming experience required.
- Data Science Jumpstart with 10 Projects: Dive into real-world projects and get comfortable with the data science stack in Python.
- Python Data Visualization: Learn how to create insightful and interactive plots to communicate your data effectively in Python.
Overall Takeaway
The Anaconda Distribution remains one of the cornerstones of Python data science, offering a thoughtful approach to installation, environment management, and package consistency. This conversation illustrates how Python’s success in science and business ties closely to good packaging and open source sustainability. As more enterprises adopt Python, the community must ensure maintainers are supported, novices can learn easily, and powerful AI-driven tools grow responsibly, helping data scientists and developers alike thrive in an ever-evolving field.
Links from the show
Peter on Twitter: @pwang
JetBrains Survey Results: jetbrains.com
AnacondaCon: anacondacon.io
Episode transcripts: talkpython.fm
--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
Episode Transcript
00:00 It's time to catch up with the Anaconda crew and see what's new in the Anaconda distribution.
00:04 This edition of Python was created to solve some of the stickier problems around deployment,
00:08 especially in the data science space. Their usage gives them deep insight into how Python is being
00:13 used in the enterprise space as well. And that turns out to be a very interesting part of the
00:17 conversation. Join me and Peter Wang, CTO at Anaconda Inc., on this episode of Talk Python
00:22 to Me, number 198, recorded January 16th, 2019. Welcome to Talk Python to Me, a weekly podcast
00:42 on Python, the language, the libraries, the ecosystem, and the personalities. This is your
00:47 host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show
00:51 and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
00:56 This episode is sponsored by Linode and Rollbar. Please check out what they're offering during
01:02 their segments. It really helps support the show. Peter, welcome to Talk Python.
01:06 Thank you very much. I'm very happy to be here.
01:07 I'm happy to have you here. It's been a while since we've talked about Anaconda. I had Travis
01:12 Oliphant on the show way back when, but it seems like it's time for a catch up on what you all
01:17 have been up to. Yeah, well, there's been a lot going on. It's definitely, one of the employees
01:21 that's commented that every six months, it feels like a different company. And we do,
01:25 yeah, the space is evolving very quickly. We're trying to just keep up with it.
01:28 So you would say this data science thing is not a fad. It's probably going to be around
01:31 for a while?
01:31 At this point, I think I'm going to go out on a limb and say it's probably going to be around
01:34 for a little while.
01:35 Right on. All right, before we get into all that though, let's start with your story.
01:38 How did you get into programming in Python?
01:40 I actually got into programming when I was a young kid and I've been always programming.
01:44 I've actually been programming for almost as long as I've been speaking English.
01:46 I got a PC when I first came here to the United States, so I was very lucky.
01:50 But I actually majored in physics and out of college, I started going to computer programming
01:56 as a profession. And I did a bunch of C++, but I discovered this thing called Python on Slashdot.
02:03 And I think they announced version 1.5.2. And I was like, fine, I'll go take a look at it.
02:08 And I started playing with it and I just fell in love. And so my day job was like getting beat
02:14 up by C++ templates and out of compliance compilers. And at night, I just hack on Python.
02:19 So finally, after a few years of this, I ended up moving to Austin. I got a job doing Python
02:23 as my day job, which was awesome. In like 2004, I started at Enthought. And I did a lot of work
02:29 in the scientific community and doing consulting with Python because I knew the science given my
02:33 math and science background in physics. But I also knew the software principles and software
02:37 engineering. So it was a really fantastic time. And that's basically the long and short of it.
02:40 Yeah, that sounds like a great fit. You know, things just came together, right? You have this math and science
02:44 background and you love Python. You found this job and it all just, like all of those things came together to really put you in the right place.
02:51 They really did. I feel very, very blessed in that way. Now, it was a lot of hard work too,
02:55 but I got very comfortable. And, you know, there's this great quote from Bruce Lee that you must never,
03:00 like not, you must never get comfortable, but there will be plateaus and you can't stay there.
03:04 And so I think towards the end of the 20, the aughts, the 2000s, around 2010, I was starting to see big
03:11 data happening. And I started realizing that Python was getting used for business data analysis more
03:16 than just science and engineering. And that our little cozy scipy community could actually be
03:20 something much bigger. And so I started doing some exploration, exploratory work. I really wanted to
03:26 do like D3 for Python. I had a few other little itches I wanted to scratch.
03:30 And so I started Continuum with Travis in order to address some of the technical gaps that we had in
03:36 the community and the technology stack. And then also to really push a narrative in the technology
03:41 market that yes, Python is good for business use. Yes, it's production ready. Yes, you should use it.
03:47 And it can handle big data just fine. And so we really started pushing that narrative in 2012,
03:53 you know, created NumFOCUS, created PyData, did all these things. And I think that the results have
03:57 spoken for themselves. I definitely think that they have. That's great. In 2012, I do think
04:03 there was a little bit more of a debate of, well, is it safe to use Python for our business critical
04:07 stuff? But I feel like that battle has been really solidly won, especially on the data science front,
04:15 right? There was debates about R, maybe R was the space to be. That's not really where it's at anymore,
04:20 is it?
04:20 No, there was definitely a period of language war sort of stuff going on early on. It's odd, like, you know,
04:25 even then, the discussion about is data science a fad? Is it a fad term? Isn't it just business
04:31 intelligence? Or is this just that big data hype cycle all over again? You know, there's a lot of
04:36 doubters and haters on that term. But as I've talked to more users and managers and stuff,
04:42 at businesses, it's clear that they're thinking about data analysis and data analytics in a very
04:47 different way than they have for like decades. And data science is definitely, definitely here to stay
04:51 because of that.
04:52 Absolutely, absolutely. So maybe give people a sense of what you do day to day so they know where you're
04:57 coming from.
04:58 Well, my day to day consists of my formal role as CTO. I run the community innovation and open source
05:05 group here at Anaconda. I actually don't run the product engineering teams. And I work with
05:10 everyone. But my general role is working with the community, helping the various community oriented
05:15 and open source devs that we have champion their projects and work better with the broader community.
05:20 I also do a lot of industry facing technical marketing and evangelism. So a lot of customers
05:25 will have me go and speak at internal data science events they do, things like that. There's actually
05:29 remarkably few people in the Python world that really speak to industry on behalf of Python itself,
05:34 relative to the usage of it. I mean, you'll find no shortage of industry analysts talking about how
05:39 great Java is, or how great these like big data projects are, you know, all these like PR type
05:44 things. There's no one doing that for Python. And so that is actually some of my day job. And beyond
05:49 that, it's just trying to keep up with all the things that are happening in data science, machine
05:53 learning, data engineering, data visualization, AI, all of it.
05:58 On top of the advocacy role, it's a pretty much full time learning thing, right? Because there's so
06:04 much change, right?
06:05 There's so much in every area. I mean, there's all the cloud stuff too. There's edge learning,
06:09 there's data privacy, you name it. Every single area that touches data science is undergoing massive
06:13 change right now.
06:14 That's super exciting, but it's also a bit of a challenge. And I think the Anaconda distribution
06:18 does help some with that. Before we get into the distribution story, though, let's just talk about
06:24 Anaconda Inc. So when I had Travis on the show a couple years ago, it was Continuum that was the
06:32 company and Anaconda was the distribution. But now those are not different anymore, right? It's just
06:37 Anaconda, the company and the distribution.
06:40 We renamed ourselves really out of pragmatism, because we would go to places and we'd introduce
06:47 ourselves as Continuum Analytics. And they're like, oh, yes, you guys, like you got some Python stuff.
06:51 We see that here. Like, who are you guys? And then we say, oh, well, we make Anaconda. And they're like, oh,
06:56 I love Anaconda. I use Anaconda all the time and blah, blah, blah. And so we sort of like, after that started
07:01 happening to us all the time, we sort of figured like, well, maybe we should just call ourselves Anaconda.
07:06 And, you know, one of the things that held that up was for a long time, as we were growing the company and
07:12 growing the distribution, we were afraid that changing the company name would actually spook the community.
07:18 And it's a really, it's been one of these interesting things. Like I have, I have lots to say
07:22 about open source. Let's just put it that way. But it's very hard to play the game of open source,
07:27 honestly, and not still get beat up with FUD about it. And so even though we've open sourced our build
07:32 tools, we've open sourced the recipes, we open source everything from the very beginning,
07:35 there are still people in the community who distrust us because we're a company trying to make a
07:40 sustainable, build sustainable funding for this open source effort. So it's a really,
07:45 that was one of the reasons we actually were reticent to do that name change until finally
07:49 just became a no brainer that we basically had to.
07:51 Yeah. If people keep mistaking you for Anaconda Inc, maybe just say, fine, that is our name.
07:56 Yeah. And we'll just deal with the haters, you know, on a one-off basis, I guess. I don't know.
08:00 Yeah, exactly. I mean, it's not unprecedented, right? 37 Signals, who made Basecamp and,
08:06 you know, sort of founded Ruby on Rails, they eventually renamed themselves just to Basecamp.
08:11 They're like, yep, the one major project, fine, we're just called that, right? I guess it's like
08:15 Microsoft renaming themselves Windows, which they're probably very happy they didn't. But,
08:19 you know, in a lot of senses, that makes sense. That's cool. Okay, so there's a broad spectrum
08:25 of folks who listen to the show. Many of them will have experience with data science. Many of them
08:31 will know what the Anaconda distribution is. But maybe just, you know, for the folks who are new or
08:36 have been working somewhere else, tell them, what is this distribution? How is it different
08:40 than the standard CPython? And why did you guys make it?
08:43 I'll try to sum this up for a technical, but not data science necessarily audience, right? The basic
08:49 gist of it is that Anaconda arose out of a failure in the Python ecosystem to address the packaging needs
08:56 for the numerical and computationally like heavyweight packages that are in Python. And so for the same
09:03 reason that Linux distributions exist, very few people build Linux from scratch. For actually
09:08 exactly the same technical reasons, we built the Anaconda distribution, because it's actually really,
09:13 really hard to correctly build all of the underlying components that you need for doing productive data
09:18 science and machine learning. And so the reason it's distribution is because all of the libraries you
09:25 build and the packages, the modules with extension modules that you load up, they need to be compiled
09:30 together, they need to be compiled in a compatible way. And so you need to agree on compiler definitions,
09:35 you need to agree on code generation targets, optimization levels, things like that. And if you
09:41 only ever use pure Python packages, so packages whose code only consists of PY files, then you basically
09:49 never run into a problem. It's only when you start having extension libraries, things that depend on maybe
09:54 system libraries, God forbid you try to cross platforms between Linux and Mac and Windows across
09:59 architectures between ARM and x86, you're completely hosed. And so we, in service to the scientific Python
10:06 community, we built this distribution that was a set of packages and a way of building packages that are
10:11 compatible with each other. So that's what the Anaconda distribution is. It's a bulk distribution with about a
10:16 couple hundred pre made libraries. And we have a package updater in it called Conda that lets you
10:23 then install thousands more that are built by us and built by a large open community that also uses the
10:29 same standards. So that's what Conda and Anaconda are in a nutshell. And it's really one of these like
10:35 packaging war kind of things or packaging, the confusion of Python packaging. We actually tried to approach
10:42 Guido back in the day to help define some standards around this. And he basically gave us a very helpful
10:49 guidance, which is maybe your packaging needs are so exotic, you need to build your own system. So we took
10:53 him at his word and we did it. And consequently, when people use Conda, in a lot of cases, things just work.
10:59 There's still like corner cases and a lot of like little rough spots, especially in terms of pip interop.
11:04 But we're very proud of the work we've done so far. And it's used in production every day by big,
11:08 big companies that people rely on Python for their production workloads. So that's basically Anaconda
11:13 and Conda in a nutshell.
11:14 Okay, well, that's a really good summary. Yeah, when I think of it, the main value is that you get
11:20 pre compiled binary versions of the packages that would otherwise have to be compiled from source when you
11:27 pip install them, right?
11:29 Yes.
11:29 And the other part is the cross package compatibility, because somebody makes one package, and they have an
11:38 interest in making them as best they can or whatever, but they don't really care about integrating and testing
11:43 against all the other open source projects that you may pull into your project that they don't even care or know about,
11:49 right? So this sort of bigger picture compatibility that you look at is pretty cool as well.
11:55 It's actually become quite critical. And I think this is one of the areas that the Python community,
11:58 in the confounding haze of packaging, and half built packaging solutions, that we've not really
12:05 been good at giving guidance to the user community about is that if all you ever need to do is build
12:10 one package for yourself, and you fully control the deployment environment, and the development
12:14 environment, then maybe you can go and do that, right? But if you actually have to work on a team
12:19 with other people, like for example, on web developers, a lot of times, they control the
12:23 server, they choose the packages they bring, and they write the code, and they can just push it out
12:28 to their server. And they're good, right?
12:30 Yeah, and they're good to go. And they can you can do any number of things that you want to, you know,
12:33 what I would what I would liken it to is if you ever do, if you build your own wheel, if you build your
12:38 own native extensions, it's like getting plastic powder or plastic pellets, and making your own
12:44 mold of Legos or Lego like things and pouring your own little pieces. And so as long as you're the one
12:49 that controls what they have to plug into, and you're the one that controls all the molds, then you
12:53 don't need any standard definitions of studs or holes or lengths or anything like that, you're good to go.
12:58 But if you ever want to work with other people who have their own molds and their own places and
13:03 studs, they want to put these things on, you've got to come up with a standard definition. And so what
13:08 Anaconda is essentially, it's like a Lego system, we've standardized what the studs are and what the
13:13 holes are. So lots of people can build different kinds of Legos, and they all can plug together.
13:16 And that's kind of the long and the short of it.
13:18 Yeah, very interesting. So some other things that are in play there are you talked about Conda and
13:25 installing the packages that you built, right, the couple hundred or whatever that come with the
13:29 distribution. But then you also said installing the others through this thing called Conda Forge.
13:35 What's Conda Forge?
13:35 Well, Conda Forge is a community of people who I would say out of a masochistic charity to the
13:42 community. They take on the job of maintaining build scripts and recipes that take upstream
13:48 software and make it so it's actually buildable in a reproducible way and that it works with other
13:54 things. So it's a community of package builders and they have several hundred contributors and
13:59 they've built thousands of packages. We ourselves build about a thousand, although only 200 are built
14:04 into the big Anaconda installer download. But the Conda Forge community goes even beyond that and
14:09 builds several thousand. And that's what Conda Forge is.
14:11 Yeah. Interesting. So people are like, you know, it's really painful to build this package,
14:16 but only one of us should ever suffer and feel that once. And we'll do that on behalf of the
14:21 community. I'll take that on for this one package.
14:23 Yeah, basically. I mean, you know, the real challenge is it's one of those things in life
14:27 where it's almost worse that it's easy to do a bad job. I don't know that we have a term for this
14:32 in English. Maybe there's a long German word for it. But it's like the same thing with the coding
14:36 principles of like, if something is broken, you want it to break loudly and fail loudly,
14:40 right? You don't want it to make a half effort. Sometimes it kind of works sometimes. And so with,
14:45 but building package is the same thing. Most people can kind of get a build working for most things,
14:51 but does it work well? Will they ever be able to do it again? Like it doesn't work with anything else.
14:57 None of those things, you know, it takes a lot of work to make a good package build. So,
15:01 well, that speaks to the reproducibility side of things. And I know in data science and
15:06 scientists using data science tools, that reproducibility is a super important aspect.
15:11 And I guess the first step is I can run the software, which means I can build the packages
15:16 and install them.
15:17 Right. And that is really what we think that providing pre-built binaries and then having
15:22 good provenance of the build system itself. That's really some of the only ways you can really
15:27 honestly, like not kidding yourself, have reproducibility. I think some people think
15:32 that Docker somehow saves them, but it really doesn't. So it's kind of a struggle right now,
15:38 honestly, because there's so many moving pieces. There's a lot of confusion in that space, but I do.
15:42 Yes, I do agree with you that Conda packages used properly can absolutely be a great way to ensure
15:47 reproducibility for data science.
15:49 Yeah. Well, it's probably better than saying, well, if you want to install this package,
15:54 you're going to need to have the Visual Studio 2008 compiler set up correctly on your machine
15:59 in 2025 or whatever, right? When it's no longer compatible with the Windows or who knows what,
16:05 right?
16:05 Yeah. We're going to have to, like, one of the reasons I think that our team,
16:08 the Conda and Anaconda team are happy to move away from Python 2 is because the dependency on that
16:13 compiler. Someday when we finally put Python 2 to rest, I'm probably going to try to eBay a bunch of,
16:19 like, boxes of those CDs just so they can break them out of, you know, sort of like a cleansing
16:24 bonfire or something. I don't know. Maybe you shouldn't burn CDs. That's bad, actually.
16:28 Yeah, but you could have some sort of ceremony with them for sure.
16:32 Yeah.
16:33 I think the new Python 3.7, it uses MSBuild. Is that right?
16:38 You know, I'm not sure on the details of that, but I think that there have been significant
16:42 improvements. And, you know, the Python folks who work at Microsoft have worked really hard
16:49 to improve the compiler situation there for Python. I think it's much better now with Python 3 and in
16:53 the later releases of Windows. It's just we have, you know, very old Python, very old Windows that
16:59 still are deployed that we have to keep those users going. So that's where almost all the pain is.
17:04 I can imagine. Yeah, I just had Steve Dower from Microsoft on the show, and he's in charge of the
17:08 installer and stuff there. And he's doing some really, really cool stuff to make it more accessible
17:12 on Windows. And it's easy to go to conferences and forget how important Windows actually is,
17:19 right? You look around, it looks like everyone has a Mac. There's a few people running Linux.
17:23 That's pretty much what you see at the conferences, right? But that's not what the actual consumption
17:28 out in the world is, is it?
17:30 No, that's not at all reflective of the of even the United States. And then you go to the broader
17:35 world. It's a lot of Windows. It's a lot of Windows, a lot of Linux, too. But yeah, I think this is one of
17:42 the structural problems that faces the open source community is that when you're small, it's easy to do
17:47 product management, because it's like you and your buddies. But once you get bigger, you have to actually
17:52 intentionally go and try to pull in information from your users. And I think that's the Python, that's
17:57 actually, I think, a structural challenge for the Python community at this point in time.
18:00 When we're talking about Conda Forge and things like that, something I had not heard of before,
18:05 but I saw that you're running is something called BioConda. Now, it sounds like it might have to do with
18:11 biology and data science around biology, but that's all I can discern from it. Tell us about that.
18:16 That's new to me.
18:17 So BioConda is actually not one of our projects. And oh, I should have said this earlier with Conda Forge.
18:22 BioConda, Conda Forge, and various other sort of groups, they use our Anaconda Cloud package hosting
18:29 infrastructure to support their community. Because with the Conda package installer, it's easy to give
18:35 it a namespace flag, basically a channel name, and then it will go and download packages only from that
18:40 channel on Anaconda Cloud. So these represent, Conda Forge and BioConda represent different communities
18:46 that are using the Conda packaging tool, but they may have set slightly different standards or included
18:50 certain other standards in their build system protocols and standards. So all these packages
18:55 work together. So yes, BioConda is for the biology, genomics sort of community.
19:00 Yeah.
19:01 They have very specialized, well, specialized is maybe a euphemism, but there's a lot of specialized
19:05 software needs in the biology community. It's very R-centric. There's a lot of, depending on what
19:11 you're doing in that domain, there's a lot of PERL sometimes.
19:13 So...
19:14 Yeah, interesting. We'll leave that there.
19:17 Are there other ones? Is there like a ChemConda or things like that?
19:20 No. So there's actually... Yeah. So I think Bio... I'm going to kick myself later,
19:24 probably, as I forget some. But there are major research disciplines and communities that do use
19:29 Conda quite a bit. So I think the astronomy research community has taken on Python and embraced Python
19:34 a lot. They use Conda as a way to get nightly builds and dev builds and just really get easy
19:39 deployments, right, of their complex software. One of the things that Conda does well,
19:43 I should have said this earlier, it's not just a Python packaging tool. It's a sort of a userland
19:48 software packaging tool. So we package up R, Perl, Python, C, C++, Fortran, Java, Scala,
19:54 Ruby, Node, you name it. We really are almost like a portable userland RPM kind of thing.
20:01 And so that allows for these communities that have a lot of scientific engineering code written in not
20:07 Python, sometimes not even C or C++. We can package all those things up together, move
20:12 these collections of packages around.
20:13 Yeah, that's pretty interesting. That takes the challenge of packaging and sort of
20:18 magnifies it extremely, right? Multiplies it combinatorially.
20:22 Oh, yeah. Oh, yeah. It definitely gets pretty complex.
20:28 This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast,
20:33 simple, and incredibly affordable? Well, look past that bookstore and check out Linode at
20:38 talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server
20:45 with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or where your
20:50 users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server,
20:55 or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network,
21:02 24-7 friendly support, even on holidays, and a seven-day money-back guarantee. Need a little help
21:07 with your infrastructure? They even offer professional services to help you with architecture, migrations,
21:12 and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm
21:18 slash Linode. So another thing that looks like it's doing really well is Anaconda Cloud. And so
21:26 this is a place where like data scientists can share their work and their packages and things like that.
21:31 Is that right? Yes. So right now, Anaconda Cloud is primarily, I think, used as a package hosting
21:35 environment. And a lot of developers in the data science ecosystem use it as a way to publish
21:40 nightlies or dev builds. Many of the projects, the key projects, they give us a heads up when they're
21:45 about to cut a new release so that they can push, make sure that they can announce the Conda package
21:50 at the same time they announce the release of the, you know, cutting new version of the software. So
21:54 it's very nice of them. Yeah. So how's that work alongside as well as moving differently than just
22:00 putting on PyPI? It gets pretty complex. So number one, there's channel support. So we basically have
22:07 individual developers can have their own channel and those packages, you know, their users can just
22:11 download packages from just that channel and not sort of a single global namespace, right?
22:17 Another really important thing is that there's not just one build. So Conda as a packaging system
22:22 has much deeper and richer metadata about the build environment and what it expects of the runtime
22:28 environment. So I can build a package that the same upstream software, I can build different versions
22:33 that are optimized for different levels of your hardware, like whether or not you have GPUs,
22:37 whether or not you want, you have an advanced Intel chip or a relatively basic chip, I can push all of
22:42 that stuff in. And maybe using this version of a compiler or that version of a compiler, like Clang
22:48 versus GNU GCC, you know, these things actually make material difference in whether or not the package
22:53 will work. That level of resolution and that ability to feature flag and select is not available on PyPI
22:59 as far as I'm aware. And again, it's just, you know, even if one package is available, if you use
23:04 pip to install PyPI, pip aggressively goes and tries to build other things from source, right? And if it
23:09 doesn't, it sort of has a very, it doesn't do an a priori solve what you need, it sort of grabs things
23:14 as they go. And so you can end up with very much the incorrect packages coming down, you can end up
23:19 trying to build something from source that maybe build successfully. But again, that's not what you
23:22 wanted. You want the pre build, right?
23:24 Right, with different settings, different compiler. Okay, that's the primary difference.
23:27 It is frustrating periodically that you can say, here's a bunch of things I need to install
23:32 on pip, you know, pip install these things. And one of them will have a requirement that the version
23:38 of one part is no larger than such and such. And yet it'll go grab, you know, depending on the order
23:44 once you specify it, it may grab the wrong one, you know, and just install that. And then the other
23:50 package is incompatible. Like there's weird little cases like that you can get into all the time,
23:55 right? Because it's actually, this is one of those areas of software development that for most people,
24:00 it's not a fun and sexy area to think about. But it's a deeply critical thing. When we rely on open
24:05 source software is to actually understand what does the dependency matrix look like. And there's no free
24:10 lunch, you know, if you do it in kind of this relatively naive way, like what pip does, then you
24:15 can easily end up in a corner, and things are incompatible. If you try to do it, what we do,
24:19 which is have very explicit and curated metadata about versions, and you do an a priori solve,
24:24 well, people complain the solve takes a long time, which it can. So there's really no free lunch on
24:30 that. I think one of the challenges that we actually have is that the metadata itself can be wrong. And
24:37 we found that all over the place. So packages think they will declare they're compatible with this
24:42 version or that version, and they're actually not. And so we have to actually patch what the upstream
24:46 declarations are. So again, it gets subtle and detailed. There's just a lot of like muck in this
24:51 area that we have to deal with. Yeah, it sounds a little bit like, these are the problems that you
24:56 can address and then learn about. If your job is to coordinate a whole bunch of packages that don't
25:02 interact intentionally with each other, right? They just want to make their project,
25:06 something that you can ship and install and use. And that's fine, right? But at this,
25:12 this interaction across them is where it gets tricky.
25:15 There's absolutely a tragedy of the commons. Like with the way I've, the metaphor I've used in the
25:19 past is that every developer, you know, open source maintainers, bless their hearts. They are way,
25:24 they're doing a thankless job a lot of times anyway, and they're way burned out and stressed.
25:28 But they're really solving for it. Does my vehicle work in my driveway? You know, can it get out of my
25:33 driveway and drive into my other maintainers driveway down the street? And if that works, they're good to go
25:38 a lot of times. And when everyone, one, every of the thousand developers in the ecosystem do this,
25:44 you'll end up with a bunch of cars squashing all over each other in the highways and
25:49 the freeways, because they're not thinking about that integration problem for their end users.
25:52 And the end users, a lot of times in data science, they're not sophisticated software developers.
25:56 They have no ability to solve this problem for themselves.
25:58 They're at the very edge of struggling to write a 10 line script, not understand the complexity of like
26:05 TensorFlow dependencies or something like that.
26:07 Exactly. Exactly.
26:09 So one thing that you all did recently, that seems to be a trend is you switch from the major minor
26:15 versioning scheme to calendar based scheme. And I think this is an interesting thing, especially
26:20 around open source, because we've had, you know, Mahmoud Hashemi created this site called ZeroVer,
26:26 sort of make fun of all the projects that have been around for 10, 15 years with, you know,
26:32 50 or a hundred releases, but are like 0.1.17, you know, some point version, you know,
26:38 like really small versions. And it seems like one of the fixes is to say, well, let's move towards
26:44 something that has more to do with, I can look at the version and I can tell you without deeply
26:50 knowing that software, whether that's a new version, an old version, a medium aged version,
26:55 right? Like if I told you requests was 2.1.4, is that new? Is that out of date? I don't know.
27:02 Right. But if you use this, this new style, it's pretty obvious. Like, what was the thinking there?
27:07 It's a community convention. It definitely makes it, it's for that user affordance that you can
27:11 sort of look at it and know. And also, you know, we set this expectation that we will release at a
27:16 regular cadence and it's for our own internal documentation and everything else. Everyone
27:19 just is able to collaborate more easily around that. But I think the zero ver thing, I mean,
27:24 I love Mahmoud and I think it was a hilarious thing, you know, in a community here where we have
27:27 SciPy and IPython or, you know, Jupyter and other things, pandas, you know, zero dot, whatever,
27:33 or I guess it's not quite zero dot anymore, but like SciPy for sure. These things, there's actually
27:38 something we can laugh at all we want to, but there's a thing there that the author is trying
27:43 to say, or the maintainer is trying to say, which is, it's not quite ready yet.
27:46 You know, I'll call it 1.0 when I'm good and ready and I'm not ready yet. It might not be for 20 years.
27:53 And so, of course, that's also kind of a silly position to take with literally millions of people
27:57 and their production code depend on your software.
27:59 I think they're not saying that it's ready. I think what they're, they're thinking of to say
28:04 when it goes to 1.0 a lot of times is it's done and software is rarely done.
28:10 Well, software is done. The instant it's released, at least that version of it, right?
28:13 I think this is where we as an industry actually have to get, we have to up-level our thinking
28:18 about this. And we got to stop thinking about software as artifacts, hardballs of code that
28:25 are static. And we actually have to start thinking about this from a flow perspective, that we are
28:30 looking at flows of projects. And there's a covenant that is established in a relationship
28:36 between the user of one of these flows and the people who originate those flows.
28:41 And I think, you know, there's a really interesting thing I learned years ago about
28:45 aerodynamics. And basically that when planes move less than the speed of sound, you can reason
28:52 about aerodynamics somewhat similarly to water and water flow, right?
28:55 But once you break the sound barrier, the thing that actually causes you the greatest amount
29:00 of pressure on your airframe and things like that, you actually have to reason about the change
29:04 in cross-sectional area of the airplane as it moves through the air.
29:08 So it's almost more like streams of thick rope and you're shoving rope aside.
29:14 So you move from this particle flow way to looking at actual flows.
29:19 And so similarly with software, I think we've got to stop thinking about this as being just a code drop,
29:25 right? And maintainers as people who go and dump out a bunch of code and actually look at a relationship
29:31 with projects. And this gets to like sustainability. This gets to, you know,
29:35 versioning and what's what, what is the promise in a version number, all of that stuff. It's actually
29:40 deeply involved. I don't know that the software industry has really started to learn how to consume
29:46 like the enterprise consumers of open source. I don't know that their internal practices have
29:50 really caught up with thinking about it that way.
29:52 Yeah. And that's kind of why I was bringing up the versioning a little more deeply because
29:56 I think the folks that spend their time all day in open source, they know that Flask, even though it had
30:04 a small version number until it recently moved to 1.0, it's really
30:10 used a lot and it's been around a long time. So it's fine. Right. But the corporate groups, the enterprise groups,
30:17 they see that as a flag of like, that's test software. We're not ready to like make our bank
30:24 run on test software. Is that the feeling that you got by interacting with, because you, you touch both
30:29 open source and enterprise groups more than a lot of folks, I would suspect.
30:34 Yes, absolutely. We, we are a B2B software company. That's where the bulk of our revenue comes from.
30:39 And absolutely. We suffered, we suffered mildly for that. You know, we have to basically go and
30:44 talk to procurement and compliance and IT people that are swimming, you know, they're up to their
30:48 ears in software. They look at a spreadsheet. We come in with our software, our enterprise software
30:53 and say, well, you know, here's the open source things that are in the manifest. And they look at
30:57 this thing and they're like, what is this? This is a pile of garbage. It's all zero dot, whatever.
31:01 Right. And it's like, yeah, but that runs Instagram, you know, like that literally runs
31:05 Dropbox. So like, what are you complaining? You don't really want to get into that.
31:09 Once you have that argument with an IT guy, you've already lost.
31:11 Right. You're, you're a small insurance company with a hundred thousand customers.
31:15 You're not running, you know, YouTube with a million requests per second. That's using similar
31:21 software, right? It's, but it's the mentality, right?
31:23 Yeah. And you know, a lot of, a lot of going into any kind of, I would say that over the last,
31:29 you know, five or six years, I've had to do a lot of adulting. And one of the parts of adulting
31:33 up from just being a geek, like, you know, code nerd kind of guy to being able to actually have
31:38 customer conversations is actually having quite a bit of empathy for the customer.
31:41 Right. And from their perspective, yeah, they are just a regional bank with a few hundred
31:45 thousand customers. They don't have the budget of Alphabet to throw at an SRE team
31:51 and a whole dev team and all that stuff. So their approaches to understanding risk and risk
31:55 mitigation from the thousands of vendors that want to sell them software. Maybe it's the most
31:59 practical, you know, I'm not, again, I'm not defending it, but I'm just saying one could come
32:03 to a point of empathy, right? With their approach.
32:04 That's a really good point. I do totally agree. It is exactly because they're small,
32:09 they can't hire the fresh new hottest software engineers that would rather be in Silicon Valley
32:15 or Austin or, you know, Portland or wherever, right? Like they just don't even have the ability
32:21 to determine whether or not what you're saying is true in a lot of, a lot of cases, right? It's like,
32:26 they just, you know, exactly.
32:28 We just rather use Microsoft. We know that they give us this SLA and this agreement and
32:32 we're just good, right? There's one way to make websites, use ASP.net. We're good. Just use,
32:37 you know, something else supported like that, right? And it's, it's a challenge that they
32:41 obviously want to use these new tools and powerful tools, especially in data science,
32:45 right? But they've, they've got a different culture and way of describing software being ready.
32:50 You know, and we can laugh all we want to about like these compliance guys, like beating us up
32:54 for our, you know, SciPy 0.whatever. But on the flip side, you know, how many of our,
33:00 our credit card reports and our gas bills come from, yeah, basically some like little ASP app or some,
33:06 you know, Access database, God forbid, with a bunch of VBA macros, right? That runs the world. So
33:10 how elite are we really?
33:12 That's an interesting point. Yeah. It's definitely worth thinking about. So in a broader sense though,
33:17 I feel like Python is making its way into this enterprise and a major corporation space. I know
33:25 it's increasingly being used for a lot of work, not just data science, but, you know, other types of
33:31 software as well. How do you see it? How do you see the world with your inside view you got?
33:36 Well, I think that's absolutely right. And I think that the Python community may not survive that
33:41 adoption. Interesting. What do you mean by that?
33:43 Not Python, the language, but the Python community. What I mean by that is that, you know, I've talked
33:48 to quite a few like maintainers of some popular projects and they've all reflected to me that
33:53 last couple of years as Python has gone, Python adoption just shot through the roof. I think some
33:58 of it is our pushes on data science and things like that. Others are, you know, this rapid rise of
34:04 deep learning. You know, many things have contributed to this, but ultimately Python is now one of the
34:09 most popular languages on the planet. People are getting jobs in Python and they're using Python
34:14 to do their jobs. And what we're seeing is this transition in the expectation of like, hey, man,
34:21 this is just my nine to five. Like this is a tool that I'm supposed to use to do my job.
34:25 And this tool sucks right now. So I'm going to get on your GitHub and I'm going to give you a bunch
34:28 of grief about it because this is your freaking tool. You know, my, like my employer, I got to feed
34:34 my family. My employer tells me how to use this tool. It's a piece of crap. And so that is,
34:39 that's what I said. I think the Python community might not survive that adoption transition unless
34:44 it intentionally really works hard to drive a positive, like to drive some values into the
34:52 newcomers.
34:52 So maybe that person that comes and complains because, well, I used to download my stuff from
34:58 Microsoft.com. Now I get it from Python.org, but this thing sucks. So I'm going to go back and just
35:03 complain about it as if, you know, there's a commercial entity on the other side whose job
35:08 it is to make the SLA legit.
35:11 Right. Right. But more likely, more likely, actually, they picked up, they inherited some
35:15 piece of crap, three-year-old Python code from some guy who didn't know what he was doing.
35:19 Written in Python 2.5 or something. Yeah.
35:21 Oh, absolutely. It'll be, it'll be 2.5. I think there's a couple of 2.4 things running
35:25 around that I'm aware of, but a lot of 2.5, there's a lot of 2.5 out there. And yeah,
35:30 and it's using some old version of matplotlib or something or some old version of pandas. And
35:34 they're going to complain, you know, on the tracker or on the, you know, on the issue tracker about that.
35:38 And part of the cultural change that I think we should try to encourage sounds like, okay,
35:44 you're doing this for your job. You need, it's not so great. We are the maintainers, but you have
35:50 a company who depends upon this. Can your company contribute some time, a PR, a fix? Like it's got
35:57 to be a two-way street. I think it can't just be, well, you know, one of the things I suspect that
36:01 you also feel at Anaconda Inc. is there are so many companies out there making millions and billions
36:10 of dollars a year on top of free. There's like people working in their free time on some open
36:16 source project that company is basically built upon and they make billions of dollars and contribute back
36:22 nearly zero or zero.
36:24 Yes. I've frequently quipped that I can fit probably the core NumPy and pandas maintainers
36:31 in my, no, no, my, okay. So we've gotten a few more now, so they don't all fit my minivan,
36:36 but at one point in time, certainly core NumPy.
36:39 You're going to need one of those longer, like full vans that holds 15 people.
36:43 I may need a 15 person van, but I could, I could probably fit them in the 15 person van.
36:47 You know, Matplotlib, which everybody relies on, is like just a few people, maybe part-time. There's
36:54 not like one whole FTE on it even.
36:55 Yeah.
36:56 There's projects like Jupyter that are very large, but also underfunded. And there's projects
37:00 that are small and underfunded. And it's extreme. Yes. It's exceptionally tragic.
37:05 Right.
37:06 It's exceptionally tragic.
37:07 Well, and do you know that I think the part of the tragedy to me is like, if it really took
37:11 a thousand people to make Matplotlib, 600 people to make Flask, maybe the community can't contribute
37:18 back enough to pay those thousand engineers full-time. But like you said, it's like a van full of people,
37:26 or it's my small car full of people for Flask, right? And click and all those things. The people
37:33 and the companies that use Flask make so much money and depends so heavily upon it that they could easily
37:39 pay those three, four or five people to be full-time on that and be doing really well. Right. But they don't.
37:46 Right. It's just, it's not even asking very much of them, which is what's crazy.
37:49 I'm of two minds on this or not two minds, but I have like two major views on this.
37:53 One of them is that we should look at this as the triumph of software. I mean,
37:58 to sort of just to sort of restate the point you're making, which is that,
38:00 holy crap, one or two or 10 people can build something that is fundamental to
38:09 billions and billions of dollars of global economic activity. That's something to be celebrated,
38:15 right? Because that should free up. Think of how many more thousands of software developers
38:19 don't have to be working on Flask. They can just go and have free time. Not really,
38:23 but you know, in theory, that's how.
38:24 Build something more interesting than just the framework, right? They could build something with
38:28 this result.
38:29 So that's one way to look at it and that we should celebrate where we can. But on the other hand,
38:34 the thing is like, if we can't even somehow come up with the funding for like 10 FTEs for these
38:39 fundamental projects, what's broken? What's broken, right? Because it can't be, it's not,
38:44 it can't be that hard. And so I think there's two ways to look at this. One is that the open source
38:50 community as the, essentially the field of software, I think it's essentially commoditizing out and the
38:57 labor, what open source represents. And this particular thing happening in the Python ecosystem
39:01 is the very vanguard of this transition. It represents essentially the end of labor economics
39:07 for software. And so that going away, we're at that transition. And so it's very hard to think about it
39:15 for companies because companies will allocate budget for software development in a very like
39:21 headcount oriented way, right? And they know what they're getting when they pay for an FTE dev here
39:26 or there or wherever.
39:27 Sure.
39:27 If they just throw money at some open source, what are they getting for it? You know, they know how to,
39:31 they know how to pay money for software. Companies are very good at paying money for software,
39:34 but paying for stuff that they can already get for free. They literally, that is a null value on a
39:40 spreadsheet. They cannot compute that. It is a NaN, right? So my view on this is actually quite simple,
39:46 which is that if open source developers, the people like me who care about the open source ecosystem,
39:51 if we want to sustain the community innovation and that positive abundance mentality that we have in the
39:59 open source ecology, the human ecology of open source has moved to post scarcity, post labor economics.
40:05 If we want to sustain that, then we need to actually drive a new conversation. We need to actually
40:10 provide the tooling and the infrastructure for the companies to think about how to consume this.
40:17 This portion of Talk Python to Me is brought to you by Rollbar. Got a question for you. Have you been
40:22 outsourcing your bug discovery to your users? Have you been making them send you bug reports? You know,
40:27 there's two problems with that. You can't discover all the bugs this way. And some users don't bother
40:32 reporting bugs at all. They just leave sometimes forever. The best software teams practice proactive
40:38 error monitoring. They detect all the errors in their production apps and services in real time and
40:43 debug important errors in minutes or hours, sometimes before users even notice. Teams from companies like
40:49 Twilio, Instacart and CircleCI use Rollbar to do this. With Rollbar, you get a real time feed of all the errors
40:56 so you know exactly what's broken in production. And Rollbar automatically collects all the relevant data and
41:02 metadata you need to debug the errors so you don't have to sift through logs. If you aren't using Rollbar yet,
41:07 they have a special offer for you. And it's really awesome. Sign up and install Rollbar at
41:12 talkpython.fm/Rollbar. And Rollbar will send you a $100 gift card to use at the Open Collective,
41:18 where you can donate to any of the 900 plus projects listed under the Open Source Collective or to the
41:24 Women Who Code organization. Get notified of errors in real time and make a difference in Open Source.
41:29 Visit talkpython.fm/Rollbar today.
41:34 What are some of the key elements?
41:35 One way to do it is you can look at it almost like treat each new... Number one,
41:39 it's something we have to work on ourselves, which is to not make money be a bad word,
41:44 which is still a mindset that pervades many Open Source communities and developers.
41:48 Any affiliation with any kind of money-managing, money-changing organization is seen as essentially...
41:55 It's seen as corrupting sometimes. Yeah, yeah.
41:57 It's corrupting, exactly. So, I mean, we literally had a thread on the SciPy mailing list,
42:02 I think a couple of years ago, where someone was arguing that we should only allow steering council members
42:07 who are part of universities or part of academia, because supposedly they don't have their own agendas.
42:11 And the other people were just like, are you kidding me? As if academics don't have agendas.
42:15 So, people like to kid themselves a lot about this kind of stuff. But anyway,
42:19 so I think that the Open Source community needs to, number one, not be allergic to money and treat it as a corrupting influence, right?
42:25 There's companies and ways, business models that are trying to help Open Source and trying to be
42:33 good participants in it. And then there are the corrupting, evil, taking advantage of type
42:37 companies. So, like, it's not black and white, but there are certainly paths forward where
42:42 companies like you guys and others are putting in lots of effort to try to make things better
42:49 legitimately.
42:50 Yeah. And I appreciate that you recognize that. Like, we really have tried to be good
42:54 citizens in the Open Source community. But I think companies, for a lot of companies, that
42:59 it's like the mind is willing, but the spreadsheets are weak. You know, like, it's still really hard for
43:04 people and proponents and advocates, even within those companies, to, at the end of the day,
43:08 make the budgetary justifications. Because the companies internally don't know how to,
43:12 they don't know how to reason about it.
43:14 Yeah.
43:14 You know? So, I think that's where the Open Source community can try to help. Like,
43:18 number one, one thing we could do is do almost like a Kickstarter style or like, you know,
43:23 I play Warcraft a little bit. And so, it's like world boss, like, takedown. So, before we can
43:28 release any new versions of Library XYZ next year, we've got to get this much money in, right?
43:34 Yeah.
43:34 And people basically just put the money in. But as fun as that
43:39 Kickstarter-style model would be, as cool as that would be and as interesting as that
43:43 would be, I think businesses have a hard time just writing checks for donations. So,
43:48 the other thing that I think the Open Source community needs to do, I think the one that's
43:51 more realistic, is to actually form entities that can have a business-to-business conversation
43:57 with the corporate players and understand how to talk to their procurement, talk to their legal and
44:05 everyone else, and basically act as a crossover facility to do the product management so the
44:10 businesses know what they're getting for their money. It's not a charity. You know,
44:13 something that people may not be aware of is that for a business to write a $10,000 charity check,
44:18 that comes out of a different part of the business a lot of times.
44:22 Even if everyone wants to, for budgetary and for finance and compliance reasons,
44:27 they literally cannot just write a check to some dude, you know, some Open Source hacker in the
44:32 middle of Europe somewhere. So, these are the things that we need to actually put together.
44:36 I think the allergic to money issue, I think that that can be solved with the right examples of Open Source
44:43 companies and companies entering Open Source in positive ways. But I feel like there's some kind of
44:51 structure or something that has to get between the corporations and the Open Source projects,
44:58 where, like you say, it's not a charity check. It's: you pay into this and you get a little
45:05 bit more of something. And I don't know what that is, but there's something like that. Then the companies
45:10 can justify it. They say, look, we depend upon this thing. We pay, you know, 0.01% of our revenue to the
45:18 people that make it work so that our system doesn't go away. And here's what we get for that 0.01%. I don't know
45:22 what that is. Actually, we don't have to reinvent the wheel here. It happens all the time
45:26 in every other industry. It's an industry consortium. It's an industry consortium. You pay into it. And
45:31 what happens is you get votes on various technical councils and technical boards, and they do the
45:36 product management and the dev management for what the thing should be. In the Python world,
45:42 for a lot of these projects, we want that to still be subordinate to the vision
45:48 of the open innovation volunteer kind of crew. But there's so much housekeeping. There's so much
45:56 issue tracking stuff. There's so much like documentation, management, cleanup, just keeping
46:01 the lights on and the yak shaving. There's so much that goes into a project that these kinds of
46:06 consortium models can fund. And I think Python itself, and I'll just come out on your podcast and I'll just
46:11 say it. I think Python itself badly needs this. Yeah.
46:14 Badly needs an actual consortium like this to be operated in a way that can accept dollars easily.
46:19 That makes it easy for people to write checks, right? Like we all know this as entrepreneurs, like make
46:23 yourself easy to do business with. The open source community, I would say, has not made itself easy
46:27 to do business with. You got to either hire a core dev. And if you do, that core dev then has to,
46:32 in their own minds, be like, am I wearing my community hat or my employee hat, which is tough on them,
46:37 right? It's very stressful for them. And the open source community, even when we get the dollars,
46:41 we don't make it clear to the people writing the checks what those dollars are buying for them.
46:45 Like if they have a couple of issues that are easy to solve, that really can make a difference for
46:49 them, we don't necessarily prioritize those issues just because they wrote us a check because we don't
46:53 want to feel like we're that, you know, like it's that quid pro quo. So I think that you really need
46:58 some kind of facility in the middle, one that acts as a consortium, that is able to help businesses steer
47:02 and guide a lot of these pretty basic kinds of maintenance things that need to happen
47:07 for projects, things that would make their lives easier. And that can then funnel a ton of money in, with a ton
47:13 of margin on it that goes into the innovation work and all the forward-looking kind of stuff.
47:16 And everyone's happy.
47:18 Yeah. Do you think the PSF could do it?
47:19 I think the PSF could do it. I think that the PSF would be, I don't know if it operates as a
47:24 nonprofit.
47:25 It does. Yeah.
47:26 Yeah. So if it's a nonprofit, I think it'd be very hard for it to do it. It might need to actually
47:31 create something like the Mozilla Foundation and Mozilla Corporation split. I think it would need to create
47:35 some kind of traditional C corp or a B Corp, perhaps a social-mission for-profit that it
47:42 owns, like, director seats on and, you know, a chunk of the thing. But companies, a lot of times, are
47:47 just prohibited from writing checks to 501c3s unless it comes out of their philanthropy group.
47:52 So again, this is that making it easy to do business with kind of thing.
47:55 Yeah. Interesting.
47:56 Absolutely. I think the PSF should spin up a thing like that. And I've been sort of
48:00 quietly advocating for this behind the scenes a little bit. And maybe I'll be more vocal about
48:04 that here this year.
48:05 All right. Well, we can spread a little word on the podcast as we just have.
48:08 It's really interesting. And I think there's absolutely lots of possibilities for business
48:15 models in open source. But I feel like there's actually a 98% gap, like 2% of that is captured.
48:26 98% of it is not, because we have these large, but still not huge, companies, like banks in the Midwest, that
48:33 contribute nothing. They do no PRs. They don't do anything to that effect, right? They just,
48:38 it's just not in their culture. And like you said, there's no real mechanism for them to
48:43 pay a little and get more and justify that.
48:45 Yes. Yes. And actually some of the open source business models that are emerging now,
48:49 they present challenges of their own. Again, my overriding thesis is that the world of software
48:54 is actually commoditizing pretty quickly. And so people, like if you look at the things that have
49:01 been happening in the last six months, as I would say open source software component vendors,
49:07 like Mongo and Redis and Timescale and others, as they start getting their business eaten by the cloud
49:13 vendors, they're realizing that open source, you know, sounded great. Open core sounded great.
49:18 And then they start losing any future route to revenue. And they've got to actually aggressively
49:23 go to dual licensing and deeply viral AGPLv3 kind of stuff. I don't know that open source
49:29 is even the right conversation to have anymore. I think it should be around sustainable community
49:35 innovation and the freedom to experiment, freedom to innovate, freedom to, you know, there's a lot of
49:40 like free as in beer and free as in innovation. But the traditional ways we have of talking about the
49:47 source code itself are, again, limited to this paradigm of code drops. And we're beyond that now.
49:53 Yeah. And you know, look at the cloud, for example: a lot of these places provide you something,
49:58 and you pay on usage, right? You don't buy any software in the cloud, but you have the subscription
50:06 model all over the place, right? And that's, that's starting to really shift the way things are working
50:11 as well. And I feel like the cloud vendors actually have this interesting lock in where they're a little
50:16 bit defended against some of these challenges that are coming up.
50:20 Well, absolutely. There's only like three major cloud vendors of significance here in the US,
50:25 at least. And all of them are absolutely going for lock-in. And, you know, ultimately,
50:32 their business model, I mean, it's a for-profit business model, put it that way,
50:36 right? Yeah, the cloud is the new lock-in with a lot of those APIs. It's interesting. And like this
50:40 MongoDB AWS thing you talked about, like, that's a little bit of it as well, right? But it's pretty
50:45 interesting. Yeah, I think we could probably talk for hours and hours on this, because we're both
50:50 pretty passionate about it. It's awesome. But let me ask you a few more questions before we run out of
50:55 time. Sure. These are all sort of forward-looking type things. And one of them is data science.
51:00 You called out the year 2012 to me: if you look at the analytics and the graphs and the usage,
51:05 there's a huge increase in the derivative of a lot of things around Python from 2012 up till now.
51:13 So five years further out, what do you think data science looks like? Is it still deeply working
51:20 with Python? Is it solving different problems? Where is it going?
51:23 We're going to see data science much more integrated. People will have a better sense of what it
51:30 can and can't do by itself, right? It's a new discipline that's coming into the business. It's a
51:36 new swim lane. Everyone's trying to figure out how they stand in relation to it. There's a lot of
51:40 political, you know, fighting and a lot of experimentation within a lot of businesses that I see. But at the end of
51:44 the day, I think this idea of doing data exploration, doing model development, and revving models that are
51:52 really critical to the business is the new reality for people. So that's not going away. That's a
51:57 fundamental dynamic that's going to be here. And if you need to go and explore data, you need to go and
52:01 do model development, then you're going to be doing data science, full stop, right?
52:06 If you need to bring in domain expertise, stats, and coding ability to do that
52:12 well, then you're going to need data scientists at that intersection: you need all three of those skills.
52:16 But data scientists are going to find themselves needing to have a much
52:21 better... I think the borders between the data science world and the others will get clearer.
52:26 So you'll have data scientists interacting with data engineers, and much better, hopefully much better
52:31 established best practices around how that's supposed to go. And then IT people start accepting that,
52:36 yes, Python is here to stay, we're going to need to deploy real Python stuff. And we need to know a
52:40 little more something about it, right? And so a lot of these little intersectional areas right now
52:45 between data science and other concerns, same thing with BI, people right now, there's literally people
52:49 out there selling point and click visualization tools saying that's data science. And it's like,
52:53 that's not really data science. But they're going to figure that out probably in the next couple
52:58 of years. Hopefully, they get the clue. Yeah, I think that's what I think is going to happen.
53:02 Now, the result of that happening is gigantic. I think that clue is going to really start
53:06 hitting home in two years or so. Then the immediate next problem that people have is overall workflow
53:13 management across all of these things. Because everyone's got their favorite tools. Everyone is
53:18 producing things that touch and intersect with everyone else's stuff. How do we get all of this
53:23 stuff managed in one place? And I think that's the next challenge, and we're gonna be square in
53:28 the middle of that conversation still. And five years from now,
53:50 assuming that the Chinese economy hasn't collapsed, we are going to see some really scary stuff coming
53:56 out of China and the AI innovation happening there. Because they're completely
54:01 unapologetic about using their entire national population of a billion people as a sandbox for
54:07 trying AI surveillance, sort of cybernetic, the computer controls you kind of things.
54:12 Yeah, the whole social ranking, and all that stuff that's...
54:16 So here's the terrifying thing about that. I'm going to be a little bit of a contrarian on this.
54:20 What if it turns out that their Sesame Credit system... Rev 1 is scary and crappy.
54:24 But Rev 2, what if it turns out that they give Sesame Credit scores to their businesses and local
54:29 politicians? Yeah.
54:30 What if they actually start upgrading Sesame Credit into this kind of thing where
54:34 it becomes almost like, again, back to Warcraft, a Warcraft honor reputation system,
54:38 right? And becomes multicolored, it becomes vectorized instead of scalar. They might actually
54:44 innovate a scary, awesome approach that has deep problems because it requires a surveillance state.
54:50 And the Western world might look at that and say, huh, you know, that actually works a lot better
54:54 than, you know, Ivanka Trump, you know, running our fast food joints.
54:58 Yeah.
54:58 Sorry, the White House. So that dates this podcast, by the way. For those who are listening months in
55:03 the future, in case you forgot, just two days ago, the President of the United States served Big
55:07 Macs at the White House. That just, that happened. So this is still fresh in our minds.
55:12 To Clemson, who won the national college football championship. Yeah.
55:16 Yes. It's incredible. Anyway. So the point is that the scary thing about the Chinese AI system
55:21 is that it might work and work really, really well.
55:23 Yeah. Not that it's just purely wrong, but actually there are aspects of it that are amazing
55:28 in its sort of Black Mirror, Electric Dreams way.
55:32 Oh yeah. Tell you what, it's going to be pretty amazing. I think the same way that like a lot of the
55:36 Western world is like, oh, well, we already saw where this goes in Orwell, so we're not going to
55:41 go there. Western world has that kind of snottiness about it. I think they're underestimating how good
55:46 it could be and how tempting that goodness can look to technologists, to the capitalists, and to the
55:53 policymakers here. For me, as someone who fled the communist regime, you know, as a
55:58 child, that's the scary thing about it.
56:00 That is really an interesting analysis. And certainly I was thinking ethics, data ethics,
56:05 and accountability for data models and AI and ML, right? Like, sorry, you couldn't get the house.
56:12 The AI said no, right? Like, no, no, no. You have to say why the AI said no. Well, we don't know,
56:17 but it's really good. And it said no, you know, like answering that problem is going to be interesting
56:22 too.
56:22 It is. And you know, the thing is that already now you get denied, right? And there's already a model
56:28 that tells you why you're denied. And this kind of gets back to that same thing with the
56:33 whole Black Mirror thing and the AI in China: really, really good AI doesn't look like that
56:37 AI, you know? So the really, really good systems, quote unquote good, the really effective systems
56:43 at partitioning people and spot targeting them, they're going to be dressed up in ways that are
56:48 palatable. Our robot overlords will look like Cylons. They're going to look really human-like.
56:52 This is the scary future, man. I'm not trying to like scare you and scare your listeners.
56:56 I'm just telling you though, like, this is what's coming. And as humans (I'm actually a human, I'm
57:01 not a Cylon), as, you know, tribe human, I think we've got to get better at being human.
57:06 And so that's maybe too philosophical hand wavy, but anyway.
57:10 Yeah. It's really an interesting thing to ponder for sure. All right. So I guess final comment or topic
57:17 just real quickly is, I feel like there's been this Python 2 versus 3 debate, modern Python versus legacy Python,
57:25 as I like to position it. And I feel like the adoption of modern Python in data science is much faster
57:33 than it has been in the general Python space. One, do you think that's true? And then two, why do you think that is?
57:41 One, I think it's true. And two, I think it's because a lot of data science stuff is new and
57:45 legacy data science code tends to age with models. So like a piece of data science code is only as good
57:52 as the model data that it was trained on and models change because the world changes. So there's a built
57:59 in expiration date on any data science model that you've got. So you're not keeping transaction systems
58:04 from 20 years ago live.
58:06 The complexity and the algorithms and the techniques are just not even relevant, right? Like the machine
58:11 learning of five years ago doesn't compete with the machine learning of today. And it's not like
58:15 you're just going to upgrade. It's a totally different thing. You just retrain it on TensorFlow
58:20 or Keras or whatever, right?
58:21 Right. And secondly, this is another sort of important dynamic, which is that the regulatory environment
58:27 around data science hasn't caught up, so it doesn't impose that on you yet. You know, I was talking to
58:32 a software modeling engineer from an airplane company. And he was saying, yeah, the FAA requires
58:39 us to be able to reproduce our computational design models for decades, literally decades.
58:45 Yeah. Wow.
58:46 So, I mean, yeah, because planes actually, if they're well maintained, they fly for a long time,
58:50 right? And if there's a structural failure of a part...
58:53 Right. There's a lot of 737s out there. Yeah.
58:55 Oh, yeah. And so data science just doesn't have that problem yet. And, you know, one of the earliest
58:59 adopters of Python, this is a really interesting dynamic that people may not be aware of, but
59:03 in the mid 2000s, there was a significant uptake of Python in the hedge fund and the finance industry.
59:10 And so that was Python 2, Python 2.5, 2.6 around that time. And so that got into a lot of places.
59:18 And finance is actually a pretty regulated area. And so a lot of that code, especially if it starts
59:23 running production finance systems, people need to keep it running, not only because they're...
59:27 Even if you stop using a particular finance model to like score or to do whatever, to price a trade
59:33 and things like that, oftentimes you'll want to go back and do what's called backtesting.
59:37 So you want to run new data against those old models, and you'll want to race them against the
59:43 new models, right? You'll want to run new models on old data and new data on old models. And so that
59:48 kind of backtesting approach, you need to keep that old code running for that purpose as well,
59:52 just from a risk management perspective. So for a lot of the finance industry, running
59:56 ahead and adopting Python 2 has sort of gotten them stuck on Python 2 a little bit.
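For context, here is a tiny, self-contained sketch of the backtesting idea described above: racing an old and a new model against both old and new data. Everything in it (the model functions, the data, the error metric) is a hypothetical illustration, not code from the episode; real systems would load serialized production models and historical market data.

```python
# Toy backtesting sketch: compare an old and a new pricing model on both
# old and new data, as described in the conversation above. All names and
# numbers here are hypothetical.

def old_model(row):
    # The model that priced trades years ago (hypothetical form).
    return 0.9 * row["signal"]

def new_model(row):
    # The candidate replacement model (hypothetical form).
    return 0.7 * row["signal"] + 0.1

def backtest(model, data):
    """Mean absolute error of a model's predictions vs. observed prices."""
    errors = [abs(model(row) - row["observed_price"]) for row in data]
    return sum(errors) / len(errors)

old_data = [{"signal": 1.0, "observed_price": 0.95},
            {"signal": 2.0, "observed_price": 1.80}]
new_data = [{"signal": 1.5, "observed_price": 1.20}]

# Run every model against every dataset: new data on old models,
# old data on new models, and so on.
for model_name, model in [("old", old_model), ("new", new_model)]:
    for data_name, data in [("old data", old_data), ("new data", new_data)]:
        print(f"{model_name} model on {data_name}: MAE = {backtest(model, data):.3f}")
```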
01:00:01 Okay. Interesting. Yeah. So almost a victim of its own success in a way, but in some of these
01:00:09 industries. All right. I guess we're going to have to leave it there because we're out of time. But
01:00:13 like I said, a lot of interesting stuff to talk about. We'll have to just put it to rest. So before we move on,
01:00:19 though, I'm going to ask you the two questions I always ask at the end of the show. If you're going to
01:00:23 write some Python code, what editor would you use?
01:00:25 My old go-to is still Vim. But for large code bases, I tend to use PyCharm so I can, you know,
01:00:30 sort of navigate more easily.
01:00:31 Yeah, sure. Makes sense. And then there are many, many packages on PyPI or available on Conda-Forge.
01:00:39 What's one that people maybe haven't heard of but should have, or one that you want to recommend?
01:00:43 Is it bad form to pimp? Is it like to pimp your own stuff?
01:00:46 No, you do it. No, no, go ahead.
01:00:48 So I'm really, really excited about a new project that we created called Intake,
01:00:53 which I would encourage people to take a look at it. It's pretty new. We just launched it last year.
01:00:58 Yeah, it looks interesting. I was going to ask you more about it, but we just
01:01:00 have too many topics already. So tell us about it real quick.
01:01:03 So Intake is a data loading abstraction library. So it's basically just load my data,
01:01:09 and it abstracts your data loading stuff into a declarative syntax so that the beginning of your
01:01:14 data science scripts doesn't have a whole bunch of like embedded and brittle SQL calls or pandas
01:01:19 column transformations or things like that. Intake is a way to make it so that your actual
01:01:23 data science or data transformation code is sort of its own code artifact and your data bits are your
01:01:29 data bits. It's kind of a nerdy thing, but we think that it actually addresses
01:01:34 that model reproducibility and code reproducibility problem that data scientists face.
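For context, here is a minimal sketch of the declarative pattern Intake encourages: the data sources live in a catalog file and the analysis code just asks for them by name. The catalog contents and the source name (daily_sales) are hypothetical illustrations, not from the episode; intake.open_catalog() and .read() are the library's actual entry points.

```python
# A hypothetical Intake catalog, saved as catalog.yml next to the script:
#
#   sources:
#     daily_sales:
#       driver: csv
#       args:
#         urlpath: "data/sales_*.csv"
#
import intake

# Load the declarative catalog; the analysis code below never embeds
# file paths, SQL, or parsing details.
cat = intake.open_catalog("catalog.yml")

# Read the named source into a pandas DataFrame.
df = cat.daily_sales.read()
print(df.head())
```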
01:01:38 Sounds really useful. Thanks. All right. So final call to action. People are excited about
01:01:42 the Anaconda distribution or maybe getting, making some progress on this open source business model
01:01:48 thing we talked about. What would you say to people?
01:01:50 So I would say that we have AnacondaCon coming up. So if you're actually using Python
01:01:55 in a commercial environment, strongly recommend AnacondaCon. We have a, we try to make a really good
01:02:01 blend of technology and practitioner kind of stuff and workshops there combined with
01:02:07 business perspectives. So it's not like an industry conference like Gartner or Strata.
01:02:12 It's not like a pure one of those things. It's also not a pure like tech community conference,
01:02:16 like PyData or something like that. So it's, we try to make a mix of those things.
01:02:19 We've gotten really good reviews in the past couple of years. It's our third year doing it.
01:02:23 I'm super excited about it. It's here in Austin in April, April 3rd to 5th.
01:02:26 So that's AnacondaCon.io. And secondly, if people are using Anaconda and like it, and they're using it in a
01:02:32 business environment, I would recommend they check out Anaconda Enterprise. We are very,
01:02:36 very proud of the product and we have a lot of problems that we solve for people inside business
01:02:40 environments and the business use of Python for deployment, package management.
01:02:44 Yeah. Real quickly, what do you get from it, right? You know,
01:02:47 I talked about how the business model should be that you get a little bit more for your money,
01:02:50 not just pure charity, you know, a PayPal donate button. What do people get, real quick?
01:02:57 So Anaconda Enterprise is, it gives you the ability to have your own managed package repository.
01:03:02 It gives you a way to do secured and governed collaborative notebooks and model deployment.
01:03:07 It works in the cloud. It works on prem. Many of our customers use it across an air gap and very
01:03:12 strictly governed environments. We basically make it so that data scientists and Python practitioners
01:03:18 in business can be as effective with Anaconda as they are at home nights and weekends on their
01:03:22 own laptops. All right. Yeah. That sounds cool. We just clear all the IT hurdles. Yeah,
01:03:25 that's sweet. All right. Well, thanks for all that you've talked about here, Peter. It's been a
01:03:30 super interesting conversation. Thanks for being on the show. Thank you so much for having me. I
01:03:33 really enjoyed it. You bet. Bye. Bye-bye. This has been another episode of Talk Python to Me.
01:03:38 Our guest on this episode was Peter Wang. It's been brought to you by Linode and Rollbar.
01:03:43 Linode is your go-to hosting for whatever you're building with Python. Get four months free at
01:03:49 talkpython.fm/Linode. That's L-I-N-O-D-E. Rollbar takes the pain out of errors. They give you the
01:03:57 context and insight you need to quickly locate and fix errors that might have gone unnoticed until users
01:04:02 complain, of course. Track a ridiculous number of errors for free as Talk Python to Me listeners at
01:04:07 talkpython.fm/Rollbar. Want to level up your Python? If you're just getting started, try my Python
01:04:14 Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new
01:04:20 async course that digs into all the different types of async programming you can do in Python.
01:04:25 And of course, if you're interested in more than one of these, be sure to check out our everything
01:04:29 bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite
01:04:34 podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed
01:04:39 at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on
01:04:45 talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.
01:04:51 Now get out there and write some Python code.
01:04:53 Bye.