Catching up with the Anaconda distribution

Episode #198, published Sat, Feb 9, 2019, recorded Wed, Jan 16, 2019

It's time to catch up with the Anaconda crew and see what's new in the Anaconda distribution. This edition of Python was created to solve some of the stickier problems of deployment, especially in the data science space. Their usage gives them deep insight into how Python is being used in the enterprise space as well, which turns out to be a very interesting part of the conversation.

Episode Deep Dive

Guest Introduction and Background

Peter Wang is the CTO and co-founder of Anaconda Inc. He has been a long-time Python developer, starting with physics and science-focused projects before moving into commercial and large-scale data science solutions. Peter and his team have been deeply involved in creating and maintaining the Anaconda Distribution and have championed Python’s adoption in business and enterprise data science settings. In this episode, he shares insights into Anaconda’s evolution, Python packaging, open-source sustainability, and the future of data science in business.

What to Know If You're New to Python

Before diving into packaging, Conda, and enterprise data science considerations, here are a few essentials:

  • Python has a broad standard library but also depends on community packages for specialized features (e.g., data science).
  • Using tools like conda (from the Anaconda ecosystem) or pip (from the Python standard ecosystem) to install packages can simplify or complicate your workflow depending on your goals.
  • Virtual environments (or Conda environments) help keep project dependencies isolated and compatible.
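
As a rough illustration of the last point, the sketch below reports whether the running interpreter appears to be inside a virtual environment or an activated Conda environment. It assumes the usual conventions: inside a venv, sys.prefix differs from sys.base_prefix, and conda activate exports the CONDA_PREFIX and CONDA_DEFAULT_ENV variables. The helper name describe_environment is made up for this example and is not part of any tool discussed in the episode.

```python
import os
import sys


def describe_environment(environ=None, prefix=None, base_prefix=None):
    """Summarize which kind of isolated environment (if any) looks active."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return {
        # Inside a venv, sys.prefix points at the environment directory
        # while sys.base_prefix still points at the base installation.
        "in_venv": prefix != base_prefix,
        # conda activate exports these variables for the active environment.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    print(describe_environment())
```

The optional arguments exist only so the function can be exercised with made-up values; called with no arguments it inspects the current process.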

Key Points and Takeaways

  1. Anaconda Distribution: A Purpose-Built Python Distribution The Anaconda Distribution arose to solve key pain points around installing and managing scientific and data-focused Python packages on all major platforms. It ships with common data science libraries pre-compiled for consistency and easy installation. Because it standardizes compilers and dependencies, it eliminates many “it works on my machine” issues.
  2. Why Conda vs. Pip? Conda provides robust dependency resolution and the ability to install non-Python libraries (e.g., R, C libraries) in a cross-platform way. Pip focuses on Python packages only, so complex builds involving C/C++ dependencies can be harder to manage. Conda attempts to ensure everything is packaged consistently, giving you a more “batteries-included” approach for data science.
  3. Conda-Forge and BioConda: Community-Driven Repositories Conda-Forge is a large, community-managed collection of Conda recipes. It has thousands of packages maintained by contributors who ensure compatibility across the ecosystem. BioConda extends that concept for biology and genomics packages, providing specialized builds for life sciences.
  4. Enterprise Adoption of Python Peter observes Python’s deepening roots in businesses of all sizes. Many companies once hesitant about open-source tech have embraced Python for data science, machine learning, and enterprise software systems. This shift challenges organizations to adopt better governance, packaging strategies, and consistent deployment solutions (e.g., Anaconda Enterprise).
  5. Maintaining Open Source Projects at Scale Many core data science tools (NumPy, pandas, Matplotlib, etc.) have relatively small teams behind them. Meanwhile, millions of users and entire companies depend on them. Peter highlights the need for sustainable funding and collaboration models so essential libraries remain well-maintained and aligned with growing enterprise adoption.
  6. Packaging Challenges and Reproducibility Scientific computing often relies on compiled extensions, system libraries, and specific compiler choices. Anaconda’s approach is to unify these configurations, ensuring that once packages are built, they’ll run reliably across target systems. This reproducibility is critical when projects must re-run old models, especially in regulated industries like finance or aerospace.
  7. Transition from Python 2 to Python 3 Data science projects were quicker to move to Python 3 thanks to shorter model lifespans: outdated models get replaced, so code refreshes happen more frequently. However, legacy financial or engineering models under strict regulation may lag behind. Peter notes that the community is steadily marching forward and 3.x has become the standard.
  8. Data Science’s Future: Integration, Model Management, and AI Ethics Peter believes data science is moving into a “post-hype” integration phase, where best practices for collaborating with engineering and IT teams will standardize. Attention will shift from building single models to managing the entire model lifecycle, ethical usage, and bridging data engineering, data privacy, and cloud workflows.
  9. Open Source Business Models: Finding Sustainable Funding A central topic was the mismatch between huge commercial dependence on free software and insufficient resources for maintainers. New forms of consortiums, enterprise-friendly services (like dedicated “pay-for-support” channels), and improved packaging of paid features are some paths that might make open source sustainable at scale.
    • Links and Tools:
      • NumFOCUS (supports open-source scientific projects)
  10. Intake: New Data Loading Abstraction An upcoming tool from the Anaconda ecosystem is Intake, designed to simplify data loading by removing brittle data connections from core analysis scripts. It places data catalog and ingestion logic into a straightforward, declarative layer, helping reproducibility across different environments.
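
The idea in point 10, keeping data locations in a declarative catalog instead of hard-coding them in analysis scripts, can be sketched in plain Python. This sketch does not use Intake's actual API. The CATALOG layout, the driver names, the file paths, and the load function are all hypothetical and only illustrate the general pattern.

```python
import csv
import json
from pathlib import Path

# Hypothetical catalog: each name maps to a declarative description of a
# data source. Analysis code refers to names, never to paths or formats.
CATALOG = {
    "sales": {"driver": "csv", "path": "data/sales.csv"},
    "settings": {"driver": "json", "path": "data/settings.json"},
}


def _load_csv(path):
    # Read every row of a CSV file into a list of dictionaries.
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))


def _load_json(path):
    # Parse a JSON document into Python objects.
    with Path(path).open() as handle:
        return json.load(handle)


# Registry that maps a driver name from the catalog to a loader function.
DRIVERS = {"csv": _load_csv, "json": _load_json}


def load(name, catalog=CATALOG):
    """Look up a named source in the catalog and load it with its driver."""
    entry = catalog[name]
    return DRIVERS[entry["driver"]](entry["path"])
```

With this layout, moving a file or changing its format means editing one catalog entry; a call such as load("sales") in the analysis code stays the same.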

Interesting Quotes and Stories

"I discovered Python on Slashdot around version 1.5.2, and I just fell in love." -- Peter Wang

"If you need to build a complex data science workflow and not worry about compilers or system libraries, Conda is your friend." -- Peter Wang

"If we can’t come up with funding for ten FTEs for these fundamental Python projects, we have to ask what’s broken in our approach." -- Peter Wang

Key Definitions and Terms

  • Conda: A cross-platform, language-agnostic package manager that handles Python, R, C libraries, and more.
  • Conda-Forge: A community-driven resource of Conda recipes and packages.
  • BioConda: A community-driven Conda channel, separate from Conda-Forge, for bioinformatics and genomics packages.
  • Anaconda Distribution: A popular Python distribution focused on data science and scientific computing, bundling many key libraries.
  • Packaging: The process of bundling and distributing software and its dependencies so it can be installed consistently.
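
To make the packaging distinction concrete: as discussed in the episode, packages made only of .py files rarely cause trouble, while compiled extension modules tie a package to a platform and compiler setup. The sketch below checks which kind a given module is, using the standard library's importlib. The helper name is_extension_module is an assumption made for this example.

```python
import importlib.machinery
import importlib.util


def is_extension_module(module_name):
    """Return True if the named module is found as a compiled extension file."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised for some unusual modules, for example ones without a spec.
        return False
    if spec is None or spec.origin is None:
        # Module not found, or a namespace package with no single file.
        return False
    # EXTENSION_SUFFIXES lists the binary suffixes for this interpreter,
    # such as ".so" on Linux and macOS or ".pyd" on Windows.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return spec.origin.endswith(suffixes)


if __name__ == "__main__":
    for name in ("json", "sys", "math"):
        print(name, is_extension_module(name))
```

The check looks at one module at a time. A package whose top-level __init__.py is plain Python reports False even if some of its submodules are compiled, and modules built into the interpreter itself also report False.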

Overall Takeaway

The Anaconda Distribution remains one of the cornerstones of Python data science, offering a thoughtful approach to installation, environment management, and package consistency. This conversation illustrates how Python’s success in science and business ties closely to good packaging and open source sustainability. As more enterprises adopt Python, the community must ensure maintainers are supported, novices can learn easily, and powerful AI-driven tools grow responsibly, helping data scientists and developers alike thrive in an ever-evolving field.

Anaconda: anaconda.com
Peter on Twitter: @pwang
JetBrains Survey Results: jetbrains.com
AnacondaCon: anacondacon.io
Episode transcripts: talkpython.fm

Episode Transcript

00:00 It's time to catch up with the Anaconda crew and see what's new in the Anaconda distribution.

00:04 This edition of Python was created to solve some of the stickier problems around deployment,

00:08 especially in the data science space. Their usage gives them deep insight into how Python is being

00:13 used in the enterprise space as well. And that turns out to be a very interesting part of the

00:17 conversation. Join me and Peter Wang, CTO at Anaconda Inc., on this episode of Talk Python

00:22 to Me, number 198, recorded January 16th, 2019. Welcome to Talk Python to Me, a weekly podcast

00:42 on Python, the language, the libraries, the ecosystem, and the personalities. This is your

00:47 host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show

00:51 and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

00:56 This episode is sponsored by Linode and Rollbar. Please check out what they're offering during

01:02 their segments. It really helps support the show. Peter, welcome to Talk Python.

01:06 Thank you very much. I'm very happy to be here.

01:07 I'm happy to have you here. It's been a while since we've talked about Anaconda. I had Travis

01:12 Oliphant on the show way back when, but it seems like it's time for a catch up on what you all

01:17 have been up to. Yeah, well, there's been a lot going on. One of the employees

01:21 has commented that every six months, it feels like a different company. And,

01:25 yeah, the space is evolving very quickly. We're trying to just keep up with it.

01:28 So you would say this data science thing is not a fad. It's probably going to be around

01:31 for a while?

01:31 At this point, I think I'm going to go on a limb and say it's probably going to be around

01:34 for a little while.

01:35 Right on. All right, before we get into all that though, let's start with your story.

01:38 How did you get into programming in Python?

01:40 I actually got into programming when I was a young kid and I've been always programming.

01:44 I've actually been programming for almost as long as I've been speaking English.

01:46 I got a PC when I first came here to the United States, so I was very lucky.

01:50 But I actually majored in physics and out of college, I started going to computer programming

01:56 as a profession. And I did a bunch of C++, but I discovered this thing called Python on Slashdot.

02:03 And I think they announced version 1.5.2. And I was like, fine, I'll go take a look at it.

02:08 And I started playing with it and I just fell in love. And so my day job was like getting beat

02:14 up by C++ templates and out of compliance compilers. And at night, I just hack on Python.

02:19 So finally, after a few years of this, I ended up moving to Austin. I got a job doing Python

02:23 as my day job, which was awesome. In like 2004, I started at Enthought. And I did a lot of work

02:29 in the scientific community and doing consulting with Python because I knew the science given my

02:33 math and science background in physics. But I also knew the software principles and software

02:37 engineering. So it was a really fantastic time. And that's basically the long and short of it.

02:40 Yeah, that sounds like a great fit. You know, things just came together, right? You have this math and science

02:44 background and you love Python. You found this job and it all just, like all of those things came together to really put you in the right place.

02:51 They really did. I feel very, very blessed in that way. Now, it was a lot of hard work too,

02:55 but I got very comfortable. And, you know, there's this great quote from Bruce Lee that you must never,

03:00 like not, you must never get comfortable, but there will be plateaus and you can't stay there.

03:04 And so I think towards the end of the 20, the aughts, the 2000s, around 2010, I was starting to see big

03:11 data happening. And I started realizing that Python was getting used for business data analysis more

03:16 than just science and engineering. And that our little cozy SciPy community could actually be

03:20 something much bigger. And so I started doing some exploration, exploratory work. I really wanted to

03:26 do like D3 for Python. I had a few other little itches I wanted to scratch.

03:30 And so I started Continuum with Travis in order to address some of the technical gaps that we had in

03:36 the community and the technology stack. And then also to really push a narrative in the technology

03:41 market that yes, Python is good for business use. Yes, it's production ready. Yes, you should use it.

03:47 And it can handle big data just fine. And so we really started pushing that narrative in 2012,

03:53 you know, created NumFOCUS, created PyData, did all these things. And I think that the results have

03:57 spoken for themselves. I definitely think that they have. That's great. In 2012, I do think

04:03 there was a little bit more of a debate of, well, is it safe to use Python for our business critical

04:07 stuff? But I feel like that battle has been really solidly won, especially on the data science front,

04:15 right? There were debates about R, maybe R was the space to be. That's not really where it's at anymore,

04:20 is it?

04:20 No, there was definitely a period of language war sort of stuff going on early on. It's odd, like, you know,

04:25 even then, the discussion about is data science a fad? Is it a fad term? Isn't it just business

04:31 intelligence? Or is this just that big data hype cycle all over again? You know, there's a lot of

04:36 doubters and haters on that term. But as I've talked to more users and managers and stuff,

04:42 at businesses, it's clear that they're thinking about data analysis and data analytics in a very

04:47 different way than they have for like decades. And data science is definitely, definitely here to stay

04:51 because of that.

04:52 Absolutely, absolutely. So maybe give people a sense of what you do day to day so they know where you're

04:57 coming from.

04:58 Well, my day to day consists of my formal role as CTO. I run the community innovation and open source

05:05 group here at Anaconda. I actually don't run the product engineering teams. And I work with

05:10 everyone. But my general role is working with the community, helping the various community oriented

05:15 and open source devs that we have champion their projects and work better with the broader community.

05:20 I also do a lot of industry facing technical marketing and evangelism. So a lot of customers

05:25 will have me go and speak at internal data science events they do, things like that. There's actually

05:29 remarkably few people in the Python world that really speak to industry on behalf of Python itself,

05:34 relative to the usage of it. I mean, you'll find no shortage of industry analysts talking about how

05:39 great Java is, or how great these like big data projects are, you know, all these like PR type

05:44 things. There's no one doing that for Python. And so that is actually some of my day job. And beyond

05:49 that, it's just trying to keep up with all the things that are happening in data science, machine

05:53 learning, data engineering, data visualization, AI, all of it.

05:58 On top of the advocacy role, it's a pretty much full time learning thing, right? Because there's so

06:04 much change, right?

06:05 There's so much in every area. I mean, there's all the cloud stuff too. There's edge learning,

06:09 there's data privacy, you name it. Every single area that touches data science is undergoing massive

06:13 change right now.

06:14 That's super exciting, but it's also a bit of a challenge. And I think the Anaconda distribution

06:18 does help some with that. Before we get into the distribution story, though, let's just talk about

06:24 Anaconda Inc. So when I had Travis on the show a couple years ago, it was Continuum that was the

06:32 company and Anaconda was the distribution. But now those are not different anymore, right? It's just

06:37 Anaconda, the company and the distribution.

06:40 We renamed ourselves really out of pragmatism, because we would go to places and we'd introduce

06:47 ourselves as Continuum Analytics. And they're like, oh, yes, you guys, like you got some Python stuff.

06:51 We see that here. Like, who are you guys? And then we say, oh, well, we make Anaconda. And they're like, oh,

06:56 I love Anaconda. I use Anaconda all the time and blah, blah, blah. And so we sort of like, after that started

07:01 happening to us all the time, we sort of figured like, well, maybe we should just call ourselves Anaconda.

07:06 And, you know, one of the things that held that up was for a long time, as we were growing the company and

07:12 growing the distribution, we were afraid that changing the company name would actually spook the community.

07:18 And it's a really, it's been one of these interesting things. Like I have, I have lots to say

07:22 about open source. Let's just put it that way. But it's very hard to play the game of open source,

07:27 honestly, and not still get beat up with FUD about it. And so even though we've open sourced our build

07:32 tools, we've open sourced the recipes, we open source everything from the very beginning,

07:35 there are still people in the community who distrust us because we're a company trying to

07:40 build sustainable funding for this open source effort. So,

07:45 that was one of the reasons we actually were reticent to do that name change until finally

07:49 just became a no brainer that we basically had to.

07:51 Yeah. If people keep mistaking you for Anaconda Inc, maybe just say, fine, that is our name.

07:56 Yeah. And we'll just deal with the haters, you know, on a one-off basis, I guess. I don't know.

08:00 Yeah, exactly. I mean, it's not unprecedented, right? 37 Signals, who made Basecamp and,

08:06 you know, sort of founded Ruby on Rails, they eventually renamed themselves just to Basecamp.

08:11 They're like, yep, the one major project, fine, we're just called that, right? I guess it's like

08:15 Microsoft renaming themselves Windows, which they're probably very happy they didn't. But,

08:19 you know, in a lot of senses, that makes sense. That's cool. Okay, so there's a broad spectrum

08:25 of folks who listen to the show. Many of them will have experience with data science. Many of them

08:31 will know what the Anaconda distribution is. But maybe just, you know, for the folks who are new or

08:36 have been working somewhere else, tell them, what is this distribution? How is it different

08:40 than the standard CPython? And why did you guys make it?

08:43 I'll try to sum this up for a technical, but not data science necessarily audience, right? The basic

08:49 gist of it is that Anaconda arose out of a failure in the Python ecosystem to address the packaging needs

08:56 for the numerical and computationally like heavyweight packages that are in Python. And so for the same

09:03 reason that Linux distributions exist, very few people build Linux from scratch. For actually

09:08 exactly the same technical reasons, we built the Anaconda distribution, because it's actually really,

09:13 really hard to correctly build all of the underlying components that you need for doing productive data

09:18 science and machine learning. And so the reason it's a distribution is because all of the libraries you

09:25 build and the packages, the modules with extension modules that you load up, they need to be compiled

09:30 together, they need to be compiled in a compatible way. And so you need to agree on compiler definitions,

09:35 you need to agree on code generation targets, optimization levels, things like that. And if you

09:41 only ever use pure Python packages, so packages whose code only consists of .py files, then you basically

09:49 never run into a problem. It's only when you start having extension libraries, things that depend on maybe

09:54 system libraries, God forbid you try to cross platforms between Linux and Mac and Windows across

09:59 architectures between ARM and x86, you're completely hosed. And so we, in service to the scientific Python

10:06 community, we built this distribution that was a set of packages and a way of building packages that are

10:11 compatible with each other. So that's what the Anaconda distribution is. It's a bulk distribution with about a

10:16 couple hundred pre made libraries. And we have a package updater in it called Conda that lets you

10:23 then install thousands more that are built by us and built by a large open community that also uses the

10:29 same standards. So that's what Conda and Anaconda are in a nutshell. And it's really one of these like

10:35 packaging war kind of things or packaging, the confusion of Python packaging. We actually tried to approach

10:42 Guido back in the day to help define some standards around this. And he basically gave us very helpful

10:49 guidance, which is maybe your packaging needs are so exotic, you need to build your own system. So we took

10:53 him at his word and we did it. And consequently, when people use Conda, in a lot of cases, things just work.

10:59 There's still like corner cases and a lot of like little rough spots, especially in terms of pip interop.

11:04 But we're very proud of the work we've done so far. And it's used in production every day by big,

11:08 big companies that rely on Python for their production workloads. So that's basically Anaconda

11:13 and Conda in a nutshell.

11:14 Okay, well, that's a really good summary. Yeah, when I think of it, the main value is that you get

11:20 pre compiled binary versions of the packages that would otherwise have to be compiled from source when you

11:27 pip install them, right?

11:29 Yes.

11:29 And the other part is the cross package compatibility, because somebody makes one package, and they have an

11:38 interest in making them as best they can or whatever, but they don't really care about integrating and testing

11:43 against all the other open source projects that you may pull into your project that they don't even care or know about,

11:49 right? So this sort of bigger picture compatibility that you look at is pretty cool as well.

11:55 It's actually become quite critical. And I think this is one of the areas that the Python community,

11:58 in the confounding haze of packaging, and half built packaging solutions, that we've not really

12:05 been good at giving guidance to the user community about is that if all you ever need to do is build

12:10 one package for yourself, and you fully control the deployment environment, and the development

12:14 environment, then maybe you can go and do that, right? But if you actually have to work on a team

12:19 with other people, like, for example, web developers, a lot of times, they control the

12:23 server, they choose the packages they bring, and they write the code, and they can just push it out

12:28 to their server. And they're good, right?

12:30 Yeah, and they're good to go. And you can do any number of things that you want to. You know,

12:33 what I would liken it to is, if you build your own wheel, if you build your

12:38 own native extensions, it's like getting plastic powder or plastic pellets, and making your own mold

12:44 of Legos or Lego like things and pouring your own little pieces. And so as long as you're the one

12:49 that controls what they have to plug into, and you're the one that controls all the molds, then you

12:53 don't need any standard definitions of studs or holes or lengths or anything like that, you're good to go.

12:58 But if you ever want to work with other people who have their own molds and their own places and

13:03 studs, they want to put these things on, you've got to come up with a standard definition. And so what

13:08 Anaconda is essentially, it's like a Lego system, we've standardized what the studs are and what the

13:13 holes are. So lots of people can build different kinds of Legos, and they all can plug together.

13:16 And that's kind of the long and the short of it.

13:18 Yeah, very interesting. So some other things that are in play there are you talked about Conda and

13:25 installing the packages that you built, right, the couple hundred or whatever that come with the

13:29 distribution. But then you also said installing the others through this thing called Conda Forge.

13:35 What's Conda Forge?

13:35 Well, Conda Forge is a community of people who I would say out of a masochistic charity to the

13:42 community. They take on the job of maintaining build scripts and recipes that take upstream

13:48 software and make it so it's actually buildable in a reproducible way and that it works with other

13:54 things. So it's a community of package builders and they have several hundred contributors and

13:59 they've built thousands of packages. We ourselves build about a thousand, although only 200 are built

14:04 into the big Anaconda installer download. But the Conda Forge community goes even beyond that and

14:09 builds several thousand. And that's what Conda Forge is.

14:11 Yeah. Interesting. So people are like, you know, it's really painful to build this package,

14:16 but only one of us should ever suffer and feel that once. And we'll do that on behalf of the

14:21 community. I'll take that on for this one package.

14:23 Yeah, basically. I mean, you know, the real challenge is it's one of those things in life

14:27 where it's almost worse that it's easy to do a bad job. I don't know that we have a term for this

14:32 in English. Maybe there's a long German word for it. But it's like the same thing with the coding

14:36 principles of like, if something is broken, you want it to break loudly and fail loudly,

14:40 right? You don't want it to make a half effort. Sometimes it kind of works sometimes. And so with,

14:45 building packages is the same thing. Most people can kind of get a build working for most things,

14:51 but does it work well? Will they ever be able to do it again? Like, does it work with anything else?

14:57 None of those things, you know, it takes a lot of work to make a good package build. So,

15:01 well, that speaks to the reproducibility side of things. And I know in data science and

15:06 scientists using data science tools, that reproducibility is a super important aspect.

15:11 And I guess the first step is I can run the software, which means I can build the packages

15:16 and install them.

15:17 Right. And that is really what we think: providing pre-built binaries and then having

15:22 good provenance of the build system itself, that's really some of the only ways you can really

15:27 honestly, like not kidding yourself, have reproducibility. I think some people think

15:32 that Docker somehow saves them, but it really doesn't. So it's kind of a struggle right now,

15:38 honestly, because there's so many moving pieces. There's a lot of confusion in that space, but I do.

15:42 Yes, I do agree with you that Conda packages used properly can absolutely be a great way to ensure

15:47 reproducibility for data science.

15:49 Yeah. Well, it's probably better than saying, well, if you want to install this package,

15:54 you're going to need to have the Visual Studio 2008 compiler set up correctly on your machine

15:59 in 2025 or whatever, right? When it's no longer compatible with the Windows or who knows what,

16:05 right?

16:05 Yeah. We're going to have to, like, one of the reasons I think that our team,

16:08 the Conda and Anaconda team are happy to move away from Python 2 is because of the dependency on that

16:13 compiler. Someday when we finally put Python 2 to rest, I'm probably going to try to eBay a bunch of,

16:19 like, boxes of those CDs just so they can break them out of, you know, sort of like a cleansing

16:24 bonfire or something. I don't know. Maybe you shouldn't burn CDs. That's bad, actually.

16:28 Yeah, but you could have some sort of ceremony with them for sure.

16:32 Yeah.

16:33 I think the new Python 3.7, it uses MSBuild. Is that right?

16:38 You know, I'm not sure on the details of that, but I think that there have been significant

16:42 improvements. And, you know, the Python folks who work at Microsoft have worked really hard

16:49 to improve the compiler situation there for Python. I think it's much better now with Python 3 and in

16:53 the later releases of Windows. It's just we have, you know, very old Python, very old Windows that

16:59 still are deployed that we have to keep those users going. So that's where almost all the pain is.

17:04 I can imagine. Yeah, I just had Steve Dower from Microsoft on the show, and he's in charge of the

17:08 installer and stuff there. And he's doing some really, really cool stuff to make it more accessible

17:12 on Windows. And it's easy to go to conferences and forget how important Windows actually is,

17:19 right? You look around, it looks like everyone has a Mac. There's a few people running Linux.

17:23 That's pretty much what you see at the conferences, right? But that's not what the actual consumption

17:28 out in the world is, is it?

17:30 No, that's not at all reflective of even the United States. And then you go to the broader

17:35 world. It's a lot of Windows. It's a lot of Windows, a lot of Linux, too. But yeah, I think this is one of

17:42 the structural problems that faces the open source community is that when you're small, it's easy to do

17:47 product management, because it's like you and your buddies. But once you get bigger, you have to actually

17:52 intentionally go and try to pull in information from your users. And I think that's

17:57 actually, I think, a structural challenge for the Python community at this point in time.

18:00 When we're talking about Conda Forge and things like that, something I had not heard of before,

18:05 but I saw that you're running is something called BioConda. Now, it sounds like it might have to do with

18:11 biology and data science around biology, but that's all I can discern from it. Tell us about that.

18:16 That's new to me.

18:17 So BioConda is actually not one of our projects. And oh, I should have said this earlier with Conda Forge.

18:22 BioConda, Conda Forge, and various other sort of groups, they use our Anaconda Cloud package hosting

18:29 infrastructure to support their community. Because with the Conda package installer, it's easy to give

18:35 it a namespace flag, basically a channel name, and then it will go and download packages only from that

18:40 channel on Anaconda Cloud. So these represent, Conda Forge and BioConda represent different communities

18:46 that are using the Conda packaging tool, but they may have set slightly different standards or included

18:50 certain other standards in their build system protocols and standards. So all these packages

18:55 work together. So yes, BioConda is for the biology, genomics sort of community.

19:00 Yeah.

19:01 They have very specialized, well, specialized is maybe a euphemism, but there's a lot of specialized

19:05 software needs in the biology community. It's very R-centric. There's a lot of, depending on what

19:11 you're doing in that domain, there's a lot of Perl sometimes.

19:13 So...

19:14 Yeah, interesting. We'll leave that there.

19:17 Are there other ones? Is there like a ChemConda or things like that?

19:20 No. So there's actually... Yeah. So I think Bio... I'm going to kick myself later,

19:24 probably, as I forget some. But there are major research disciplines and communities that do use

19:29 Conda quite a bit. So I think the astronomy research community has taken on Python and embraced Python

19:34 a lot. They use Conda as a way to get nightly builds and dev builds and just really get easy

19:39 deployments, right, of their complex software. One of the things that Conda does well,

19:43 I should have said this earlier, it's not just a Python packaging tool. It's a sort of a userland

19:48 software packaging tool. So we package up R, Perl, Python, C, C++, Fortran, Java, Scala,

19:54 Ruby, Node, you name it. We really are almost like a portable userland RPM kind of thing.

20:01 And so that allows for these communities that have a lot of scientific engineering code written in not

20:07 Python, sometimes not even C or C++. We can package all those things up together, move

20:12 these collections of packages around.

20:13 Yeah, that's pretty interesting. That takes the challenge of packaging and sort of

20:18 magnifies it extremely, right? Multiplies it combinatorially.

20:22 Oh, yeah. Oh, yeah. It definitely gets pretty complex.

21:18 So another thing that looks like it's doing really well is Anaconda Cloud. And so

21:26 this is a place where like data scientists can share their work and their packages and things like that.

21:31 Is that right? Yes. So right now, Anaconda Cloud is primarily, I think, used as a package hosting

21:35 environment. And a lot of developers in the data science ecosystem use it as a way to publish

21:40 nightlies or dev builds. Many of the projects, the key projects, they give us a heads up when they're

21:45 about to cut a new release so that they can push, make sure that they can announce the Conda package

21:50 at the same time they announce the release of the, you know, cutting new version of the software. So

21:54 it's very nice of them. Yeah. So how does that work alongside, and differ from, just

22:00 putting it on PyPI? It gets pretty complex. So number one, there's channel support. So we basically have

22:07 individual developers can have their own channel and those packages, you know, their users can just

22:11 download packages from just that channel and not sort of a single global namespace, right?

22:17 Another really important thing is that there's not just one build. So Conda as a packaging system

22:22 has much deeper and richer metadata about the build environment and what it expects of the runtime

22:28 environment. So I can build a package that the same upstream software, I can build different versions

22:33 that are optimized for different levels of your hardware, like whether or not you have GPUs,

22:37 whether or not you want, you have an advanced Intel chip or a relatively basic chip, I can push all of

22:42 that stuff in. And maybe using this version of a compiler or that version of a compiler, like Clang

22:48 versus GNU GCC, you know, these things actually make material difference in whether or not the package

22:53 will work. That level of resolution and that ability to feature flag and select is not available on PyPI

22:59 as far as I'm aware. And again, it's just, you know, even if one package is available, if you use

23:04 pip to install from PyPI, pip aggressively goes and tries to build other things from source, right? And it

23:09 sort of, it doesn't do an a priori solve of what you need, it sort of grabs things

23:14 as it goes. And so you can end up with very much the incorrect packages coming down, you can end up

23:19 trying to build something from source that maybe builds successfully. But again, that's not what you

23:22 wanted. You want the prebuilt one, right?
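
A minimal sketch of the "more than one build of the same upstream software" idea described above. Everything here is an assumption for illustration: the `Build` and `Machine` classes, the `needs_gpu` / `needs_avx2` fields, and the build labels are hypothetical and do not reflect conda's actual metadata format or selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    version: str
    build_string: str  # hypothetical label for the variant
    needs_gpu: bool
    needs_avx2: bool


@dataclass(frozen=True)
class Machine:
    has_gpu: bool
    has_avx2: bool


def compatible(build, machine):
    """A build is usable only if the machine has every feature it needs."""
    return (machine.has_gpu or not build.needs_gpu) and (
        machine.has_avx2 or not build.needs_avx2
    )


def pick_build(builds, machine):
    """Pick the most specialized build that the machine can run."""
    candidates = [b for b in builds if compatible(b, machine)]
    if not candidates:
        return None
    # Prefer builds that exploit more hardware features (GPU first, then AVX2).
    return max(candidates, key=lambda b: (b.needs_gpu, b.needs_avx2))


# Three variants of the same upstream version (made-up data).
BUILDS = [
    Build("1.0", "generic", needs_gpu=False, needs_avx2=False),
    Build("1.0", "avx2", needs_gpu=False, needs_avx2=True),
    Build("1.0", "gpu", needs_gpu=True, needs_avx2=False),
]

print(pick_build(BUILDS, Machine(has_gpu=False, has_avx2=True)).build_string)   # avx2
print(pick_build(BUILDS, Machine(has_gpu=False, has_avx2=False)).build_string)  # generic
```

The same version number maps to several artifacts, and the richer metadata is what lets an installer choose among them before downloading anything.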

23:24 Right, with different settings, different compiler. Okay, that's the primary difference.

23:27 It is frustrating periodically that you can say, here's a bunch of things I need to install

23:32 on pip, you know, pip install these things. And one of them will have a requirement that the version

23:38 of one part is no larger than such and such. And yet it'll go grab, you know, depending on the order

23:44 you specify them in, it may grab the wrong one, you know, and just install that. And then the other

23:50 package is incompatible. Like there's weird little cases like that you can get into all the time,

23:55 right? Because it's actually, this is one of those areas of software development that for most people,

24:00 it's not a fun and sexy area to think about. But it's a deeply critical thing when we rely on open

24:05 source software: to actually understand what the dependency matrix looks like. And there's no free

24:10 lunch, you know, if you do it in kind of this relatively naive way, like what pip does, then you

24:15 can easily end up in a corner, and things are incompatible. If you try to do what we do,

24:19 which is have very explicit and curated metadata about versions, and you do an a priori solve,

24:24 well, people complain the solve takes a long time, which it can. So there's really no free lunch on

24:30 that. I think one of the challenges that we actually have is that the metadata itself can be wrong. And

24:37 we found that all over the place. So packages will declare they're compatible with this

24:42 version or that version, and they're actually not. And so we have to actually patch what the upstream

24:46 declarations are. So again, it gets subtle and detailed. There's just a lot of like muck in this

24:51 area that we have to deal with. Yeah, it sounds a little bit like, these are the problems that you

24:56 can address and then learn about. If your job is to coordinate a whole bunch of packages that don't

25:02 interact intentionally with each other, right? They just want to make their project,

25:06 something that you can ship and install and use. And that's fine, right? But at this,

25:12 this interaction across them is where it gets tricky.
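
The contrast discussed above, between grabbing packages in the order they are listed and doing an a priori solve, can be sketched with a toy example. This is not how pip or conda are implemented. The package names, the integer versions, and the `AVAILABLE` / `REQUIRES` tables are all hypothetical, and the brute-force solver only considers packages that are requested explicitly.

```python
from itertools import product

# Hypothetical index: package -> available versions (plain integers for brevity).
AVAILABLE = {"lib": [1, 2, 3], "app": [1]}
# Hypothetical metadata: (package, version) -> {dependency: highest allowed version}.
REQUIRES = {("app", 1): {"lib": 2}}


def greedy_install(requested):
    """Install in the order given, never revisiting earlier choices."""
    installed = {}
    for name in requested:
        if name not in installed:
            installed[name] = max(AVAILABLE[name])
        for dep, limit in REQUIRES.get((name, installed[name]), {}).items():
            if dep not in installed:
                installed[dep] = max(v for v in AVAILABLE[dep] if v <= limit)
    return installed


def is_consistent(installed):
    """Check every declared upper bound against what is installed."""
    return all(
        dep in installed and installed[dep] <= limit
        for (name, version), deps in REQUIRES.items()
        if installed.get(name) == version
        for dep, limit in deps.items()
    )


def solve(requested):
    """Try whole combinations, newest first, and return the first consistent one."""
    names = list(requested)
    choices = [sorted(AVAILABLE[n], reverse=True) for n in names]
    for combo in product(*choices):
        candidate = dict(zip(names, combo))
        if is_consistent(candidate):
            return candidate
    return None


print(is_consistent(greedy_install(["lib", "app"])))  # False
print(is_consistent(greedy_install(["app", "lib"])))  # True
print(solve(["lib", "app"]))  # {'lib': 2, 'app': 1}
```

The greedy result depends on the order of the request, while the solver gives the same consistent answer either way. The cost is that it may have to examine many combinations, which matches the "no free lunch" point about solve time.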

25:15 There's absolutely a tragedy of the commons. Like with the way I've, the metaphor I've used in the

25:19 past is that every developer, you know, open source maintainers, bless their hearts. They are way,

25:24 they're doing a thankless job a lot of times anyway, and they're way burned out and stressed.

25:28 But they're really solving for it. Does my vehicle work in my driveway? You know, can it get out of my

25:33 driveway and drive into another maintainer's driveway down the street? And if that works, they're good to go

25:38 a lot of times. And when every one of the thousand developers in the ecosystem does this,

25:44 you'll end up with a bunch of cars squashing all over each other in the, in the, in the highways and

25:49 the freeways, because they're not thinking about that integration problem for their end users.

25:52 And the end users, a lot of times in data science, they're not sophisticated software developers.

25:56 They have no ability to solve this problem for themselves.

25:58 They're at the very edge of struggling to write a 10 line script, not understand the complexity of like

26:05 TensorFlow dependencies or something like that.

26:07 Exactly. Exactly.

26:09 So one thing that you all did recently, that seems to be a trend, is you switched from the major minor

26:15 versioning scheme to calendar based scheme. And I think this is an interesting thing, especially

26:20 around open source, because we've had, you know, Mahmoud Hashemi created this site called ZeroVer,

26:26 sort of make fun of all the projects that have been around for 10, 15 years with, you know,

26:32 50 or a hundred releases, but are like 0.1.17, you know, some point, you know,

26:38 like really small versions. And it seems like one of the fixes is to say, well, let's move towards

26:44 something that has more to do with, I can look at the version and I can tell you without deeply

26:50 knowing that software, whether that's a new version, an old version, a medium aged version,

26:55 right? Like if I told you Requests was 2.1.4, is that new? Is that out of date? I don't know.

27:02 Right. But if you use this, this new style, it's pretty obvious. Like, what was the thinking there?
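
To make the "I can look at the version and tell how old it is" point concrete, here is a small sketch. It assumes a `YYYY.MM` calendar version string such as `2018.12`. The helper names `parse_calver` and `age_in_months` are hypothetical, and the example date is this episode's recording date.

```python
from datetime import date


def parse_calver(version):
    """Parse a 'YYYY.MM' string into the first day of that month."""
    year_text, month_text = version.split(".")[:2]
    return date(int(year_text), int(month_text), 1)


def age_in_months(version, today):
    """Whole calendar months between the version's month and `today`."""
    released = parse_calver(version)
    return (today.year - released.year) * 12 + (today.month - released.month)


print(age_in_months("2018.12", date(2019, 1, 16)))  # 1
```

A string like `2.1.4` carries no such information, which is why nothing comparable can be computed from it without looking up the release history.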

27:07 It's a community convention. It definitely makes it, it's for that user affordance that you can

27:11 sort of look at it and know. And also, you know, we set this expectation that we will release at a

27:16 regular cadence and it's for our own internal documentation and everything else. Everyone

27:19 just is able to collaborate more easily around that. But I think the ZeroVer thing, I mean,

27:24 I love Mahmoud and I think it was a hilarious thing, you know, in a community here where we have

27:27 SciPy and IPython or, you know, Jupyter and other things, pandas, you know, zero dot, whatever,

27:33 or I guess it's not quite zero dot anymore, but like SciPy for sure. These things, there's actually

27:38 something we can laugh at all we want to, but there's a thing there that the author is trying

27:43 to say, or the maintainer is trying to say, which is, it's not quite ready yet.

27:46 You know, I'll call it 1.0 when I'm good and ready and I'm not ready yet. It might not be for 20 years.

27:53 And so, of course, that's also kind of a silly position to take when literally millions of people

27:57 and their production code depend on your software.

27:59 I think they're not saying that it's ready. I think what they're, they're thinking of to say

28:04 when it goes to 1.0 a lot of times is it's done and software is rarely done.

28:10 Well, software is done the instant it's released, at least that version of it, right?

28:13 I think this is where we as an industry actually have to get, we have to up-level our thinking

28:18 about this. And we got to stop thinking about software as artifacts, tarballs of code that

28:25 are static. And we actually have to start thinking about this from a flow perspective, that we are

28:30 looking at flows of projects. And there's a covenant that is established in a relationship

28:36 between the user of one of these flows and the people who originate those flows.

28:41 And I think, you know, there's a really interesting thing I learned years ago about

28:45 aerodynamics. And basically that when planes move less than the speed of sound, you can reason

28:52 about aerodynamics somewhat similarly to water and water flow, right?

28:55 But once you break the sound barrier, the thing that actually causes you the greatest amount

29:00 of pressure on your airframe and things like that, you actually have to reason about the change

29:04 in cross-sectional area of the airplane as it moves through the air.

29:08 So it's almost more like streams of thick rope and you're shoving rope aside.

29:14 So you move from this particle flow way to looking at actual flows.

29:19 And so similarly with software, I think we've got to stop thinking about this as being just a code drop,

29:25 right? And maintainers as people who go and dump out a bunch of code and actually look at a relationship

29:31 with projects. And this gets to like sustainability. This gets to, you know,

29:35 versioning and what's what, what is the promise in a version number, all of that stuff. It's actually

29:40 deeply involved. I don't know that the software industry has really started to learn how to consume

29:46 like the enterprise consumers of open source. I don't know that their internal practices have

29:50 really caught up with thinking about it that way.

29:52 Yeah. And that's kind of why I was bringing up the versioning a little more deeply because

29:56 I think the folks that spend their time all day in open source, they know that Flask, even though it had

30:04 some small version number recently moved to 1.0, but it had some small version number, but it's really

30:10 used a lot and it's been around a lot. So it's fine. Right. But the corporate groups, the enterprise groups,

30:17 they see that as a flag of like, that's test software. We're not ready to like make our bank

30:24 run on test software. Is that the feeling that you got by interacting with, because you, you touch both

30:29 open source and enterprise groups more than a lot of folks, I would suspect.

30:34 Yes, absolutely. We, we are a B2B software company. That's where the bulk of our revenue comes from.

30:39 And absolutely. We suffered, we suffered mildly for that. You know, we have to basically go and

30:44 talk to procurement and compliance and IT people that are swimming, you know, they're up to their

30:48 ears in software. They look at a spreadsheet. We come in with our software, our enterprise software

30:53 and say, well, you know, here's the open source things that are in the manifest. And they look at

30:57 this thing and they're like, what is this? This is a pile of garbage. It's all zero dot, whatever.

31:01 Right. And it's like, yeah, but that runs Instagram, you know, like that literally runs

31:05 Dropbox. So like, what are you complaining? You don't really want to get into that.

31:09 Once you have that argument with an IT guy, you've already lost.

31:11 Right. You're, you're a small insurance company with a hundred thousand customers.

31:15 You're not running, you know, YouTube with a million requests per second. That's using similar

31:21 software, right? It's, but it's the mentality, right?

31:23 Yeah. And you know, a lot of, a lot of going into any kind of, I would say that over the last,

31:29 you know, five or six years, I've had to do a lot of adulting. And one of the parts of adulting

31:33 up from just being a geek, like, you know, code nerd kind of guy to being able to actually have

31:38 customer conversations is actually having quite a bit of empathy for the customer.

31:41 Right. And from their perspective, yeah, they are just a regional bank with a few hundred

31:45 thousand customers. They don't have the budget of Alphabet to throw at an SRE team

31:51 and a whole dev team and all that stuff. So their approaches to understanding risk and risk

31:55 mitigation from the thousands of vendors that want to sell them software. Maybe it's the most

31:59 practical, you know, I'm not, again, I'm not defending it, but I'm just saying one could come

32:03 to a point of empathy, right? With their approach.

32:04 That's a really good point. I do totally agree. It is exactly because they're small,

32:09 they can't hire the fresh new hottest software engineers that would rather be in Silicon Valley

32:15 or Austin or, you know, Portland or wherever, right? Like they just don't even have the ability

32:21 to determine whether or not what you're saying is true in a lot of, a lot of cases, right? It's like,

32:26 they just, you know, exactly.

32:28 We just rather use Microsoft. We know that they give us this SLA and this agreement and

32:32 we're just good, right? There's one way to make websites, use ASP.NET. We're good. Just use,

32:37 you know, something else supported like that, right? And it's, it's a challenge that they

32:41 obviously want to use these new tools and powerful tools, especially in data science,

32:45 right? But they've, they've got a different culture and way of describing software being ready.

32:50 You know, and we can laugh all we want to about like these compliance guys, like beating us up

32:54 for our, you know, SciPy 0.whatever. But on the flip side, you know, how many of our,

33:00 our credit card reports and our gas bills come from, yeah, basically some like little ASP app or some,

33:06 you know, Access database, God forbid, with a bunch of VBA macros, right? That runs the world. So

33:10 how elite are we really?

33:12 That's an interesting point. Yeah. It's definitely worth thinking about. So in a broader sense though,

33:17 I feel like Python is making its way into this enterprise and a major corporation space. I know

33:25 it's increasingly being used for a lot of work, not just data science, but, you know, other types of

33:31 software as well. How do you see it? How do you see the world with your inside view you got?

33:36 Well, I think that's absolutely right. And I think that the Python community may not survive that

33:41 adoption. Interesting. What do you mean by that?

33:43 Not Python, the language, but the Python community. What I mean by that is that, you know, I've talked

33:48 to quite a few like maintainers of some popular projects and they've all reflected to me that

33:53 last couple of years as Python has gone, Python adoption just shot through the roof. I think some

33:58 of it is our pushes on data science and things like that. Others are, you know, this rapid rise of

34:04 deep learning. You know, many things have contributed to this, but ultimately Python is now one of the

34:09 most popular languages on the planet. People are getting jobs in Python and they're using Python

34:14 to do their jobs. And what we're seeing is this transition in the expectation of like, hey, man,

34:21 this is just my nine to five. Like this is a tool that I'm supposed to use to do my job.

34:25 And this tool sucks right now. So I'm going to get on your GitHub and I'm going to give you a bunch

34:28 of grief about it because this is your freaking tool. You know, my, like my employer, I got to feed

34:34 my family. My employer tells me I have to use this tool. It's a piece of crap. And so that is,

34:39 that's what I said. I think the Python community might not survive that adoption transition unless

34:44 it intentionally really works hard to drive a positive, like to drive some values into the

34:52 newcomers.

34:52 So maybe that person that comes and complains because, well, I used to download my stuff from

34:58 Microsoft.com. Now I get it from Python.org, but this thing sucks. So I'm going to go back and just

35:03 complain about it as if, you know, there's a commercial entity on the other side whose job

35:08 it is to make the SLA legit.

35:11 Right. Right. But more likely, more likely, actually, they picked up, they inherited some

35:15 piece of crap, three-year-old Python code from some guy who didn't know what he was doing.

35:19 Written in Python 2.5 or something. Yeah.

35:21 Oh, absolutely. It'll be, it'll be 2.5. I think there's a couple of 2.4 things running

35:25 around that I'm aware of, but a lot of 2.5, there's a lot of 2.5 out there. And yeah,

35:30 and it's using some old version of matplotlib or something or some old version of pandas. And

35:34 they're going to complain, you know, on the tracker or on the, you know, on the issue tracker about that.

35:38 And part of the cultural change that I think we should try to encourage sounds like, okay,

35:44 you're doing this for your job. You need, it's not so great. We are the maintainers, but you have

35:50 a company who depends upon this. Can your company contribute some time, a PR, some fit, like it's got

35:57 to be a two-way street. I think it can't just be, well, you know, one of the things I suspect that

36:01 you also feel at Anaconda Inc. is there are so many companies out there making millions and billions

36:10 of dollars a year on top of free. There's like people working in their free time on some open

36:16 source project that company is basically built upon and they make billions of dollars and contribute back

36:22 nearly zero or zero.

36:24 Yes. I've frequently quipped that I can fit probably the core NumPy pandas maintainers

36:31 in my, no, no, my, okay. So we've gotten a few more now, so they don't all fit my minivan,

36:36 but at one point in time, certainly core NumPy.

36:39 You're going to need one of those longer, like full vans that holds 15 people.

36:43 I may need a 15 person van, but I could, I could probably fit them in the 15 person van.

36:47 You know, Matplotlib, which everybody relies on is like just a few people, maybe part-time. There's

36:54 not like one whole FTE on it even.

36:55 Yeah.

36:56 There's projects like Jupyter that are very large, but also underfunded. And there's projects

37:00 that are small and underfunded. And it's extreme. Yes. It's exceptionally tragic.

37:05 Right.

37:06 It's exceptionally tragic.

37:07 Well, and do you know that I think the part of the tragedy to me is like, if it really took

37:11 a thousand people to make Matplotlib, 600 people to make Flask, maybe the community can't contribute

37:18 back enough to pay those thousand engineers full-time. But like you said, it's like a van full of people,

37:26 or it's my small car full of people for Flask, right? And Click and all those things. The people

37:33 and the companies that use Flask make so much money and depend so heavily upon it that they could easily

37:39 pay those three, four or five people to be full-time on that and be doing really well. Right. But they don't.

37:46 Right. It's just, it's not even asking very much of them, which is what's crazy.

37:49 I'm of two minds on this or not two minds, but I have like two major views on this.

37:53 One of them is that we should look at this as the triumph of software. I mean,

37:58 to sort of just to sort of restate the point you're making, which is that,

38:00 holy crap, one or two or 10 people can build something that is fundamental to

38:09 billions and billions of dollars of global economic activity. That's something to be celebrated,

38:15 right? Because that should free up. Think of how many more thousands of software developers

38:19 don't have to be working on Flask. They can just go and have free time. Not really,

38:23 but you know, in theory, that's how.

38:24 Build something more interesting than just the framework, right? They could build something with

38:28 this result.

38:29 So that's one way to look at it and that we should celebrate where we can. But on the other hand,

38:34 the thing is like, if we can't even somehow come up with the funding for like 10 FTEs for these

38:39 fundamental projects, what's broken? What's broken, right? Because it can't be, it's not,

38:44 it can't be that hard. And so I think there's two ways to look at this. One is that the open source

38:50 community as the, essentially the field of software, I think it's essentially commoditizing out and the

38:57 labor, what open source represents. And this particular thing happening in the Python ecosystem

39:01 is the very vanguard of this transition. It represents essentially the end of labor economics

39:07 for software. And so that going away, we're at that transition. And so it's very hard to think about it

39:15 for companies because companies will allocate budget for software development in a very like

39:21 headcount oriented way, right? And they know what they're getting when they pay for an FTE dev here

39:26 or there or wherever.

39:27 Sure.

39:27 If they just throw money at some open source, what are they getting for it? You know, they know how to,

39:31 they know how to pay money for software. Companies are very good at paying money for software,

39:34 but paying for stuff that they can already get for free. They literally, that is a null value on a

39:40 spreadsheet. They cannot compute that. It is a NaN, right? So my view on this is actually quite simple,

39:46 which is that if open source developers, the people like me who care about the open source ecosystem,

39:51 if we want to sustain the community innovation and that positive abundance mentality that we have in the

39:59 open source ecology, the human ecology of open source has moved to post scarcity, post labor economics.

40:05 If we want to sustain that, then we need to actually drive a new conversation. We need to actually

40:10 provide the tooling and the infrastructure for the companies to think about how to consume this.

41:34 What are some of the key elements?

41:35 One way to do it is you can look at it almost like treat each new... Number one,

41:39 it's something we have to work on ourselves, which is to not make money be a bad word,

41:44 which is still a mindset that pervades many Open Source communities and developers.

41:48 Any affiliation with any kind of money-managing, money-changing organization is seen as essentially...

41:55 It's seen as corrupting sometimes. Yeah, yeah.

41:57 It's corrupting, exactly. So, I mean, we literally had a SciPy mailing list,

42:02 I think a couple of years ago, someone was arguing that we should only allow steering council members

42:07 to be part of universities or part of academia, which they don't have their own agendas.

42:11 And the other people were just like, are you kidding me? Academics don't have agendas anymore.

42:15 So, people like to kid themselves a lot about this kind of stuff. But anyway,

42:19 so I think that the Open Source community needs to, number one, not be allergic to money and treat it as a corrupting influence, right?

42:25 There's companies and ways, business models that are trying to help Open Source and trying to be

42:33 good participants in it. And then there are the corrupting, evil, taking advantage of type

42:37 companies. So, like, it's not black and white, but there are certainly paths forward where

42:42 companies like you guys and others are putting in lots of effort to try to make things better

42:49 legitimately.

42:50 Yeah. And I appreciate that you recognize that. Like, we really have really tried to be good

42:54 citizens in the Open Source community. But I think companies, for a lot of companies, that

42:59 it's like the mind is willing, but the spreadsheets are weak. You know, like, it's still really hard for

43:04 people and proponents and advocates, even within those companies, to, at the end of the day,

43:08 make the budgetary justifications. Because the companies internally don't know how to,

43:12 they don't know how to reason about it.

43:14 Yeah.

43:14 You know? So, I think that's where the Open Source community can try to help. Like,

43:18 number one, one thing we could do is do almost like a Kickstarter style or like, you know,

43:23 I play Warcraft a little bit. And so, it's like world boss, like, takedown. So, before we can

43:28 release any new versions of Library XYZ next year, we've got to get this much money in, right?

43:34 Yeah.

43:34 And people basically just, but they put the money in. But I think that's actually as fun as that

43:39 would be in the Kickstarter model like that, as cool as that would be and as interesting as that

43:43 would be, I think businesses have a hard time just writing checks for donations. So,

43:48 the other thing that I think the Open Source community needs to do, I think the one that's

43:51 more realistic, is to actually form entities that can have a business-to-business conversation

43:57 with the corporate players and understand how to talk to their procurement, talk to their legal and

44:05 everyone else, and basically act as a crossover facility to do the product management so the

44:10 businesses know what they're getting for their money. It's not a charity. You know,

44:13 some things that people may not be aware of is that for a business to write a $10,000 charity check,

44:18 that comes out of a different part of the business a lot of times.

44:22 Even if everyone wants to, for budgetary and for finance and compliance reasons,

44:27 they literally cannot just write a check to some dude, you know, some Open Source hacker in the

44:32 middle of Europe somewhere. So, these are the things that we need to actually put together.

44:36 I think the allergic to money issue, I think that that can be solved with the right examples of Open Source

44:43 companies and companies entering Open Source in positive ways. But I feel like there's some kind of

44:51 structure or something that has to get between the corporations and the Open Source projects,

44:58 where it's like you say, it's not a charity check. It's you pay into this and there's, you get a little

45:05 bit more of something. And I don't know what that is, but there's something like that. Then the companies

45:10 can justify it. They say, look, we depend upon this thing. We pay, you know, 0.01% of our revenue to the

45:18 people that make it work so that our system doesn't go away. And here's what we get for that 0.01%. I don't know

45:22 what that is. It's actually, we don't have to reinvent the wheel here. It happens all the time

45:26 in every other industry. It's an industry consortium. It's an industry consortium. You pay into it. And

45:31 what happens is you get votes on various technical councils and technical boards, and they do the

45:36 product management and the dev management for what the thing should be. In the Python world, we want that

45:42 to, in all cases for a lot of these projects, we want that to still be subordinate to the vision

45:48 of the open innovation volunteer kind of crew. But there's so much housekeeping. There's so much

45:56 issue tracking stuff. There's so much like documentation, management, cleanup, just keeping

46:01 the lights on and the yak shaving. There's so much that goes into a project that these kinds of

46:06 consortium models can fund. And I think Python itself, and I'll just come out on your podcast and I'll just

46:11 say it. I think Python itself badly needs this. Yeah.

46:14 Badly needs an actual consortium like this to be operated in a way that can accept dollars easily.

46:19 That's easy for people to write checks, right? Like we all know this as entrepreneurs, like make

46:23 yourself easy to do business with. The open source community, I would say, has not made itself easy

46:27 to do business with. You got to either hire a core dev. And if you do, that core dev then has to,

46:32 in their own minds, be like, am I wearing my community hat or my employee hat, which is tough on them,

46:37 right? It's very stressful for them. And the open source community, even when we get the dollars,

46:41 we don't make it clear to the people writing the checks what those dollars are buying for them.

46:45 Like if they have a couple of issues that are easy to solve, that really can make a difference for

46:49 them, we don't necessarily prioritize those issues just because they wrote us a check because we don't

46:53 want to feel like we're that, you know, like it's that quid pro quo. So I think that you really need

46:58 some kind of facility in the middle that acts as a consortium, one that is able to help businesses steer

47:02 and guide a lot of these maintenance, pretty basic kinds of maintenance things that need to happen

47:07 for projects that would make their lives easier. And that can then funnel a ton of money into a ton

47:13 of margin that goes into the innovation work and all the forward-looking kind of stuff.

47:16 And everyone's happy.

47:18 Yeah. Do you think the PSF could do it?

47:19 I think the PSF could do it. I think that the PSF would be, I don't know if it operates as a

47:24 nonprofit.

47:25 It does. Yeah.

47:26 Yeah. So if it's a nonprofit, I think it'd be very hard for it to do it. It might need to actually

47:31 create like sort of Mozilla Foundation, Mozilla Corporation. I think it would need to create

47:35 some kind of a traditional C corp or a B Corp, perhaps, like a social-mission for-profit, that it

47:42 owns like director seats on and, you know, the chunk of the things. But companies, a lot of times are

47:47 just prohibited from writing checks to 501c3s unless it comes out of their philanthropy group.

47:52 So again, this is that making it easy to do business with kind of thing.

47:55 Yeah. Interesting.

47:56 Absolutely. I think the PSF should spin up a thing like that. And I've been sort of

48:00 quietly advocating for this behind the scenes a little bit. And maybe I'll be more vocal about

48:04 that here this year.

48:05 All right. Well, we can spread a little word on the podcast as we just have.

48:08 It's really interesting. And I think there's absolutely lots of possibilities for business

48:15 models in open source. But I feel like there's actually a 98% gap, like 2% of that is captured.

48:26 98% of it is not because we have these large, but still not huge, like banks in the Midwest that

48:33 contribute nothing. They do no PRs. They don't do anything to that effect, right? They just,

48:38 it's just not in their culture. And like you said, there's no real mechanism for them to

48:43 pay a little and get more and justify that.

48:45 Yes. Yes. And actually some of the open source business models that are emerging now,

48:49 they present challenges of their own. Again, my overriding thesis is that the world of software

48:54 is actually commoditizing pretty quickly. And so people, like if you look at the things that have

49:01 been happening in the last six months, as I would say open source software component vendors,

49:07 like Mongo and Redis and Timescale and others, as they start getting their business eaten by the cloud

49:13 vendors, they're realizing that open source, you know, sounded great. Open core sounded great.

49:18 And then they start losing any future route to revenue. And they've got to actually aggressively

49:23 go to like dual licensing and like deep viral AGPLv3 kind of stuff. I don't know that open source

49:29 is even the right conversation to have anymore. I think it should be around sustainable community

49:35 innovation and the freedom to experiment, freedom to innovate, freedom to, you know, there's a lot of

49:40 like free as in beer and free as in innovation. But like, the traditional ways we have of talking about the

49:47 source code itself, again, is limited in this paradigm of like code drops. And we're beyond that now.

49:53 Yeah. And you know, you look at the cloud, for example, a lot of these places that they provide you something,

49:58 and you pay on usage, right? You don't buy any software in the cloud, but you have the subscription

50:06 model all over the place, right? And that's, that's starting to really shift the way things are working

50:11 as well. And I feel like the cloud vendors actually have this interesting lock in where they're a little

50:16 bit defended against some of these challenges that are coming up.

50:20 Well, absolutely. There's only like three major cloud vendors of significance here in the US,

50:25 at least. And all of them are absolutely going for lock in. And they're, you know, ultimately,

50:32 their business model. It's not necessarily I mean, it's a for profit business model, put it that way,

50:36 right? Yeah, the cloud is the new lock in with a lot of those API's. It's interesting. And like this

50:40 MongoDB AWS thing you talked about, like, that's a little bit of it as well, right? But it's pretty

50:45 interesting. Yeah, I think we could probably talk for hours and hours on this, because we're both

50:50 pretty passionate about it. It's awesome. But let me ask you a few more questions before we run out of

50:55 time. Sure. These are all sort of forward looking type things. And one of them is data science from

51:00 you called out the year 2012 to me that if you look at the analytics and the graphs and the usage,

51:05 like there's a huge increase in the derivative of a lot of things around Python at 2012, up till now.

51:13 So five years further out, what do you think data science looks like? Is it still deeply working

51:20 with Python? Is it solving different problems? Where is it going?

51:23 We're going to see data science much more integrated. People have a better sense of what it

51:30 can and can't do by itself rather, right? It's a new discipline that's coming into the business. It's a

51:36 new swim lane. Everyone's trying to figure out how they stand in relation to it. There's a lot of

51:40 political, you know, fighting and a lot of experimentation within a lot of businesses that I see. But at the end of

51:44 the day, I think this idea of doing data exploration, doing model development, and revving models that are

51:52 really critical to the business is the new reality for people. So that's not going away. That's a

51:57 fundamental dynamic that's going to be here. And if you need to go and explore data, you need to go and

52:01 do model development, then you're going to be doing data science full stop, right? There's no,

52:06 like, if you need to basically bring in domain expertise, stats, and coding ability to do that

52:12 well, then you're going to need data scientists intersect. You need all three of those skills,

52:16 you need all three of those. But data scientists are going to find themselves needing to have a much

52:21 better, I think the borders between data, the data science world and the others will clarify better.

52:26 So you'll have data scientists interacting with data engineers, and much better, hopefully much better

52:31 established best practices around how that's supposed to go. And then IT people start accepting that,

52:36 yes, Python is here to stay, we're going to need to deploy real Python stuff. And we need to know a

52:40 little more something about it, right? And so a lot of these little intersectional areas right now

52:45 between data science and other concerns, same thing with BI, people right now, there's literally people

52:49 out there selling point and click visualization tools saying that's data science. And it's like,

52:53 that's not really data science. But they're going to figure that out probably in the next couple

52:58 of years. Hopefully, they get the clue. Yeah, I think that's what I think is going to happen.

53:02 Now, the result of that happening is a gigantic, I think that that clue is going to really start

53:06 hitting home in two years or so. Then the immediate next problem that people have is overall workflow

53:13 management across all of these things. Because everyone's got their favorite tools. Everyone is

53:18 producing things that touch and intersect with everyone else's stuff. How do we get all of this

53:23 stuff managed in one place? And I think that's the challenge, and we're gonna be square in

53:28 the middle of that conversation still. And five years from now, assuming that the Chinese economy

53:50 hasn't collapsed, we are going to see some really scary stuff coming

53:56 out of China and the AI innovation happening there. Because they have been, they're completely

54:01 unapologetic about using their entire national population of a billion people as a sandbox for

54:07 trying AI surveillance, sort of cybernetic, the computer controls you kind of things.

54:12 Yeah, the whole social ranking, and all that stuff that's...

54:16 So here's the terrifying thing about that. I'm going to be a little bit of a contrarian on this.

54:20 What if it turns out that their Sesame Credit system, Rev2... no, Rev1 is scary and crappy.

54:24 Rev2, what if it turns out that they give social Sesame credits for their businesses and local

54:29 politicians? Yeah.

54:30 What if they actually start upgrading social Sesame credits to being this kind of thing where

54:34 it becomes almost like a, again, back to Warcraft, but like a Warcraft honor reputation system,

54:38 right? And becomes multicolored, it becomes vectorized instead of scalar. They might actually

54:44 innovate a scary, awesome approach that has deep problems because it requires a surveillance state.

54:50 And the Western world might look at that and say, huh, you know, that actually works a lot better

54:54 than, you know, Ivanka Trump, you know, running our fast food joints.

54:58 Yeah.

54:58 Sorry, the White House. So that dates this podcast, by the way. For those who are listening months in

55:03 the future, in case you forgot, just two days ago, the President of the United States served Big

55:07 Macs at the White House. That just, that happened. So this is still fresh in our minds.

55:12 To Clemson, who won the national college football championship. Yeah.

55:16 Yes. It's incredible. Anyway. So the point is that the scary thing about the Chinese AI system

55:21 is that it might work and work really, really well.

55:23 Yeah. Not that it's just pure wrong, but actually there's aspects of it that are amazing

55:28 in its sort of Black Mirror, Electric Dreams way.

55:32 Oh yeah. Tell you what, it's going to be pretty amazing. I think the same way that like a lot of the

55:36 Western world is like, oh, well, we already saw where this goes in Orwell, so we're not going to

55:41 go there. Western world has that kind of snottiness about it. I think they're underestimating how good

55:46 it could be and how tempting that goodness can look to technologists, to the capitalists, and to the

55:53 policymakers here. That's really, for me, as someone who fled the communist regime, you know, as a

55:58 child, like that's the scary thing about it.

56:00 That is really an interesting analysis. And certainly I was thinking ethics, data ethics,

56:05 and accountability for data models and AI and ML, right? Like, sorry, you couldn't get the house.

56:12 The AI said no, right? Like, no, no, no. You have to say why the AI said no. Well, we don't know,

56:17 but it's really good. And it said no, you know, like answering that problem is going to be interesting

56:22 too.

56:22 It is. And you know, the, the thing is that already now you get denied, right? And there's already a model

56:28 that tells you why you're denied. And the AI can, this kind of gets back to that same thing with the

56:33 whole Black Mirror thing and the AI in China, like really, really good AI. It doesn't look like that

56:37 AI, you know? So the really, really good systems, quote unquote, good, the really effective systems

56:43 at partitioning people and spot targeting them, they're going to be dressed up in ways that are

56:48 palatable. Our robot overlords will look like Cylons. They're going to look really human-like.

56:52 This is the scary future, man. I'm not trying to like scare you and scare your listeners.

56:56 I'm just telling you though, like, this is what's coming. And as humans (I'm actually a human, I'm

57:01 not a Cylon), as humans, as, you know, tribe human, I think we've got to get better at being human.

57:06 And so that's maybe too philosophical hand wavy, but anyway.

57:10 Yeah. It's really an interesting thing to ponder for sure. All right. So I guess final comment or topic

57:17 just real quickly is I feel like there's been this Python 2 versus 3 debate, modern Python versus legacy Python,

57:25 as I like to position it. And I feel like the adoption of modern Python in data science is much faster

57:33 than it has been in the general Python space. One, do you think that's true? And then two, why do you think that is?

57:41 One, I think it's true. And two, I think it's because a lot of data science stuff is new and

57:45 legacy data science code tends to age with models. So like a piece of data science code is only as good

57:52 as the model data that it was trained on and models change because the world changes. So there's a built

57:59 in expiration date on any data science model that you've got. So you're not keeping transaction systems

58:04 from 20 years ago live.

58:06 The complexity and the algorithms and the techniques are just not even relevant, right? Like the machine

58:11 learning of five years ago doesn't compete with the machine learning of today. And it's not like

58:15 you're just going to upgrade. It's a totally different thing. You just retrain it on TensorFlow

58:20 or Keras or whatever, right?

58:21 Right. And secondly, this is another sort of important dynamic, which is that the regulatory environment

58:27 around data science hasn't caught up. So it doesn't require you, you know, I was talking to an engineer

58:32 from a software modeling engineer from an airplane company. And he was saying, yeah, the FAA requires

58:39 us to be able to reproduce our computational design models for like decades, for decades.

58:45 Yeah. Wow.

58:46 So, I mean, yeah, because planes actually, if they're well maintained, they fly for a long time,

58:50 right? And if there's a structural failure of a part...

58:53 Right. There's a lot of 737s out there. Yeah.

58:55 Oh, yeah. And so data science just doesn't have that problem yet. And, you know, one of the earliest

58:59 adopters of Python, this is a really interesting dynamic that people may not be aware of, but

59:03 in the mid 2000s, there was a significant uptake of Python in the hedge fund and the finance industry.

59:10 And so that was Python 2, Python 2.5, 2.6 around that time. And so that got into a lot of places.

59:18 And finance is actually a pretty regulated area. And so a lot of that code, especially if it starts

59:23 running production finance systems, people need to keep it running, not only because they're...

59:27 Even if you stop using a particular finance model to like score or to do whatever, to price a trade

59:33 and things like that, oftentimes you'll want to go back and do what's called backtesting.

59:37 So you want to run new data against those old models, and you'll want to race them against the

59:43 new models, right? You'll want to run new models on old data and new data on old models. And so that

59:48 kind of backtesting approach, you need to keep that old code running for that purpose as well,

59:52 just from a risk management perspective. So a lot of the finance industry, by running

59:56 ahead and adopting Python 2 has sort of gotten them stuck on Python 2 a little bit.
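
The backtesting idea described above (new models on old data, new data on old models) can be sketched as a small grid of model-versus-dataset scores. This is a minimal illustration only: the two models, the data points, and the error metric are all made-up assumptions, not anything from a real finance system.

```python
from statistics import mean


# Hypothetical pricing models: each maps an input feature to a predicted price.
def old_model(x):
    return 2.0 * x


def new_model(x):
    return 2.0 * x + 1.0


# Hypothetical (feature, observed price) pairs from two different periods.
old_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
new_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def mean_abs_error(model, data):
    """Average absolute gap between the model's prediction and the observation."""
    return mean(abs(model(x) - y) for x, y in data)


models = {"old_model": old_model, "new_model": new_model}
datasets = {"old_data": old_data, "new_data": new_data}

# Score every model against every dataset, which is why the old code
# has to stay runnable alongside the new code.
results = {
    (model_name, data_name): mean_abs_error(model, data)
    for model_name, model in models.items()
    for data_name, data in datasets.items()
}

for (model_name, data_name), error in results.items():
    print(f"{model_name} on {data_name}: MAE = {error:.2f}")
```

The point of the grid is that the old model must still execute today, on today's data, which is what keeps the old interpreter and its dependencies alive.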

01:00:01 Okay. Interesting. Yeah. So almost a victim of its own success in a way, but in some of these

01:00:09 industries. All right. I guess we're going to have to leave it there because we're out of time. But

01:00:13 like I said, a lot of interesting stuff to talk about. I have to just put it at rest. So before we move on,

01:00:19 though, I'm going to ask you the two questions I always ask on the show. If you're going to

01:00:23 write some Python code, what editor would you use?

01:00:25 My old go-to is still Vim. But for large code bases, I tend to use PyCharm so I can, you know,

01:00:30 sort of navigate more easily.

01:00:31 Yeah, sure. Makes sense. And then there's many, many packages on PyPI or available on conda-forge.

01:00:39 What do you think is one that people maybe haven't heard of, but they should, or you want to recommend?

01:00:43 Is it bad form to pimp? Is it like to pimp your own stuff?

01:00:46 No, you do it. No, no, go ahead.

01:00:48 So I'm really, really excited about a new project that we created called Intake,

01:00:53 which I would encourage people to take a look at. It's pretty new. We just launched it last year.

01:00:58 Yeah, it looks interesting. I was going to ask you more about it, but we just

01:01:00 have too many topics already. So tell us about it real quick.

01:01:03 So Intake is a data loading abstraction library. So it's basically just load my data,

01:01:09 and it abstracts your data loading stuff into a declarative syntax so that the beginning of your

01:01:14 data science scripts doesn't have a whole bunch of like embedded and brittle SQL calls or pandas

01:01:19 column transformations or things like that. Intake is a way to make it so that your actual

01:01:23 data science or data transformation code is sort of its own code artifact and your data bits are your

01:01:29 data bits. It's kind of a nerdy thing, but we think that it actually addresses that data,

01:01:34 that model reproducibility and code reproducibility problem that data scientists face.
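
The separation described above, with data descriptions kept declarative and apart from the analysis code, can be sketched with the standard library alone. This is not Intake's API: the catalog layout, the `load` helper, and the sample data are hypothetical names invented for illustration.

```python
import csv
import io
import json

# Hypothetical catalog: a declarative description of each data source.
# In practice this would live in its own file, apart from the analysis code.
CATALOG_JSON = """
{
  "sources": {
    "trades": {
      "format": "csv",
      "columns": {"price": "float", "qty": "int"}
    }
  }
}
"""

# Stand-in for an external file or database so the sketch is self-contained.
RAW_DATA = {"trades": "symbol,price,qty\nABC,10.5,3\nXYZ,20.0,1\n"}

CASTS = {"float": float, "int": int, "str": str}


def load(catalog, name):
    """Load the named source as a list of dicts, applying the declared column types."""
    spec = catalog["sources"][name]
    if spec["format"] != "csv":
        raise ValueError(f"unsupported format: {spec['format']}")
    rows = []
    for row in csv.DictReader(io.StringIO(RAW_DATA[name])):
        for column, type_name in spec.get("columns", {}).items():
            row[column] = CASTS[type_name](row[column])
        rows.append(row)
    return rows


catalog = json.loads(CATALOG_JSON)

# The analysis code only names the source; it holds no parsing or casting details.
trades = load(catalog, "trades")
total_value = sum(t["price"] * t["qty"] for t in trades)
print(total_value)
```

Because the script refers to the data only by name, the catalog entry can change (a new location, a different column type) without touching the analysis code.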

01:01:38 Sounds really useful. Thanks. All right. So final call to action. People are excited about

01:01:42 the Anaconda distribution or maybe getting, making some progress on this open source business model

01:01:48 thing we talked about. What would you say to people?

01:01:50 So I would say that we have AnacondaCon coming up. So if you're actually using Python

01:01:55 in a commercial environment, strongly recommend AnacondaCon. We have a, we try to make a really good

01:02:01 blend of technology and practitioner kind of stuff and workshops there combined with

01:02:07 business perspectives. So it's not like an industry conference like Gartner or Strata.

01:02:12 It's not like a pure one of those things. It's also not a pure like tech community conference,

01:02:16 like PyData or something like that. So it's, we try to make a mix of those things.

01:02:19 We've gotten really good reviews in the past couple of years. It's our third year doing it.

01:02:23 I'm super excited about it. It's here in Austin in April, April 3rd to 5th.

01:02:26 So that's AnacondaCon.io. And secondly, if people are using Anaconda and like it and they're using it in a

01:02:32 business environment. I would recommend they check out Anaconda Enterprise. We are very,

01:02:36 very proud of the product and we have a lot of problems that we solve for people inside business

01:02:40 environments and the business use of Python for deployment, package management.

01:02:44 Yeah. Real quickly, like what, what's the, what do you get from, right? You know,

01:02:47 I talked about the business model should be, you get a little bit more for your money,

01:02:50 not just pure charity, you know, here's a PayPal donate button. What do people get real quick?

01:02:57 So Anaconda Enterprise is, it gives you the ability to have your own managed package repository.

01:03:02 It gives you a way to do secured and governed collaborative notebooks and model deployment.

01:03:07 It works in the cloud. It works on prem. Many of our customers use it across an air gap in very

01:03:12 strictly governed environments. We basically make it so that data scientists and Python practitioners

01:03:18 in business can be as effective with Anaconda as they are at home nights and weekends on their

01:03:22 own laptops. All right. Yeah. That sounds cool. We just clear all the IT hurdles. Yeah,

01:03:25 that's sweet. All right. Well, thanks for all that you've talked about here, Peter. It's been a

01:03:30 super interesting conversation. Thanks for being on the show. Thank you so much for having me. I

01:03:33 really enjoyed it. You bet. Bye. Bye-bye. This has been another episode of Talk Python to Me.

01:03:38 Our guest on this episode was Peter Wang. It's been brought to you by Linode and Rollbar.

01:03:43 Linode is your go-to hosting for whatever you're building with Python. Get four months free at

01:03:49 talkpython.fm/Linode. That's L-I-N-O-D-E. Rollbar takes the pain out of errors. They give you the

01:03:57 context and insight you need to quickly locate and fix errors that might have gone unnoticed until users

01:04:02 complain, of course. Track a ridiculous number of errors for free as Talk Python to Me listeners at

01:04:07 talkpython.fm/Rollbar. Want to level up your Python? If you're just getting started, try my Python

01:04:14 Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new

01:04:20 async course that digs into all the different types of async programming you can do in Python.

01:04:25 And of course, if you're interested in more than one of these, be sure to check out our everything

01:04:29 bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite

01:04:34 podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed

01:04:39 at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on

01:04:45 talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

01:04:51 Now get out there and write some Python code.

01:04:53 Bye.

