#198: Catching up with the Anaconda distribution Transcript

Recorded on Wednesday, Jan 16, 2019.

00:00 Michael Kennedy: It's time to catch up with the Anaconda crew and see what's new in the Anaconda distribution. This edition of Python was created to solve some of the stickier problems around deployment, especially in the data science space. Their usage gives them deep insight into how Python is being used in that enterprise space as well. And that turns out to be a very interesting part of the conversation. Join me and Peter Wang, CTO at Anaconda, Inc. on this episode of Talk Python To Me, Number 198, recorded January 16, 2019. Welcome to Talk Python To Me, a weekly podcast on Python. The language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @MKennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is sponsored by Linode and Rollbar. Please check out what they are offering during their segments. It really helps support the show. Peter, welcome to Talk Python.

01:06 Peter Wang: Thank you very much. I'm very happy to be here.

01:08 Michael Kennedy: I'm happy to have you here. It's been a while since we've talked about Anaconda. I had Travis Oliphant on the show way back when, but it seems like it's time for a catch-up on what you all have been up to.

01:17 Peter Wang: Yeah, well, there's been a lot going on. One of the employees has commented that every six months it feels like a different company. And we do, yeah, the space has evolved very quickly. We're trying to just keep up with it.

01:28 Michael Kennedy: So you would say this data science thing is not a fad? It's probably going to be around for a while?

01:31 Peter Wang: At this point I think I'm going to go out on a limb and say it's probably going to be around for a little while.

01:35 Michael Kennedy: Right on. Alright, before we get into all that though, let's start with your story. How did you get into programming and Python?

01:40 Peter Wang: I actually got into programming when I was a young kid, and I've actually been programming for almost as long as I've been speaking English. I got a PC when I first came here to the United States, so I was very lucky. But I actually majored in physics, and out of college I started going into computer programming as a profession. I did a bunch of C++, but I discovered this thing called Python on Slashdot, and I think they announced version 1.5.2. I was like fine, I'll go take a look at it. And I started playing with it, and I just fell in love. So in my day job I was getting beat up by C++ templates and out-of-compliance compilers, and at night I'd just tackle Python. So finally after a few years of this I ended up moving to Austin and I got a job doing Python as my day job, which was awesome. In 2004 I started at Enthought, and I did a lot of work in the scientific community and doing consulting with Python, because I knew the science given my math and science background in physics. But I also knew the software principles and software engineering. So it was really a fantastic time. And that's basically the long and the short of it.

02:40 Michael Kennedy: Yeah, it sounds like a great fit. Things just came together, right. You have this math and science background, and you love Python. You found this job and it all just, all those things came together to really put you in the right place.

02:51 Peter Wang: They really did. I feel very, very blessed in that way. Now, it was a lot of hard work too. But I got very comfortable, and there's this great quote from Bruce Lee that you must never, like not you must never get comfortable, but there will be plateaus and you can't stay there. So I think towards the end of the 2000s, around 2010 I was starting to see big data happening, and I started realizing that Python was getting used for business data analysis more than just science and engineering, and that our little cozy SciPy community could actually be something much bigger. So I started doing some exploration and exploratory work. I really wanted to do D3 for Python, and there were a few other itches I wanted to scratch. So I started Continuum with Travis in order to address some of the technical gaps that we had in the community and the technology stack, and then also to really push a narrative in the technology market that yes, Python is good for business use. Yes, it's production ready. Yes, you should use it. And it can handle big data just fine. So we really started pushing that narrative in 2012, created NumFOCUS, created PyData, did all these things. And I think that the results have spoken for themselves.

03:58 Michael Kennedy: I definitely think they are, they have. That's great. In 2012 I do think there was a little bit more of a debate of well, is it safe to use Python for our business critical stuff. But I feel like that battle has been really solidly won, especially on the data science front, right? There were debates about R, maybe R was the space to be. That's not really where it's at anymore is it?

04:20 Peter Wang: No, there was definitely a period of language war sort of stuff going on early on. It's odd. Even then the discussion about is data science a fad, is it a fad term, isn't it just business intelligence, or is this just a big data hype cycle all over again? There's a lot of doubters and haters on that term. But as I've talked to more users and managers and stuff at businesses it's clear that they are thinking about data analysis and data analytics in a very different way than they have for decades, and data science is definitely here to stay because of that.

04:52 Michael Kennedy: Absolutely, absolutely. So maybe give people a sense of what you do day-to-day so they know where you're coming from.

04:58 Peter Wang: Well, my day-to-day consists of, my formal role is CTO. I run the community innovation and open source group here at Anaconda. I actually don't run the project engineering teams. And I work with everyone, but my general role is working with community, helping the various community-oriented and open source devs that we have champion their projects and work better with a broader community. I also do a lot of industry-facing technical marketing and evangelism. So a lot of customers will have me go and speak at internal data science events they do, things like that. There's actually remarkably few people in the Python world that really speak to industry on behalf of Python itself relative to the usage of it. I mean, you'll find no shortage of industry analysts talking about how great Java is, or how great these big data projects are, all these PR type things. There's no one doing that for Python. So that is actually some of my day job. Beyond that it's just trying to keep up with all the things that are happening in data science, machine learning, data engineering, data visualization, AI, all of it.

05:58 Michael Kennedy: On top of the advocacy role, it's a pretty much full-time learning thing, because there's so much change, right?

06:04 Peter Wang: There's so much in every area. I mean, there's all the cloud stuff too. There's edge learning, there's data privacy. You name it. Every single area that touches data science is undergoing massive change right now.

06:14 Michael Kennedy: That's super exciting, but it's also a bit of a challenge. I think the Anaconda Distribution does help some with that. Before we get into the Distribution story though, let's just talk about Anaconda, Inc. When I had Travis on the show a couple years ago it was Continuum that was the company, and Anaconda was the distribution. But now those are not different anymore, right? It's just Anaconda, the company, and the distribution.

06:40 Peter Wang: We renamed ourselves really out of pragmatism, because we would go to places and we'd introduce ourselves as Continuum Analytics, and they're like oh yeah, you guys got some Python stuff. We see that here. Who are you guys? And then we'd say, oh we make Anaconda. And they're like, oh I love Anaconda. I use Anaconda all the time, blah, blah, blah. After that started happening to us all the time, we figured maybe we should just call ourselves Anaconda. And you know, one of the things that held that up was for a long time as we were growing the company and growing the Distribution we were afraid that changing the company name would actually spook the community. It's been one of these interesting things. I have lots to say about open source, let's just put it that way. But it's very hard to play the game of open source honestly, and not still get beat up with FUD about it. So even though we've open sourced our build tools, we've open sourced our recipes, we open sourced everything from the very beginning, there are still people in the community who distrust us, because we're a company trying to build sustainable funding for this open source effort. So that's one of the reasons we actually were reticent to do that name change until finally it just became a no-brainer that we basically had to.

07:51 Michael Kennedy: Yeah, because people keep mistaking you for Anaconda, Inc. Maybe just say fine, that is our name.

07:56 Peter Wang: Yeah, and we'll just deal with the haters on a one-off basis I guess, I don't know.

08:00 Michael Kennedy: Yeah, exactly. I mean, it's not unprecedented, right? 37Signals, who made Basecamp and founded Ruby on Rails, they eventually renamed themselves just Basecamp. The one major project. Fine, we're called that. I guess it's like Microsoft renaming themselves Windows, which they're probably very happy they didn't. But in a lot of instances it makes sense. That's cool. Okay, there's a broad spectrum of folks who listen to the show. Many of them will have experience with data science. Many of them will know what the Anaconda Distribution is. But maybe just for the folks who are new, or have been working somewhere else, tell them what is this distribution? How is it different than the standard CPython? And why did you guys make it?

08:43 Peter Wang: I'll try to sum this up for a technical, but not necessarily data science, audience. The basic gist of it is that Anaconda arose out of a failure in the Python ecosystem to address the packaging needs for the numerical and computationally heavyweight packages that are in Python. So for the same reason that Linux distributions exist, very few people build Linux from scratch. For actually exactly the same technical reasons we built the Anaconda Distribution, because it's actually really, really hard to correctly build all of the underlying components that you need for doing productive data science and machine learning. So the reason it's a distribution is because all of the libraries you build and the packages, the modules with extension modules that you load up, they need to be compiled together. They need to be compiled in a compatible way. So you need to agree on compiler definitions. You need to agree on code generation targets, optimization levels, things like that. If you only ever use pure Python packages, so packages whose code only consists of py files, then you basically never run into a problem. It's only when you start having extension libraries, things that depend on maybe system libraries, God forbid you try to cross platforms between Linux and Mac and Windows, or cross architectures between ARM and x86. You're completely hosed. So we, in service to the scientific Python community, we built this distribution that was a set of packages and a way of building packages that are compatible with each other. So that's what the Anaconda Distribution is. It's a bulk distribution with about a couple hundred premade libraries and we have a package updater in it called Conda that lets you then install thousands more that are built by us and built by a large open community that also uses the same standards. So that's what Conda and Anaconda are in a nutshell.
It's really one of these packaging war kind of things, or the confusion of Python packaging. We actually tried to approach Guido back in the day to help define some standards around this, and he basically gave us very helpful guidance, which is maybe your packaging needs are so exotic you need to build your own system. So we took him at his word and we did it. And consequently when people use Conda in a lot of cases things just work. There's still corner cases and a lot of little rough spots, especially in terms of pip interop, but we're very proud of the work that we've done so far. It's used in production every day by big, big companies that rely on Python for their production workloads. So that's basically Anaconda and Conda in a nutshell.
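
To make the pure-Python versus extension-module distinction above concrete, here is a minimal sketch that uses only the standard library. The helper name `module_kind` is hypothetical and not something from the episode. It classifies a single importable module by the loader Python would use for it, so a package whose `__init__.py` is plain source but which ships compiled submodules would still be reported as pure Python at the top level.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify one importable module by how Python would load it.

    Returns "extension" for compiled extension modules, "pure-python"
    for modules loaded from .py source, and "other" for everything else
    (built-in, frozen, zipped, or namespace packages).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(name)
    if isinstance(spec.loader, importlib.machinery.ExtensionFileLoader):
        return "extension"
    if isinstance(spec.loader, importlib.machinery.SourceFileLoader):
        return "pure-python"
    return "other"


if __name__ == "__main__":
    # json is loaded from source on a standard CPython install;
    # sys is compiled into the interpreter itself.
    for name in ("json", "sys"):
        print(name, module_kind(name))
```

Modules reported as "extension" are the ones whose compiler, platform, and architecture have to match the interpreter, which is the compatibility problem a distribution tries to solve.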

11:14 Michael Kennedy: Okay, well that's a really good summary. When I think of it, the main value is that you get pre-compiled binary versions of the packages that would otherwise have to be compiled from source when you pip install them, right?

11:29 Peter Wang: Yes.

11:30 Michael Kennedy: And then the other part is the cross package compatibility because somebody makes one package, and they have an interest in making that as best they can or whatever, but they don't really care about integrating and testing against all the other open source projects that you may pull into your project that they don't even care or know about. So this bigger picture compatibility that you look at is pretty cool as well.

11:55 Peter Wang: It's actually become quite critical, and I think this is one of the areas where the Python community, in the confounding haze of packaging and half-built packaging solutions, has not really been good at giving guidance to the user community: if all you ever need to do is build one package for yourself, and you fully control the deployment environment and the development environment, then maybe you can go and do that. But if you actually have to work on a team with other people...

12:20 Michael Kennedy: For example, web developers, a lot of times they control the server. They choose the packages they bring, and they write the code, and they can just push it out to their server and they're good, right?

12:30 Peter Wang: Yeah, and they're good to go. You can do any number of things that you want to. You know, what I would liken it to is if you ever build your own wheel, if you build your own native extensions, it's like getting plastic powder or plastic pellets and making your own mold of Legos or Lego-like things, and pouring your own little pieces. So as long as you're the one that controls what they have to plug in to, and you're the one that controls all the molds, then you don't need any standard definitions of studs, or holes, or lengths, or anything like that. You're good to go. But if you ever want to work with other people who have their own molds and their own places and studs they want to put these things on, you've got to come up with a standard definition. So what Anaconda is, is essentially, it's like a Lego system. We've standardized what the studs are and what the holes are so lots of people can build all kinds of different Legos and they all can plug together. And that's kind of the long and short of it.

13:18 Michael Kennedy: Yeah, very interesting. Some other things that are in play there are you talked about Conda and installing the packages that you built, the couple hundred or whatever that come with the distribution. Then you also said installing the others through this thing called conda-forge. What's conda-forge?

13:36 Peter Wang: Well, conda-forge is a community of people who I would say out of a masochistic charity to the community, they take on a job of maintaining the build scripts and recipes that take upstream software and make it so it's actually buildable in a reproducible way, and that it works with other things. It's a community of package builders, and they have several hundred contributors, and they've built thousands of packages. We ourselves built about a thousand, although only 200 are built into the big Anaconda installer download. But the conda-forge community goes even beyond that and built several thousand. And that's what conda-forge is.

14:11 Michael Kennedy: Yeah, interesting. So people are like, it's really painful to build this package, but only one of us should ever suffer and feel that once, and we'll do that on behalf of the community. I'll take that on for this one package.

14:23 Peter Wang: Yeah, basically. The real challenge is it's one of those things in life where it's almost worse that it's easy to do a bad job. I don't know that we have a term for this in English. Maybe there's a long German word for it. But it's like the same thing with coding principles of if something is broken you want it to break loudly and fail loudly, right. You don't want it to make a half effort where it kind of works, sometimes. But building packages is the same thing. Most people kind of get a build working for most things. But does it work well? Will they ever be able to do it again? Does it work with anything else? None of those things, it takes a lot of work to make a good package build.

15:01 Michael Kennedy: Well, that speaks to the reproducibility side of things. I know in data science and scientists using data science tools that reproducibility is a super important aspect. And I guess the first step is I can run the software, which means I can build the packages and install them.

15:17 Peter Wang: Right, and that is really what we think that providing prebuilt binaries, and then having good provenance of the build system itself, that's really some of the only ways you can really, honestly, not kidding yourself, have reproducibility. I think some people think that Docker saves them, but it really doesn't. So it's kind of a struggle right now honestly, because there's so many moving pieces. There's a lot of confusion in that space. But I do, yes, I do agree with you that Conda packages used properly can absolutely be a great way to ensure reproducibility for data science.

15:49 Michael Kennedy: Yeah, well, it's probably better than saying well, if you want to install this package, you're going to need to have the Visual C++ 2008 compiler set up correctly on your machine.

15:59 Peter Wang: Oh my God.

16:00 Michael Kennedy: In 2025 or whatever when it's no longer compatible with the Windows or who knows what, right?

16:05 Peter Wang: Yeah, we're going to have to, one of the reasons I think that our team, the Conda and Anaconda team are happy to move away from Python 2 is because of the dependency on that compiler. Some day we will finally put Python 2 to rest. I'm probably going to try to eBay a bunch of boxes of those CDs just so we can break them out for a cleansing bonfire or something, I don't know. Maybe you shouldn't burn CDs. That's bad, actually.

16:29 Michael Kennedy: Yeah, but you could have some sort of ceremony with them for sure. I think the new Python 3.7, it uses MSBuild, is that right?

16:38 Peter Wang: You know, I'm not sure of all the details of that. But I think that there have been significant improvements and the Python folks who work at Microsoft have worked really hard to improve the compiler situation there for Python. I think it's much better now with Python 3, and in the later releases of Windows. It's just we have very old Python, very old Windows that still are deployed that we have to keep those users going. So that's where almost all the pain is.

17:04 Michael Kennedy: I can imagine. I just had Steve Dower from Microsoft on the show, and he's in charge of installers up there. He's doing some really cool stuff to make it more accessible on Windows. It's easy to go to conferences and forget how important Windows actually is. You look around, it looks like everyone has a Mac, there's a few people running Linux. That's pretty much what you see at the conferences, but that's not what the actual consumption out in the world is, is it?

17:30 Peter Wang: No, that's not at all reflective of even the United States and then you go to the broader world. It's a lot of Windows. It's a lot of Windows. A lot of Linux too. But yeah, I think this is one of the structural problems that faces the open source community is that when you're small it's easy to do product management, because it's you and your buddies. But once you get bigger you have to actually intentionally go and try to pull in information from your users. That's actually, I think, a structural challenge for the Python community at this point in time.

18:00 Michael Kennedy: When we're talking about conda-forge and things like that something I had not heard of before, but I saw that you were running, is something called Bioconda. Now, it sounds like it might have to do with biology in Conda and data science and biology, but that's all I can discern from it. Tell us about that. That's new to me.

18:17 Peter Wang: Bioconda is actually not one of our projects. Oh, I should have said this earlier with conda-forge. Bioconda, conda-forge, and various other sort of groups, they use our Anaconda Cloud package hosting infrastructure to support their community, because with the Conda package installer it's easy to give it a name space flag, basically a channel name, and then it will go and download packages only from that channel on Anaconda Cloud. So these represent, conda-forge and Bioconda represent different communities that are using the Conda packaging tool. But they may have set slightly different standards or included certain other standards in their build system, protocols, and standards so all these packages work together. So yes, Bioconda is for the biology, genomics community. They have very specialized, well specialized may be a euphemism, but there's a lot of specialized software needs in the biology community. It's very R centric. There's a lot, depending on what you're doing in that domain, there's a lot of Perl sometimes.
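
As a rough illustration of the channel flag described above, the sketch below only assembles a `conda install` command line and does not run it. The helper name `conda_install_command` and the example package are assumptions made for illustration. `--channel`, `--override-channels`, and `--yes` are existing conda options; by default a named channel is searched in addition to the configured ones, and `--override-channels` is what restricts the search to the channels listed.

```python
def conda_install_command(packages, channels=(), only_these_channels=False):
    """Build (but do not execute) a `conda install` argument list.

    channels: channel names to search, e.g. "bioconda" or "conda-forge".
    only_these_channels: if True, add --override-channels so that conda
    ignores the channels from its configuration.
    """
    command = ["conda", "install", "--yes"]
    if only_these_channels:
        command.append("--override-channels")
    for channel in channels:
        command += ["--channel", channel]
    command += list(packages)
    return command


if __name__ == "__main__":
    # "samtools" is used purely as an example of a bioinformatics tool.
    print(" ".join(conda_install_command(["samtools"], channels=["bioconda"])))
```

The resulting list could be handed to `subprocess.run` on a machine where conda is installed.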

19:14 Michael Kennedy: Interesting. We'll leave that there. Are there other ones? Is there like a Chemconda or things like that?

19:21 Peter Wang: No, there's actually, I think Bio, I'm going to kick myself later probably as I forget some, but there are major research disciplines and communities that do use Conda quite a bit. So I think the astronomy research community has taken on Python. They use Conda as a way to get nightly builds and dev builds and just really get easy deployments of their complex software. One of the things that Conda does well, I should have said this earlier, it's not just a Python packaging tool. It's a user land software packaging tool. So we package up R, Perl, Python, C, C++, Fortran, Java, Scala, Ruby, Node, you name it. We really are almost like a portable user land RPM kind of thing. So that allows for these communities that have a lot of scientific and engineering code written in not Python, sometimes not even in C or C++, we can package all those things up together and move these collections of packages around.

20:14 Michael Kennedy: Yeah, that's pretty interesting. That takes the challenge of packaging and magnifies it extremely, multiplies it combinatorially.

20:23 Peter Wang: Oh yeah. Oh yeah. It definitely gets pretty complex.

20:28 Michael Kennedy: This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast, simple, and incredibly affordable? Well, look past that bookstore and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe, so no matter where you are or where your users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server, or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24/7 friendly support even on holidays, and a seven day money back guarantee. Need a little help with your infrastructure? They even offer professional services to help you with architecture, migrations, and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/linode. So another thing that looks like it's doing really well is Anaconda Cloud. This is a place where data scientists can share their work and their packages and things like that? Is that right?

21:32 Peter Wang: Yes. Right now Anaconda Cloud is primarily, I think, used as a package hosting environment. And a lot of developers in the data science ecosystem use it as a way to publish nightlies or dev builds. Many of the key projects give us a heads-up when they're about to cut a new release so that they can make sure that they can announce the Conda package at the same time they announce the release of the new version of the software. So it's very nice of them.

21:55 Michael Kennedy: Yeah, so how does that work alongside, and differently from, just putting it on PyPI?

22:01 Peter Wang: It gets pretty complex. Number one, there's channel support. So we basically have individual developers can have their own channel, and those packages, their users can just download packages from just that channel, and not a single global name space. Another really important thing is that there's not just one build. So Conda as a packaging system has much deeper and richer metadata about the build environment and what it expects of the runtime environment. So for the same upstream software, I can build different versions that are optimized for different levels of your hardware, like whether or not you have GPUs, whether or not you have an advanced Intel chip or a relatively basic chip. I can push all that stuff in, and maybe you're using this version of a compiler or that version of a compiler, like Clang versus GNU, GCC. These things actually make material difference in whether or not the package will work. That level of resolution and that ability to feature-flag and select is not available on PyPI, as far as I'm aware. Again, it's just even if one package is available, if you use pip to install from PyPI, pip aggressively goes and tries to build other things from source, right? And it doesn't do an a priori solve of what you need. It grabs things as they go. So you can end up with very much the incorrect packages coming down. You can end up trying to build something from source that maybe it'll build successfully, but again, that's not what you wanted. You wanted the pre-built, right?
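
The idea of several builds of the same upstream version, selected by what the target machine offers, can be sketched with a toy model. Everything here is hypothetical: the `Build` record, the feature names, and the build strings are invented for illustration and do not reflect conda's real metadata format or selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    version: str
    build_string: str      # hypothetical label for this variant
    requires: frozenset    # hardware features the machine must provide


def pick_build(builds, machine_features):
    """Return the most specialized build whose requirements the machine meets."""
    usable = [b for b in builds if b.requires <= machine_features]
    if not usable:
        raise LookupError("no compatible build for this machine")
    # Prefer the build that takes advantage of the most features.
    return max(usable, key=lambda b: len(b.requires))


# Three toy builds of the same upstream version.
BUILDS = [
    Build("1.0", "generic", frozenset()),
    Build("1.0", "avx2", frozenset({"avx2"})),
    Build("1.0", "gpu", frozenset({"avx2", "cuda"})),
]

if __name__ == "__main__":
    print(pick_build(BUILDS, frozenset({"avx2"})).build_string)
```

A machine without a GPU never sees the GPU build, while a machine with one gets it without the user having to know that variants exist.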

23:24 Michael Kennedy: Right, with different settings, different compiler.

23:26 Peter Wang: That's the primary difference.

23:27 Michael Kennedy: It is frustrating periodically that you can say here's a bunch of things I need, pip install these things. And one of them will have a requirement that the version of one part is no larger than such and such, and yet it'll go grab, depending on the order in which you specify it, it may grab the wrong one and just install that. Then the other package is incompatible. There's weird little cases like that you can get into.

23:55 Peter Wang: All the time. Because it's actually, this is one of those areas of software development that for most people it's not a fun and sexy area to think about. But it's a deeply critical thing when we rely on open source software is to actually understand what does the dependency matrix look like? And there's no free lunch. If you do it in this relatively naive way, like what pip does, then you can easily end up in a corner and things are incompatible. If you try to do what we do, which is have very explicit and curated metadata of versions, and you do an a priori solve, people complain the solve takes a long time, which it can. So there's really no free lunch on that. I think one of the challenges that we actually have is that the metadata itself can be wrong, and we found that all over the place. So packages think, they will declare they're compatible with this version or that version, and they're actually not. So we have to actually patch what the upstream declarations are. Again, it gets a little detailed. There's just a lot of muck in this area that we have to deal with.

23:55 Michael Kennedy: Yeah, it sounds a little bit like these are the problems that you can address, and then learn about if your job is to coordinate a whole bunch of packages that don't interact intentionally with each other. They just want to make their project something that you can ship and install and use. And that's fine. But this interaction across them is where it gets tricky.

23:55 Peter Wang: There's absolutely a tragedy of the commons. The metaphor I've used in the past is that every developer, open source maintainers, bless their hearts, they are doing a thankless job a lot of times anyway. And they're way burned out and stressed. But they're really solving for does my vehicle work in my driveway? Can it get out of my driveway and drive into my other maintainer's driveway down the street? If that works, they're good to go a lot of times. And when every one of the thousand developers in the ecosystem does this you end up with a bunch of cars squashing all over each other in the highways and the freeways, because they're not thinking about that integration problem for their end users. And the end user is a lot of times in data science, they're not sophisticated software developers. They have no ability to solve this problem for themselves.

23:55 Michael Kennedy: They're at the very edge of struggling to write a 10-line script, not understanding the complexity of TensorFlow's dependencies or something like that.
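
The trade-off described here, grabbing packages in order versus solving for the whole set up front, can be shown with a deliberately tiny model. This is a toy under stated assumptions, not the actual algorithm of pip or conda: the package names `plotlib` and `report`, the index, and the "maximum allowed version" constraint format are all invented for illustration.

```python
from itertools import product

# Toy index: available versions per package, newest first.
AVAILABLE = {"plotlib": [3, 2, 1], "report": [1]}
# Toy metadata: report 1 needs plotlib, at most version 2.
MAX_DEP = {("report", 1): {"plotlib": 2}}


def is_consistent(env):
    """True if every installed package's dependency limits are respected."""
    for (name, version), limits in MAX_DEP.items():
        if env.get(name) != version:
            continue
        for dep, max_version in limits.items():
            if dep not in env or env[dep] > max_version:
                return False
    return True


def _greedy_add(env, name, limit):
    if name in env:
        return  # already installed: kept as-is, even if it violates `limit`
    version = next(v for v in AVAILABLE[name] if limit is None or v <= limit)
    env[name] = version
    for dep, max_version in MAX_DEP.get((name, version), {}).items():
        _greedy_add(env, dep, max_version)


def greedy_install(requested):
    """Install requests one at a time, never revisiting earlier choices."""
    env = {}
    for name in requested:
        _greedy_add(env, name, limit=None)
    return env


def solve():
    """Try every combination in the toy index, newest first, before installing."""
    names = sorted(AVAILABLE)
    for versions in product(*(AVAILABLE[n] for n in names)):
        env = dict(zip(names, versions))
        if is_consistent(env):
            return env
    raise LookupError("no consistent set of versions")
```

In this model the greedy installer produces a broken environment or a working one depending only on the order of the request, while the exhaustive solve always finds the working set. The cost is that it considers every combination, which is why a real solve can take a long time on a large index.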

23:55 Peter Wang: Exactly.

23:55 Michael Kennedy: One thing that you all did recently that seems to be a trend is you switched from the major/minor versioning scheme to a calendar-based scheme. I think this is an interesting thing, especially around open source, because Mahmoud Hashemi created this site called ZeroVer to sort of make fun of all the projects that have been around for 10, 15 years with 50 or a hundred releases, but are 0.1.17, really small versions. And it seems like one of the fixes is to say let's move towards something that has more to do with I can look at the version and I can tell you without deeply knowing that software whether that's a new version, an old version, a medium aged version. If I told you requests was 2.1.4, is that new, is that out of date? I don't know. But if you use this new style it's pretty obvious. What was the thinking there?

23:55 Peter Wang: It's a community convention, and it's for that user affordance that you can look at it and know. And also we set this expectation that we will release at a regular cadence and it's for our own internal documentation and everything else. Everyone is able to collaborate more easily around that. But I think the ZeroVer thing, I mean, I love Mahmoud, and I think it's a hilarious thing. In a community here where we have SciPy or Jupyter and other things, Pandas, at zero dot whatever, I guess it's not quite zero dot anymore, but like SciPy for sure, there is actually something. We can laugh at it all we want to, but there's a thing there the author is trying to say, or the maintainer is trying to say, which is it's not quite ready yet. I'll call it 1.0 when I'm good and ready, and I'm not ready yet, and might not be for 20 years. Of course, that's also kind of a silly position to take when literally millions of people and their production code depend on your software.

23:55 Michael Kennedy: I think they're not saying that it's ready. I think what they take it to say when it goes to 1.0, a lot of times, is that it's done. And software is rarely done.

23:55 Peter Wang: Well, software is done the instant it's released. At least that version of it, right?
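
The affordance of a calendar-based version, that its age can be read straight off the string, can be sketched in a few lines. This assumes a `YYYY.MM` scheme such as `2018.12`; the function name `calver_age_in_months` is hypothetical.

```python
from datetime import date


def calver_age_in_months(version, today):
    """Return how many calendar months separate a YYYY.MM version from `today`."""
    year_text, month_text = version.split(".")[:2]
    year, month = int(year_text), int(month_text)
    return (today.year - year) * 12 + (today.month - month)


if __name__ == "__main__":
    # Age of a December 2018 release on the day this episode was recorded.
    print(calver_age_in_months("2018.12", date(2019, 1, 16)))
```

No comparable calculation is possible for a version like 2.1.4 without looking up its release history.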

23:55 Michael Kennedy: That's true.

23:55 Peter Wang: I think this is where we as an industry actually have to uplevel our thinking about this. We got to stop thinking about software as artifacts, tarballs of code that are static. We have to start thinking about this from a flow perspective. That we are looking at flows of projects and there's a covenant that is established in the relationship between the user of one of these flows, and the people who originate those flows. I think there's a really interesting thing I learned years ago about aerodynamics. Basically that when planes move at less than the speed of sound, aerodynamics is somewhat similar to water flow. But once you break the sound barrier, the thing that causes you the greatest amount of pressure on your airframe and things like that, you have to reason about the change in cross-sectional area of the airplane as it moves through the air. So it's almost more like streams of thick rope, and you're shoving rope aside. So you move from this particle-flow way of thinking to looking at actual flows. Similarly with software I think we got to stop thinking about this as being just a code drop, and maintainers as people who go and dump out a bunch of code, and look at the relationship with projects. This gets to sustainability, this gets to versioning and what is the promise in a version number, all of that stuff. It's deeply involved. I don't know that the software industry has really started to learn how to consume. The enterprise consumers of open source, I don't know that their internal practices ever really caught up with thinking about it that way.

23:55 Michael Kennedy: Yeah, and that's kind of why I was bringing up the versioning a little more deeply, because I think the folks that spend their time all day in open source, they know that Flask, even though it had some small version number, recently moved to 1.0. But it had some small version number, but it's really used a lot and it's been around a lot. So it's fine. But the corporate groups, the enterprise groups, they see that as a flag of that's test software. We're not ready to make our bank run on test software. Is that the feeling that you got by interacting, because you touch both open source and enterprise groups more than a lot of folks I would suspect.

23:55 Peter Wang: Yes, absolutely. We are a B2B software company. That's where the bulk of our revenue comes from. And absolutely we suffered. We suffered mightily for that. We have to go and basically talk to procurement and compliance and IT people that are swimming, they're up to their ears in software, they look at a spreadsheet, we come in with our enterprise software and say, here's the open source things that are in the manifest. And they look at this thing and they're like what is this? This is a pile of garbage. It's all zero dot whatever. And it's like yeah, but that runs Instagram. That literally runs Dropbox. So what are you complaining about? You don't really want to get into an argument. Once you have that argument with an IT guy you've already lost.

23:55 Michael Kennedy: Right, you're a small insurance company with a hundred thousand customers. You're not running YouTube with a million requests per second that's using similar software. But it's the mentality right?

23:55 Peter Wang: Yeah, and a lot of going into any kind of, I would say that over the last five or six years, I've had to do a lot of adulting. One of the parts of adulting up from just being a geek code nerd kind of guy to being able to actually have these kind of customer conversations is having empathy for the customer. From their perspective, yeah, they are just a regional bank with a few hundred thousand customers. They don't have the budget of Alphabet to throw at an SRE team and a whole dev team and all that stuff. So their approach to understanding risk and risk mitigation from the thousands of vendors that want to sell them software, maybe it's the most practical. Again, I'm not defending it. I'm just saying one can come to a point of empathy with their approach.

23:55 Michael Kennedy: That's a really good point. I do totally agree. It is exactly because they're small they can't hire the fresh new hottest software engineers that would rather be in Silicon Valley or Austin or Portland or wherever. They just don't even have the ability to determine whether or not what you're saying is true in a lot of cases. They'd just rather use Microsoft. We know that they give us this SLA and this agreement, and we're just good. There's one way to make websites. Use ASP.NET. We're good. Just use something else supported like that. And it's a challenge that they obviously want to use these new tools, especially in data science. But they've got a different culture and way of describing software being ready.

23:55 Peter Wang: We can laugh all we want to about these client guys beating us up for our SciPy zero dot whatever, but on the flip side how many of our credit card reports and our gas bills come from basically some little ASP app or some Access database, God forbid, with a bunch of VBA macros? That runs the world. So how elite are we really?

23:55 Michael Kennedy: That's an interesting point. Yeah, it's definitely worth thinking about. In a broader sense though, I feel like Python is making its way into this enterprise and major corporation space. I know it's increasingly being used for a lot of work, not just data science, but other types of software as well. How do you see it? How do you see the world with your insight you got?

23:55 Peter Wang: Well, I think that's absolutely right. And I think that the Python community may not survive that adoption.

23:55 Michael Kennedy: Interesting, what do you mean by that?

23:55 Peter Wang: Not Python the language, but the Python community. What I mean by that is that I've talked to quite a few maintainers of some popular projects and they've all reflected this to me, in the last couple of years as Python adoption has shot through the roof. I think some of it is our push on data science, things like that. Others are this rapid rise of deep learning. Many things have contributed to this. But ultimately Python is one of the most popular languages on the planet. People are getting jobs in Python. And they're using Python to do their jobs. What we're seeing is this transition and the expectation of hey man, this is just my nine to five. This is a tool that I'm supposed to use to do my job, and this tool sucks right now. So I'm going to get on your GitHub and I'm going to give you a bunch of grief about it, because this is your freaking tool. My employer, I got to feed my family, and my employer tells me I got to use this tool. It's a piece of crap. So that is, like I said, the Python community might not survive that adoption transition unless it intentionally really works hard to drive some values into the newcomers.

23:55 Michael Kennedy: So maybe that person that comes and complains because I used to download my stuff from microsoft.com. Now I get it from python.org. But this thing sucks so I'm going to go back and just complain about it as if there's a commercial entity on the other side whose job it is to make the SLA legit.

23:55 Peter Wang: But more likely, more likely, they picked up, they inherited some piece of crap three year old Python code from some guy who didn't know what he was doing.

23:55 Michael Kennedy: Written in Python 2.5 or something.

23:55 Peter Wang: Oh absolutely. It'll be 2.5. I think there's a couple of 2.4 things running around that I'm aware of. But a lot of 2.5. There's a lot of 2.5 out there. And yeah, it's using some old version of matplotlib or something or some old version of Pandas. And they're going to complain on the tracker, or on the issue tracker about that.

23:55 Michael Kennedy: Part of the cultural change that I think we should try to encourage sounds like you're doing this for your job. It's not so great. We are the maintainers. But you have a company who depends upon this. Can your company contribute some time, a PR? It's got to be a two-way street, I think. It can't just be well, one of the things that I suspect that you also feel at Anaconda, Inc. is there are so many companies out there making millions and billions of dollars a year on top of free. There's like people working in their free time on some open source project that company is basically built upon and they make billions of dollars and contribute back nearly zero or zero.

23:55 Peter Wang: Yes, I've frequently quipped that I can fit probably the core NumPy, Pandas maintainers, we've gotten a few more now. So they don't all fit in my minivan, but at one point in time certainly core NumPy...

23:55 Michael Kennedy: You're going to need one of those longer full vans that holds 15 people.

23:55 Peter Wang: I may need a 15-person van, but I could probably fit them in the 15-person van. You know, Matplotlib, which everybody relies on, is just a few people, maybe part-time. There's not one whole FTE on it even. There's projects like Jupyter that are very large, but also underfunded, and there's projects that are small and underfunded. Yes, it's exceptionally tragic. It's exceptionally tragic.

23:55 Michael Kennedy: I think with part of the tragedy to me is if it really took a thousand people to make matplotlib, 600 people to make Flask, maybe the community can't contribute back enough to pay those thousand engineers full-time. But like you said, it's like a van full of people or it's my small car full of people for Flask and Click, and all those things. The people in companies that use Flask make so much money and depend so heavily upon it, they could easily pay those three, four, five people to be full-time on that and be doing really well, but they don't. It's not even asking very much of them, which is what's crazy.

23:55 Peter Wang: I'm of two minds on this, not two minds, but I have two major views on this. One of them is that we should look at this as the triumph of software. I mean, just to sort of restate the point you were making, which is that holy crap, one or two or 10 people can build something that is fundamental to billions and billions of dollars of global economic activity. That's something to be celebrated, because that should free up, think about how many more software developers don't have to be working on Flask. They can just go and have free time. Not really, but you know, in theory that's how...

23:55 Michael Kennedy: Well, build something more interesting than just the framework. They can build something with this result.

23:55 Peter Wang: So that's one way to look at it, in that we should celebrate where we can. But on the other hand the thing is if we can't even somehow come up with the funding for like 10 FTEs for these fundamental projects, what's broken? What's broken? It can't be that hard. I think there's two ways to look at this. One is that the open source community, as essentially the field of software, I think is essentially commoditizing out, and what open source represents, and this particular thing happening in the Python ecosystem is the very vanguard of this transition, it represents essentially the end of labor economics for software. So with that going away, we're at that transition, so it's very hard to think about it for companies, because companies will allocate budget for software development in a very headcount-oriented way. They know what they're getting when they pay for an FTE dev here or there or wherever. If they just throw money at some open source, what are they getting for it? They know how to pay money for software. Companies are very good at paying money for software. But paying for stuff that they can already get for free, that is a null value on a spreadsheet. It cannot compute that. It is NaN. So my view on this is actually quite simple, which is that if open source developers, the people like me who care about the open source ecosystem, if we want to sustain the community innovation and that positive abundance mentality that we have in the open source ecology, the human ecology of open source has moved to post-scarcity, post-labor economics. If we want to sustain that then we need to drive a new conversation. We need to provide the tooling and the infrastructure for the companies to think about how to consume this.

23:55 Michael Kennedy: This portion of Talk Python To Me is brought to you by Rollbar. Got a question for you. Have you been outsourcing your bug discovery to your users? Have you been making them send you bug reports? You know, there's two problems with that. You can't discover all the bugs this way. And some users don't bother reporting bugs at all. They just leave. Sometimes forever. The best software teams practice proactive error monitoring. They detect all the errors in their production apps and services in real-time and debug important errors in minutes or hours, sometimes before users even notice. Teams from companies like Twilio, Instacart, and CircleCI use Rollbar to do this. With Rollbar you get a real-time feed of all the errors, so you know exactly what's broken in production, and Rollbar automatically collects all the relevant data and metadata you need to debug the errors so you don't have to sift through logs. If you aren't using Rollbar yet, they have a special offer for you, and it's really awesome. Sign up and install Rollbar at talkpython.fm/rollbar and Rollbar will send you a $100 gift card to use at the Open Collective, where you can donate to any of the 900-plus projects listed under the Open Source Collective, or to the Women Who Code Organization. Get notified of errors in real-time and make a difference in open source. Visit talkpython.fm/rollbar today. What are some of the key elements?

23:55 Peter Wang: One way to do it is, number one is something we have to work on ourselves, which is to not make money a bad word, which is still a mindset that pervades many open source communities and developers. And any affiliation with any kind of money-managing, money-changing organization is seen as essentially...

23:55 Michael Kennedy: As seen as corrupting sometimes.

23:55 Peter Wang: It's corrupting, exactly. So we literally had this on a SciPy mailing list a couple years ago. Someone was arguing that we should only allow steering council members to be part of universities or part of academia, which don't have their own agendas, and the other people were like are you kidding me? Academics don't have agendas anymore? So people like to kid themselves a lot about this kind of stuff. But anyway, so I think that the open source community needs to, number one, not be allergic to money or treat it as a corrupting influence.

23:55 Michael Kennedy: There's companies and ways, business models that are trying to help open source, and trying to be good participants in it, and then there are the corrupting evil taking advantage of type companies. It's not black and white. But there are certainly paths forward where companies like you guys and others are putting in lots of effort to try to make things better legitimately.

23:55 Peter Wang: Yeah, and I appreciate that you recognize that. We really have tried to be good citizens in the open source community. But I think for a lot of companies it's like the mind is willing, but the spreadsheets are weak. It's still really hard for people and proponents and advocates, even within those companies, to at the end of the day make the budgetary justifications, because the companies internally don't know how to reason about it. So I think that's where the open source community can try to help. Number one, one thing we could do is do almost like Kickstarter style, or I play Warcraft a little bit, so it's like a World Boss take down. So before we can release any new versions of library XYZ next year, we've got to get this much money in. People basically just, they put the money in. But I think that's, as fun as that would be, and the Kickstarter model like that, as cool as that would be and interesting as that would be, I think businesses have a hard time just writing checks for donations. So the other thing that I think the open source community needs to do, I think the one that's more realistic, is to actually form entities that can have a business conversation with the corporate players, and understand how to talk to their procurement, talk to their legal and everyone else, and act as a crossover facility to do the project management so the businesses know what they're getting for their money. It's not a charity. Something that people may not be aware of is that for a business to write a $10,000 charity check, that comes out of a different part of the business a lot of times. Even if everyone wants to, for budgetary and for finance and compliance reasons, they literally cannot just write a check to some dude, to some open source hacker in the middle of Europe somewhere. So these are the things that we need to put together.

23:55 Michael Kennedy: I think the allergic to money issue, I think that that can be solved with the right examples of open source companies and companies entering open source in positive ways. But I feel like there's some kind of structure, or something that has to get between the corporations and the open source projects where it's like you say, it's not a charity check. It's you pay into this and you get a little bit more of something and I don't know what that is. But there's something like that. Then the companies can justify it. They say, look, we depend upon this thing. We pay 0.01% of our revenue to the people that make it work so that our system doesn't go away, and here's what we get for 0.01%. I don't know what that is.

23:55 Peter Wang: We don't have to reinvent the wheel here. It happens all the time in every other industry. It's an industry consortium. It's an industry consortium. You pay into it. And what happens is you get votes on various technical councils and technical boards, and they do the product management and the debt management for what the thing should be. In the Python world we want that to, in all cases for a lot of these projects, we want that to still be subordinate to the vision of the open innovation volunteer kind of crew. But there's so much housekeeping. There's so much issue tracking stuff. There's so much documentation, management, clean up, just keeping the lights on and the yak shaving. There's so much that goes into a project that these kinds of consortium models can fund. I think Python itself, and I'll just come out on your podcast and I'll just say it. I think Python itself badly needs this. Badly needs an actual consortium like this to be operated in a way that can accept dollars easily, that's easy for people to write checks. We all know this as entrepreneurs. Make yourself easy to do business with. The open source community I would say has not made itself easy to do business with. You got to either hire a core dev, and if you do that core dev has to, in their own minds, be like am I wearing my community hat or my employee hat, which is tough on them, very stressful for them. And the open source community, even when we get the dollars, we don't make it clear to the people writing the checks what those dollars are buying for them. If they have a couple issues that are easy to solve, that really can make a difference for them, we don't necessarily prioritize those issues just because they wrote us a check, because we don't want to feel like we're doing that quid pro quo. 
So I think that you really need some kind of facility in the middle that acts as a consortium, that is able to help businesses steer and guide a lot of these pretty basic kinds of maintenance things that need to happen for projects that would make their lives easier. And that can then funnel a ton of money, a ton of margin on that, into the innovation work and all the forward-looking kind of stuff. And everyone's happy.

23:55 Michael Kennedy: Do you think that PSF could do it?

23:55 Peter Wang: I think that PSF could do it. I think that the PSF would be, I don't know if it operates as a non-profit.

23:55 Michael Kennedy: It does, yeah.

23:55 Peter Wang: Yeah, so if it's a non-profit I think it would be very hard for it to do it. It might need to create something like Mozilla Foundation and Mozilla Corporation. I think it would need to create some kind of traditional C corp or B corp perhaps, like a social mission for-profit that it owns director seats on and a chunk of. But companies, a lot of times, are just prohibited from writing checks to a 501(c)(3) unless it comes out of their philanthropy group. So again, this is that making it easy to do business with kind of thing.

23:55 Michael Kennedy: Yeah, interesting.

23:55 Peter Wang: Absolutely. I think that PSF should spin up a thing like that. I've been sort of quietly advocating for this behind the scenes a little bit and maybe I'll be more vocal about that here this year.

23:55 Michael Kennedy: Alright, well we can spread a little word on the podcast as we chat. It's really interesting and I think there's absolutely lots of possibilities for business models in open source. But I feel like there's actually a 98% gap, like 2% of that is captured. 98% of it is not because we have these large, but still not huge, banks in the Midwest that contribute nothing. They do no PRs. They don't do anything to that effect. It's just not in their culture. And like you said, there's no real mechanism for them to pay a little and get more and justify that.

23:55 Peter Wang: Yes, yes. And some of the open source business models that are emerging now, they present challenges of their own. Again, my overriding thesis is that the world of software is actually commoditizing pretty quickly. If you look at the things that have been happening in the last six months as I would say open source software component vendors, like Mongo and Redis and Timescale and others, as they start getting their businesses eaten by the cloud vendors, they're realizing that open source sounded great, open core sounded great, and then they start losing any future route to revenue. And they've got to aggressively go to dual licensing and AGPL3 kind of stuff. I don't know that open source is even the right conversation to have anymore. I think it should be around sustainable community innovation and the freedom to experiment, freedom to innovate. There's a lot of free as in beer, and free as in innovation, but the traditional ways we have about talking about the source code itself, again, is limited to this paradigm of code drops. And we're beyond that now.

23:55 Michael Kennedy: Yeah, and you look at the cloud, for example, a lot of these places that they provide you something, and you pay on usage. You don't buy any software in the cloud, but you have this subscription model all over the place, and that's starting to really shift the way things are working as well. I feel like the cloud vendors have this interesting lock-in where they're a little bit defended against some of these challenges that are coming up.

23:55 Peter Wang: Absolutely. There's only like three major cloud vendors of significance, here in the U.S. at least. And all of them are absolutely going for lock-in, and ultimately their business model, it's a for-profit business model, put it that way.

23:55 Michael Kennedy: Yeah, the cloud is the new lock-in with a lot of those APIs. It's interesting. And like this MongoDB AWS thing you talked about, that's a little bit of it as well, right? It's pretty interesting. I think we could probably talk for hours and hours on this because we're both pretty passionate about it. It's awesome. But let me ask you a few more questions before we run out of time.

23:55 Peter Wang: Sure.

23:55 Michael Kennedy: These are all sort of forward looking type things. One of them is data science from, you called out the year 2012, to me that, if you look at the analytics and the graphs and the usage, there's a huge increase in the derivative of a lot of things around Python at 2012 up to now. So five years further out what do you think data science looks like? Is it still deeply working with Python? Is it solving different problems? Where is it going?

23:55 Peter Wang: We're going to see data science much more integrated. People have a better sense of what it can and can't do by itself rather. It's a new discipline that's coming into the business. It's a new swim lane. Everyone's trying to figure out how they stand in relation to it. There's a lot of political fighting and experimentation with a lot of businesses that I see. But at the end of the day, I think this idea of doing data exploration, doing model development, and revving models that are really critical to the business is the new reality for people. So that's not going away. That's a fundamental dynamic that's going to be here. And if you need to go and explore data, you need to go and do model development, then you're going to be doing data science, full stop. If you need to bring in domain expertise, stats, and coding ability to do that well, then you're going to need data scientists.

23:55 Michael Kennedy: Intersect, you would need all three of those skills?

23:55 Peter Wang: You need all three of those. But data scientists are going to find themselves needing to have a much better, I think the borders between the data science world and the others will clarify better. So you'll have data scientists interacting with data engineers, and hopefully much better established best practices on how that's supposed to go. And then IT people start accepting that yes, Python is here to stay. We're going to need to deploy real Python stuff, and we need to know a little more about it. So, a lot of these little intersectional areas right now between data science and other concerns, same thing with BI. Right now there's literally people out there selling point and click visualization tools saying that's data science. And it's like that's not really data science. But they're going to figure that out probably in the next couple of years. Hopefully they get the clue. I think that's what's going to happen. Now, the result of that happening is a gigantic, I think that clue is really going to start hitting home in two years or so. Then the immediate next problem that people have is overall workflow management across all of these things. Because everyone's got their favorite tools. Everyone is producing things that touch and intersect with everyone else's stuff. How do we get all of this stuff managed in one place? I think that's the challenge. We're going to be square in the thick of it. Also, additionally five years from now we are probably not going to be clear of our challenges on data privacy. So how we actually manage data in a responsible way, and how we actually think about what we're doing with inference engines and prediction engines in an ethical way around data, we're going to be square in the middle of that conversation still. 
And five years from now, assuming the Chinese economy hasn't collapsed, we are going to see some really scary stuff coming out of China in the AI innovation happening there, because they are completely unapologetic about using their entire national population of a billion people as a sandbox for trying AI surveillance. Sort of cybernetic, the computer controls you, kind of things.

23:55 Michael Kennedy: The whole social ranking and all that stuff that's...

23:55 Peter Wang: So here's the terrifying thing about that. I'm going to be a little bit of a contrarian on this. What if it turns out that their Sesame Credit System, Rev 2, no, Rev 1 is scary and crappy. Rev 2, what if it turns out that they give social Sesame Credits for their businesses and local politicians? What if they start upgrading social Sesame Credits to be in this kind of thing where it becomes almost like, again, back to Warcraft, but like a Warcraft honor reputation system? And becomes multi-colored? It becomes vectorized instead of scalar? They might actually innovate a scary awesome approach that has deep problems because it requires a surveillance state and the Western world might look at that and say huh, that actually works a lot better than Ivanka Trump running our fast food joints. Sorry to the White House. So that dates this podcast by the way. For those who are listening months in the future, in case you forgot, just two days ago the President of the United States served Big Macs at the White House. That happened. So this is still fresh in our minds.

23:55 Michael Kennedy: To Clemson, who won the National College Football Championship, yeah.

23:55 Peter Wang: Yes, it's incredible. Anyway, so the point is that the scary thing about the Chinese AI system is that it might work and work really, really well.

23:55 Michael Kennedy: Yeah, not that it's just pure wrong, but actually there's aspects of it that are amazing in its sort of Black Mirror, Electric Dreams way.

23:55 Peter Wang: Oh yeah. Tell you what, it's going to be pretty amazing. I think the same way that a lot of the Western world is like oh, well, we already saw where this goes in Orwell so we're not going to go there. The Western world has that kind of snottiness about it. I think they're underestimating how good it could be, and how tempting that goodness can look to technologists, to the capitalists, and to the policy makers here. That's really, for me, as someone who fled the communist regime as a child, that's the scary thing about it.

23:55 Michael Kennedy: That is really an interesting analysis. And certainly I was thinking ethics, data ethics and accountability for data models and AI and ML. Sorry, you couldn't get the house. The AI said no. No, no, you have to say why the AI said no. Well, we don't know, but it's really good, and it said no. Answering that problem is going to be interesting too.

23:55 Peter Wang: It is and the thing is that already now you get denied. And there's already a model that tells you why you're denied. And the AI can, this gets back to that same thing with the whole Black Mirror thing and the AI in China. Really, really good AI doesn't look like that AI. So the really, really "good" systems, the really effective systems are partitioning people, and spot targeting them, they're going to be dressed up in ways that are palatable. Our robot overlords will look like Cylons. They're going to look really human-like. This is the scary future, man. I'm not trying to scare you and scare your listeners. I'm just telling you though this is what's coming. And as humans, I'm actually a human, I'm not a Cylon, as humans, as tribe human, I think we've got to get better at being human. And so that's maybe too philosophical and heavy, but anyway.

23:55 Michael Kennedy: Yeah, it's really interesting thing to ponder for sure. Alright, so I guess final comment or topic just real quickly is I feel like there's been this Python 2/3 debate, modern Python versus legacy Python, as I like to position it. I feel like the adoption of modern Python in data science is much faster than it has been in the general Python space. One, do you think that's true? And then two, why do you think that is?

23:55 Peter Wang: One, I think it's true. And two, I think it's because a lot of data science stuff is new and legacy data science code tends to age with models. So a piece of data science code is only as good as the model data that it was trained on. Models change because the world changes. So there's a built-in expiration date on any data science model that you've got. So you're not keeping transaction systems from 20 years ago alive.

23:55 Michael Kennedy: The complexity and the algorithms and the techniques are just not even relevant, right? The machine learning of five years ago doesn't compete with the machine learning of today. And it's not like you're just going to upgrade. It's a totally different thing. You just retrain it on TensorFlow or Keras or whatever.

23:55 Peter Wang: Right, and secondly this is another important dynamic, which is that the regulatory environment around data science hasn't caught up. So it doesn't require you, you know I was talking to an engineer from a software modeling engineer from an airplane company, and he was saying yeah, the FAA requires us to be able to reproduce our computational design models for like decades. For decades. So I mean, yeah planes actually, if they're well maintained, they fly for a long time. And if there's a structural failure of a part...

23:55 Michael Kennedy: Right, there's a lot of 737s out there, yeah.

23:55 Peter Wang: Oh yeah, and so data science just doesn't have that problem yet. And you know, one of the early adopters of Python, this is a really interesting dynamic that people may not be aware of, but in the mid-2000s there was a significant uptake of Python in the hedge fund and the finance industry. So that was Python 2. Python 2.5/6 around that time. So that got into a lot of places, and finance, it's actually a pretty regulated area. A lot of that code, especially if it starts running production finance systems, people need to keep it running, because even if you stop using a particular finance model to score or do whatever to trade and things like that, oftentimes you'll want to go back and do what's called back testing. So you want to run new data against those old models and you'll want to race them against the new models. You'll want to run new models on old data, and new data on old models. So that kind of back testing approach, you need to keep that old code running for that purpose as well just from a risk management perspective. The finance industry's running ahead and adopting Python 2 has sort of gotten them stuck on Python 2 a little bit.
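
The back testing idea described above, running new models on old data and new data on old models, can be sketched as a cross-product of models and datasets. This is only an illustration: the two model functions, the dataset names, and the price lists are hypothetical stand-ins, not anything from a real trading system.

```python
from itertools import product


def old_model(prices):
    # Hypothetical legacy rule: signal "buy" if the last price beats the first.
    return prices[-1] > prices[0]


def new_model(prices):
    # Hypothetical newer rule: signal "buy" if the last price beats the mean.
    return prices[-1] > sum(prices) / len(prices)


# Both the old and the new model must stay runnable for back testing.
MODELS = {"old": old_model, "new": new_model}

# Made-up price series standing in for historical and recent data.
DATASETS = {
    "old_data": [10, 12, 11],
    "new_data": [20, 18, 21],
}

# Race every model against every dataset.
results = {
    (model_name, data_name): MODELS[model_name](DATASETS[data_name])
    for model_name, data_name in product(MODELS, DATASETS)
}

for (model_name, data_name), signal in results.items():
    print(f"{model_name} model on {data_name}: buy={signal}")
```

The point of the sketch is that the old model has to remain executable for as long as anyone wants these comparisons, which is one reason legacy code in finance is hard to retire.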

01:00:01 Michael Kennedy: Okay, interesting, yeah. It's almost a victim of its own success in a way, but in some of these industries. Alright, I guess we're going to have to leave it there, because we're out of time. But like I said, a lot of interesting stuff to talk about. Have to just put it to rest. So before we move on though, I'm going to ask you the two questions I always ask at the end of the show. And if you're going to write some Python code, what editor would you use?

01:00:25 Peter Wang: My old go-to is still Vim. But for large code bases I tend to use PyCharm so I can navigate more easily.

01:00:31 Michael Kennedy: Yeah, sure. Makes sense. And then there are many, many packages on PyPI, or available on conda-forge. What do you think is one that people maybe haven't heard of but they should? Or that you'd even recommend?

01:00:43 Peter Wang: Is it bad form to pimp your own stuff?

01:00:46 Michael Kennedy: No, you can do it. Go ahead, go ahead.

01:00:49 Peter Wang: I'm really, really excited about a new project that we created called Intake, which I would encourage people to take a look at it. It's pretty new. We just launched it last year.

01:00:58 Michael Kennedy: Yeah, it looks interesting. I was going to ask you more about it, but we just have too many topics already. So tell us about it real quick.

01:01:03 Peter Wang: Intake is a data loading abstraction library. It's basically just load my data and it abstracts your data loading stuff into a declarative syntax so that the beginning of your data science scripts doesn't have a whole bunch of embedded and brittle SQL calls or Pandas column transformations or things like that. Intake is a way to make it so that your actual data science or data transformation code is its own code artifact, and your data bits are your data bits. It's kind of a nerdy thing. But we think that it actually addresses that model reproducibility and code reproducibility that data scientists face.
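
To make the separation described above concrete, here is a minimal standard-library sketch of the general idea: a declarative description of the data lives in one place, and the analysis code only asks for a source by name. It does not use Intake's actual API. The `CATALOG` dict, the `load` helper, and the sample rows are all hypothetical, and a real catalog would typically live in a separate file.

```python
import csv
import io

# Hypothetical declarative catalog: says where the data is and how to parse it.
# Inline CSV text stands in for a file or database so the sketch is runnable.
CATALOG = {
    "trades": {
        "driver": "csv",
        "text": "symbol,price\nAAPL,150.0\nMSFT,105.5\n",
        "types": {"price": float},
    },
}


def load(name, catalog=CATALOG):
    """Load a named source according to its catalog entry (hypothetical helper)."""
    entry = catalog[name]
    if entry["driver"] != "csv":
        raise ValueError(f"unsupported driver: {entry['driver']}")
    rows = []
    for row in csv.DictReader(io.StringIO(entry["text"])):
        # Apply the column conversions declared in the catalog.
        for column, cast in entry.get("types", {}).items():
            row[column] = cast(row[column])
        rows.append(row)
    return rows


def average_price(rows):
    # Analysis code: knows nothing about where or how the data was loaded.
    return sum(r["price"] for r in rows) / len(rows)


trades = load("trades")
avg = average_price(trades)
print(avg)
```

Changing the data location or column types then means editing the catalog entry, while `average_price` stays untouched.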

01:01:38 Michael Kennedy: It sounds really useful. Thanks. Alright, final call to action. People are excited about Anaconda Distribution or maybe making some progress on this open source business model thing we talked about. What would you say to people?

01:01:50 Peter Wang: I would say that we have AnacondaCON coming up. So if you're using Python in a commercial environment, strongly recommend AnacondaCON. We try to make a really good blend of technology and practitioner kind of stuff and workshops there combined with business perspectives. So it's not like an industry conference like Gartner or Strata. It's not like a pure one of those things. It's also not a pure tech community conference like PyData or something like that. We try to make a mix of those things. We've gotten really good reviews in the past couple of years. It's our third year doing it. I'm super excited about it. It's here in Austin in April, April 3rd to 5th. So that's AnacondaCON.io. Secondly, if people are using Anaconda, they like it, and they're using it in a business environment, I would recommend they check out Anaconda Enterprise. We are very, very proud of the product and we have a lot of problems that we solve for people inside business environments and the business use of Python for deployment, package management.

01:02:44 Michael Kennedy: Real quickly, what do you get from? I talked about the business model should be you get a little bit more for your money, not just pure charity, here's a PayPal donate button. What do people get real quick?

01:02:57 Peter Wang: Anaconda Enterprise gives you the ability to have your own managed package repository. It gives you a way to do secured and governed collaborative notebooks and model deployment. It works in the cloud. It works on prem. Many of our customers use it across an air gap in very strictly governed environments. We basically make it so that data scientists and Python practitioners in business can be as effective with Anaconda as they are at home nights and weekends on their own laptops.

01:03:23 Michael Kennedy: Alright, that sounds cool.

01:03:24 Peter Wang: We just clear all the IT hurdles.

01:03:25 Michael Kennedy: Yeah, that's sweet. Alright, well thanks for all that you've talked about here, Peter. It's been a super interesting conversation. Thanks for being on the show.

01:03:32 Peter Wang: Thank you so much for having me. I really enjoyed it.

01:03:34 Michael Kennedy: Yeah, you bet, bye.

01:03:35 Peter Wang: Bye-bye.

01:03:36 Michael Kennedy: This has been another episode of Talk Python To Me. Our guest on this episode was Peter Wang. It's been brought to you by Linode and Rollbar. Linode is your go-to hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed. Until users complain, of course. Track a ridiculous number of errors for free as Talk Python To Me listeners at talkpython.fm/rollbar. Want to level up your Python? If you're just getting started try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new Async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the Direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
