
#94: Guaranteed packages via Conda and Conda-Forge Transcript

Recorded on Tuesday, Dec 6, 2016.

00:00 Have you ever had trouble installing a package you wanted to use in your Python app?

00:03 Likely it contained some odd dependency, required a compilation step, maybe even using an uncommon compiler like Fortran?

00:10 Did you try this on Windows?

00:12 How many times have you seen "cannot find vcvarsall.bat" before you have to take a walk?

00:17 If this sounds familiar, you might want to check out the Conda Package Manager,

00:21 Anaconda, the distribution, Conda Forge, and Conda Build.

00:25 Together, these dramatically lower the bar for installing packages across all the platforms.

00:30 This week, you'll meet Phil Elson, Kale Franz, and Michael Sarahan, who all work on various parts of this ecosystem.

00:37 This is Talk Python to Me, episode 94, recorded December 14, 2016.

00:43 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

01:14 This is your host, Michael Kennedy.

01:16 Follow me on Twitter, where I'm @mkennedy.

01:18 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:25 This episode is brought to you by MongoDB and Continuum Analytics.

01:29 Thank them both for supporting the show by checking out what they have to offer during their segments.

01:34 Kale, Michael, Phil, welcome to Talk Python, guys.

01:38 It's great to have you here.

01:39 Thanks, Michael.

01:40 Thank you.

01:40 Great to be here.

01:41 I'm looking forward to talking about Conda.

01:43 It's going to be really fun.

01:44 There's a whole ecosystem growing up around it, and I think it's a little bit underappreciated just from the people I talk to,

01:51 and so I really want to spread the word about how cool Conda is and what you can do with it.

01:56 So we'll get to all that, but let's start with your guys' story.

01:59 How have you gotten into programming?

02:00 Since there's three of you, let's kind of keep it a little bit short, but just, Kale, maybe how do you get into programming in Python?

02:05 30 seconds.

02:06 Yeah.

02:07 In grad school, I was writing a lot of MATLAB code, and then MATLAB is proprietary,

02:13 and this thing called NumPy came along and I started using NumPy, and it kind of grew from there.

02:18 Oh, that's really cool.

02:19 Now you're at the very center of the nucleus of NumPy and all that kind of stuff.

02:25 That's great.

02:26 Michael, how about you?

02:27 Yeah, I started out with a web program to show my photos to my parents and family,

02:33 and then grew into science stuff when I went to grad school, much like Kale.

02:37 So, yeah.

02:38 Nice.

02:39 What did you study in grad school?

02:40 Electron microscopy.

02:41 Electron microscopy.

02:43 Okay, wow.

02:43 Very cool.

02:44 Lots of very small things, huh?

02:46 Yeah.

02:47 Was that like image recognition and things like this?

02:50 Yeah, and also spectral processing, so all kinds of fun signal processing and unsupervised learning.

02:56 Oh, okay.

02:57 Very cool.

02:57 Phil, how about you?

02:59 So, I started pretty young.

03:00 I started with Visual Basic when I was about 14, and then at university, I did quite a bit of Maple,

03:08 which is kind of a mathematical language pretty similar to SymPy, and then picked up Python when I was in my gap year in France.

03:16 I just loved the open source community, loved the scientific Python space, and went from there.

03:22 Yeah, that's excellent.

03:23 So, you were kind of bumming around Europe, and most people just hang out and chill,

03:27 and you picked up Python.

03:28 Very cool.

03:29 Yeah.

03:30 Awesome.

03:31 Okay, so let's get into Conda first, and then we can get into Conda Build, Conda Forge,

03:38 and the various things around it.

03:39 So, Kale, tell us, what is Conda?

03:42 Conda's a package manager.

03:43 Install software and use software.

03:47 It does a lot more than just manage packages.

03:49 It manages environments.

03:52 So, there's conda env, or there's an environment manager and the concept of environments, much like Python virtualenv, except it's system level, and it's Python agnostic, the environment part.

04:04 Oh, it's language agnostic.

04:07 So, you can have environments for, I mean, if you want to build Ruby with it, or whatever types of environments you want, completely supported.

04:14 So, yeah, it's package manager.

04:16 Very nice.

04:17 So, it's kind of like pip in some of its role, but it does more, right?

04:21 In some ways it does more.

04:22 In some ways it does less.

04:24 So, Conda and the Conda ecosystem is not the canonical source of truth for Python packages.

04:30 That's PyPI, and we don't install packages directly from PyPI.

04:34 That's pip's job.

04:36 We don't pretend to.

04:38 But we're much more like, in some ways, more like a system package manager.

04:41 So, more like apt-get or yum or something like that.

04:45 A lot of times with a system manager you'll install Python or you'll install some of your core dependencies.

04:50 The harder things that pip will invoke a compiler for and things will crash or things like that.

04:57 Wheels help now, but pip still can't install Python.

05:00 So, things that are a little bit lower level are where Conda really excels.

05:06 And then, so, yeah, that's the tradeoff in capabilities, I guess.

05:10 Sure.

05:10 And let's see if I got the origins correctly.

05:14 I'll tell you what I think and then you tell me whether I'm right or wrong.

05:16 So, I feel like Conda came from sort of two places.

05:20 It feels like it came from you guys wanted to make it easier for data scientists and people, more correctly,

05:25 people working with the data science and scientific tools like NumPy and so on to actually start working with them.

05:31 Because they often had really freaky builds.

05:34 So, like they might depend on some sort of Fortran thing.

05:36 And so, in order to pip install, you have to have like a Fortran compiler.

05:40 A Fortran compiler.

05:40 Yeah, yeah.

05:41 Which, I mean, we're already battling with where is vcvarsall.bat again on Windows?

05:46 And now you're talking about Fortran compilers?

05:48 Come on.

05:48 So, is that sort of where it came from?

05:51 And it had to manage all these different things like Fortran, for example?

05:54 Yeah, that was the genesis.

05:55 And Conda came about about the same time as wheels came about.

06:00 So, there's a little bit of two different approaches.

06:03 But they've sort of diverged into really pip and wheels and things being Python specific.

06:10 And the source of truth for Python if there's not a more downstream distributor.

06:16 Whereas Conda, we do take care of all the hard parts.

06:19 So, NumPy and SciPy and all those dependencies.

06:23 And then Python and all the runtimes and stuff like that.

06:26 Yeah.

06:27 And one of the big differences is that you guys pre-build the binary pieces that we might need, right?

06:35 Yeah, that's correct.

06:36 So, a lot of times people ship to production and absolutely do not want compilers on their production systems.

06:42 And so, we do ship binaries.

06:44 Yeah, I find that to be really excellent.

06:46 Because, you know, like if I go and get SQLAlchemy or I think PyMongo even, some of these things that have C speedups at various places, often for like deserialization and serialization steps.

06:57 And if I don't have the right setup, you know, maybe it's not going to work.

07:01 Especially if I'm on Windows, for some reason that gets a little harder to do and gets neglected often.

07:06 And so, it makes it like doubly hard, right?

07:08 Sure.

07:09 And those things you don't have to worry about if I Conda install it, right?

07:12 Because it's already been compiled somewhere else by you guys.

07:14 Not only do you not have to worry about that, but you can Conda install Postgres or all of your other general dependencies.

07:21 It's not just Python.

07:22 We have a huge R ecosystem.

07:24 There's plenty of other ecosystems that people are, language interpreter ecosystems that people are building up around.

07:30 Okay.

07:31 And I guess another point about Conda versus a system package manager like yum or apt-get is for your application code, you rarely want to be running the system Python.

07:41 You always want to have an application-specific Python.

07:44 And the OS package managers don't make that as easy.

07:48 You basically have to make your own RPMs or something like that.

07:50 Conda allows these isolated environments.

07:53 Yeah, it absolutely does.

07:54 That's very cool.

07:55 Very cool.

07:56 Okay.

07:56 So, how does Conda relate to Anaconda?

08:00 Sure.

08:01 Anaconda is a distribution.

08:03 There's an Anaconda distribution that is a set of packages that we ship about every quarter.

08:08 That is the easy way for scientists and engineers, data scientists, PhD students all over the world, to install most of the core Python packages that they'll need.

08:20 To do that, and the package manager part then is Conda.

08:23 And so you get Conda, and then when you need to install other packages from any of our repositories or anaconda.org or anything like that,

08:31 Conda is the command to extend what you install with the Anaconda distribution.

08:37 Okay, so Anaconda is like all that stuff pre-packaged, pre-compiled, like the major tools you want.

08:42 And Conda is the package manager for it.

08:45 Nice.

08:45 Phil, how about Conda Forge?

08:47 Where does Conda Forge relate to all this?

08:50 So, if you see Anaconda as a nice bundled set of packages to install Python and all of its dependencies,

08:58 then Conda Forge is essentially doing the same thing, except Anaconda is managed by Continuum, and Conda Forge is a community effort

09:07 to kind of collaboratively package this stuff.

09:10 There are so many packages out there, right?

09:12 I mean, just like you guys said, Conda is more than just Python.

09:15 But just in Python, there's over 90,000 packages on PyPI, which is crazy, right?

09:20 So, you probably can't manage all of that yourself, right?

09:25 No.

09:26 And so Conda Forge is to help crowdsource that, open source that a little bit.

09:29 Yeah, exactly.

09:29 It's not reasonable for us to expect Continuum to package all of this stuff.

09:34 It just simply doesn't scale in that way.

09:36 We needed a community effort to package some of the things that Continuum aren't necessarily ever going to package.

09:43 At the end of the day, they're a business, and they're there to supply their customers.

09:48 And that doesn't necessarily cover everybody who has various bespoke packages that need to be shipped.

09:55 Yeah, absolutely.

09:55 There's probably a small niche tail that develops very quickly, right?

10:01 There's probably 500 or 1,000 packages that are super important, and then it becomes really niche quickly after that, right?

10:08 Absolutely.

10:08 That's where Conda Forge started, really.

10:10 I'm the author of several niche packages, which are extremely powerful for the people that it's targeting.

10:18 But by no means would you expect that to be packaged by a company who are there, ultimately, to sell a product to their customers.

10:27 That product may be freely available, but we don't have, as a community, we don't have control of what's packaged and how it's packaged.

10:35 Sure.

10:36 Yeah, that makes a lot of sense.

10:37 So I guess I didn't really ask, what is Conda Forge?

10:40 Maybe give people a quick definition so they know what we're talking about.

10:44 Okay.

10:45 So Conda Forge, in the same way that Anaconda is a bundle of pre-compiled Conda packages, Conda Forge is exactly the same thing.

10:55 It's a channel that you can enable using the Conda package manager to install all of your favorite tools.

11:02 And if your tools aren't available on that channel, then it's a community effort, and you're welcome to contribute your package to be bundled into the Conda Forge community and make it widely available to Conda users.
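
The "channel that you can enable" can be sketched as two Conda command lines. This is a minimal illustration, not an official workflow: the helper function names are made up for this example, and the functions only build the argument lists rather than running Conda. The `conda config --add channels` and `conda install --channel` options themselves are real Conda CLI options.

```python
def enable_channel_command(channel="conda-forge"):
    """Build the command that adds a channel to the user's Conda configuration."""
    return ["conda", "config", "--add", "channels", channel]


def install_from_channel_command(package, channel="conda-forge"):
    """Build a one-off install command that searches the given channel."""
    return ["conda", "install", "--channel", channel, package]


if __name__ == "__main__":
    # Print the commands instead of executing them, so this runs without Conda.
    print(" ".join(enable_channel_command()))
    print(" ".join(install_from_channel_command("numpy")))
```

Passing either list to `subprocess.run` on a machine with Conda installed would carry out the step; printing keeps the sketch side-effect free.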

11:18 Right.

11:19 Okay.

11:19 Yeah, we'll definitely dig into that more and how you can do that in a little bit, but that's great.

11:24 And so in order to deliver things in a binary form, basically compiled for Windows, compiled for Linux, compiled for macOS, and things like that, you guys probably have some pretty interesting build infrastructure.

11:40 Michael, do you want to talk about that?

11:41 Oh, sure.

11:42 Yeah.

11:42 So we have a lot of different build machines.

11:45 And so one of the main advances that Conda Forge has done is to take advantage of many of the free continuous integration services that are on the web.

11:56 And so Phil figured out really, really clever ways to get around the different limits that those CIs present.

12:03 So he thought, oh, well, you know, it's not going to work to build an entire repository of recipes.

12:09 So can I instead break up the one repository into one recipe per repository?

12:16 And that was a really clever workaround that made it possible to use those CIs as infrastructure.

12:22 So that's part of it.

12:25 At Continuum, we just have kind of a standard build system where we try to maintain compatibility with older frameworks because we support customers with old CentOS 5 and RHEL 5 systems.

12:38 But other than that, it's the same kind of idea.

12:41 Okay.

12:41 And you build it for, I can see how you would set up CI on Windows pretty easily.

12:46 I see how you set up on Linux really easily.

12:47 What about Mac?

12:49 So Mac is...

12:51 It's got to be a story behind that laugh, right?

12:54 Well, it's a little bit more awkward.

12:57 And the main reason why it's awkward is just Apple licensing.

12:59 They don't let you run virtual machines on anything but Apple hardware.

13:03 And so the number of people who are offering particularly free services with Apple builds is much more limited.

13:11 And that said, you can get it on both CircleCI and TravisCI.

13:14 And so we use TravisCI for Conda Forge's Mac builds.

13:18 Okay.

13:18 Yeah.

13:19 I mean, there are a few places.

13:20 Like there's a place called Mac Mini Colo where you can get a bunch of Mac minis and stick them in some data center and they'll like run them for you.

13:28 But yeah, it definitely makes it harder.

13:29 But you guys found some kind of cloud service that you can use.

13:32 You don't have like a closet full of Macs?

13:34 At Continuum, we do have a closet full of Macs.

13:37 Okay.

13:37 Great.

13:38 But I really think this binary distribution thing is really excellent because certainly when you're getting started, it's frustrating to deal with the single compile, the single install.

13:50 So just knowing that it's always going to work is great.

13:54 But also the speed, right?

13:55 You can install stuff much quicker because you don't have to wait on a build.

13:58 Yes, certainly.

13:59 So if you're a Linux nerd, then it's kind of Gentoo versus Ubuntu, right?

14:05 Yeah.

14:05 Yeah, yeah.

14:06 Very nice.

14:06 So you have a lot of different places where code is running.

14:10 We talked about the three major OSs.

14:12 Do you have compatibility for others?

14:15 You talked about older systems, for example.

14:17 How do you keep that all straight?

14:19 Well, that's a really hard thing, actually.

14:22 So the way that things are compatible varies by platform.

14:26 And so for Mac, we kind of have to target an older platform, and then it's forward compatible.

14:31 Windows is a weird story because on Windows, the Python version is pretty strongly tied to a particular version of Visual C++.

14:40 And so that's limiting in terms of what kind of code people can actually compile.

14:46 What version do you know?

14:47 Sure.

14:48 Yeah.

14:48 So Python 2.7 requires Visual Studio 2008.

14:52 And that's just kind of a custom because that's what the upstream Python.org build is.

14:57 And so in order to maintain binary compatibility, it's a good idea for everybody to keep that matchup.

15:04 But as a result of that, people can't really build C++11 software for Python 2.7 without mixing these runtimes.

15:13 And when you mix runtimes, it's just kind of it might work.

15:17 It might not.

15:18 You're kind of asking for instability.
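
The tie between a Windows Python version and a Visual C++ version can be written down as a small lookup. This is a sketch under stated assumptions: only the Python 2.7 and Visual Studio 2008 pairing comes from the conversation; the other rows reflect the compilers used for the python.org builds of those versions, and the function names are invented for this example.

```python
# Compiler used for the official Windows build of each CPython version.
# Only the 2.7 row is mentioned in the conversation; the rest are added here.
MSVC_FOR_PYTHON = {
    (2, 7): "Visual Studio 2008",
    (3, 3): "Visual Studio 2010",
    (3, 4): "Visual Studio 2010",
    (3, 5): "Visual Studio 2015",
}


def compiler_for(python_version):
    """Return the matching compiler name, or None if the version is not listed."""
    return MSVC_FOR_PYTHON.get(python_version)


def mixes_runtimes(python_version, extension_compiler):
    """True when an extension was built with a different compiler than Python itself."""
    expected = compiler_for(python_version)
    return expected is not None and expected != extension_compiler


if __name__ == "__main__":
    print(compiler_for((2, 7)))
    print(mixes_runtimes((2, 7), "Visual Studio 2015"))
```

A C++11 extension built with a newer compiler for Python 2.7 is exactly the case where `mixes_runtimes` returns `True`.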

15:20 Yeah.

15:20 No kidding.

15:21 It is too bad because C++ has actually had a bit of a renaissance, right?

15:24 Absolutely.

15:25 And 2008 misses that renaissance.

15:26 That's too old for it.

15:28 Okay.

15:29 Interesting.

15:30 Interesting.

15:30 Can you, any of you guys want to jump in on this?

15:33 Talk about some of the challenges for community packaging.

15:36 Like, if I'm getting binary code delivered to me, how do I trust it?

15:40 Is that really different than trusting the source coming out of PyPI?

15:44 Sure.

15:44 I'll chime in a little bit.

15:46 I would say it's not any different from getting it from PyPI, except that if you get it from PyPI, you can at least inspect the source.

15:54 Whereas if you've got a binary blob, you have to have a different level of trust.

15:58 I think there's one extra level to that because PyPI, it is the original package maintainer that's uploading the package to PyPI, right?

16:07 We're taking from PyPI and we're downstream from that.

16:10 So there is an extra level of people in between.

16:14 On top of that, anybody can upload anything, any package they want to, to Anaconda.org.

16:19 It's up to the user to decide whether they want to use it or not.

16:22 And so there are some challenges there.

16:25 There's also a question of reproducibility in the sense that if you're getting the source, you can compile it anywhere, right?

16:31 But if you're getting a binary and you want to go and change your machine, then you're kind of, you're missing out on the ability to rerun that compile step where you choose.

16:41 Yeah.

16:41 So if you get a new machine, you're always going to have to go back to install via Conda.

16:46 You can't just copy the files over and regenerate it or something.

16:50 If you're on different architectures, for sure, you need to go and refetch the appropriate artifact for your hardware.

16:56 Yeah.

16:56 Okay.

16:57 That said, do you guys do, you know, I had Travis Oliphant on one of the earlier shows and I feel like he said that you guys did some verification that at least the stuff that ships with Anaconda,

17:12 mixes well together that one package that depends on another, you know, that those are compatible versions.

17:18 Is that right?

17:19 Right.

17:19 We do a lot of that.

17:20 And so we have our own kind of automated scripts that test our whole system together.

17:26 And then Conda Forge sort of does that in a more distributed way to say when they build a package, they run the test suite.

17:33 We also do that.

17:34 We run a few other consistency checks.

17:36 But those are the kinds of things that let us make sure that everything is playing well together.

17:41 I'd add there that the Anaconda distribution is extensively, we release it about quarterly and it's extensively tested manually.

17:48 We have a whole QA team that goes through that and make sure that each package works with, or each of the packages, however many that make up the full distribution work together.

17:58 So yeah, there's a lot of extra testing that goes into the distribution.

18:01 Yeah, I think that's really great.

18:03 I think that's one of the big values, not just that you don't have to be able to compile stuff to install it, but that it's taken as a whole somewhat.

18:10 You can trust it rather than here's a hundred little pieces.

18:13 Individually, they're probably fine.

18:15 How are they when put together, right?

18:17 Yeah, the downside of that is when you install the Anaconda distribution, conda install anaconda, you're explicitly pinning all of your packages to a specific version.

18:27 And then when you want to update everything, people get confused about, I was looking for the most recent version of Jupyter and I said update all.

18:35 Why didn't I get the most recent version of Jupyter?

18:37 So there's a little bit of a downside.
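
The pinning effect described here can be shown with a toy model. This is not Conda's actual solver, just an illustration of the idea: the function names are hypothetical and every version number below is invented for the example.

```python
def version_key(version):
    """Turn '4.3.0' into (4, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def update_all(installed, available, pins):
    """Pick the newest available version of each package unless a pin fixes it."""
    result = {}
    for name in installed:
        if name in pins:
            # A metapackage pin wins over anything newer in the repository.
            result[name] = pins[name]
        else:
            result[name] = max(available[name], key=version_key)
    return result


if __name__ == "__main__":
    installed = {"jupyter": "1.0.0", "requests": "2.11.0"}       # made-up versions
    available = {"jupyter": ["1.0.0", "1.1.0"], "requests": ["2.11.0", "2.12.0"]}
    pins_from_metapackage = {"jupyter": "1.0.0"}                 # made-up pin
    print(update_all(installed, available, pins_from_metapackage))
```

In this model the pinned package stays where the metapackage put it while the unpinned one moves forward, which matches the surprise described above.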

18:39 That is a bit of a downside, I guess.

18:41 I did notice that.

18:42 One of the things I was playing around with was trying to use Conda for my web apps.

18:47 I noticed some of the web frameworks were farther out of date with what I got out of there than I guess I'd like.

18:54 Can you talk about that?

18:56 Whoever has the best info, I guess, on how do we like as things evolve?

19:01 You know, so suppose I've got some package put into Conda Forge.

19:05 How does the versioning on PyPI map into Conda Forge?

19:10 Yeah, I'll try to shoot that.

19:12 It's a manual process to update the recipe that creates the Conda package.

19:17 And so someone has to update the recipe on the staged recipes or on the feedstock on Conda Forge.

19:25 And then it's built out automatically by the CIs, which is great.

19:30 The CIs save you a lot of effort.

19:32 At Continuum, we don't have our CIs quite working yet.

19:35 And so it's really a much more manual process to edit the recipe and then build out each of the different packages.

19:42 With that level of effort, it's just a matter of first noticing that there is a new version available and then doing the work to package it up.

19:50 Sure.

19:51 In the defense of Anaconda versus Conda Forge here, there is clearly a huge overlap between what Anaconda does and what Conda Forge does.

19:59 From someone who's not part of the Anaconda team, I just want to say how much of a great job the Anaconda team actually do in providing a really coherent, well put together set of packages that you can trust, you can rely on.

20:14 They're stable.

20:15 And really, I guess that's the big selling point of Anaconda.

20:20 And Conda Forge really isn't trying to be in that space at all.

20:24 What it wants to be is the kind of the leaner, faster moving community effort.

20:29 But there is no chance that Conda Forge can be as well curated.

20:33 It can't provide the indemnity that you get from Anaconda.

20:37 As a distribution, Anaconda is just an amazing resource to have in the community.

20:41 Yeah, and it ships quarterly, so it's probably, you know, if you're willing to wait an extra month, you'll be on the new version of a lot of things anyway.

20:49 If I'm an author of a popular open source package that's either in Anaconda or Conda Forge, what can I do to make sure that the latest possible version is there, given that I have to accept the release cycles and stuff?

21:03 But what do I do?

21:04 Is there a place I talk to you guys, or what's the story?

21:07 We both have places to submit pull requests to recipes.

21:11 And so for the internal, for Anaconda, it's a recipe or a repository called Anaconda Recipes.

21:19 And then for Conda Forge, it's going to be the feedstock for whatever you want to improve.

21:25 If you want to add it, that's a staged recipes PR.

21:28 Absolutely.

21:29 Okay, so let's maybe talk about the recipes and feedstocks and stuff a little bit.

21:33 I have one more comment that I might have given the wrong impression about the distribution shipping quarterly.

21:39 The distribution is like one installer, one big meta package that has all kinds of stuff in it.

21:45 We're constantly updating.

21:47 I mean, Conda Forge too, but our source, our default repositories, we're updating all the time.

21:52 And you don't have to wait for the quarter to come around to get more recent packages most of the time.

21:58 Oh, yeah, yeah.

21:58 Okay.

21:58 It's just the Anaconda distribution and the big Anaconda meta package is updated once a quarter.

22:05 Those are the packages that get all the extra QA and making sure that they all fit together and stuff.

22:10 I see what you mean.

22:10 So like the DMG I got from my Mac, that thing is updated quarterly.

22:14 But you're right.

22:15 When I run a little green circle thing, the sort of overall environmental manager, that thing, I can go and update stuff as they come.

22:25 I remember doing that quite more frequently.

22:27 That's cool.

22:28 Let me take just a moment and tell you about a sponsor of this episode.

22:33 Anaconda Con 2017 is the first conference for open data science leaders around the world

22:38 and the definitive gathering place for the Anaconda crew.

22:41 Whether you are a new or longstanding member of the community focused on business or technology,

22:46 Anaconda Con will help you conquer your biggest data science challenges.

22:50 Data science is a team sport.

22:51 That's why they're offering you two tickets for the price of one to Anaconda Con 17 starting now until January 16th.

22:59 Register today at talkpython.fm/acon to take advantage of the spectacular savings.

23:04 You'll get two tickets for the price of one, and you'll have the lovely tech-oriented city of Austin, Texas to share with your friend and all the top data scientists.

23:12 Yeah.

23:14 So let's talk about these recipes just a little bit.

23:17 So let's focus on Conda Forge for a minute.

23:19 If you go there, you can go to github.com/conda-forge, I think.

23:24 And you've got, yeah, you've got a couple of repos.

23:28 One of them is feedstocks, and that's where the accepted, those are the accepted sources that you guys pull from, right, to build packages?

23:37 Right.

23:38 We have a repo which holds, using Git submodules, every single individual feedstock repo that lives in Conda Forge.

23:47 So the feedstock is the place where the recipe is stored canonically, and that has continuous integration enabled.

23:57 So whenever we make changes to that recipe, it goes away, and it builds it each time and makes that artifact available on the Conda Forge channel.

24:03 Okay.

24:04 Maybe tell us what is a recipe.

24:06 You've got a couple of pieces that make that up, right?

24:09 Sure.

24:09 Mike, do you want to talk about recipes?

24:11 Yeah.

24:12 A recipe is a YAML file, for one thing, and it's just kind of a way of expressing a standard of how you tell Conda Build or any other program that interprets that YAML file

24:27 where you go to get the source, what your build options are, what instructions you carry out to actually do the build,

24:34 and then one thing that we're adding in is how do you package the source after you've built it,

24:38 because some of the time you want to package different parts of what you've built into different packages.

24:45 I see.

24:45 That makes sense.

24:46 So if basically we go to GitHub and there's a link to a submodule that gets pulled in for each one of these,

24:53 if it's somewhere else other than Git, like Subversion, is it still possible to stick it in there,

25:00 or does it have to be copied over to a GitHub repo or something?

25:04 Conda Forge is based around the GitHub stuff, but there's no intrinsic limit to Conda Build as to where a recipe has to come from.

25:12 Right.

25:12 Okay.

25:13 Well, let's talk quickly about Conda Build.

25:15 Michael, you're also in charge of that, right, or working closely with that?

25:19 Yes.

25:19 So what's the story there?

25:21 Like, if I want to take a package and build it with Conda Build, I've got to put together the recipe for it to work.

25:28 I've got to obviously have the sources.

25:30 And then I get it set up and I give it to you guys and you ship it off to CI on various platforms to build out the various versions or what happens?

25:38 Yeah.

25:38 Conda Build is really just a lot of orchestration of environment setup.

25:44 And so building up a build environment is pretty hard.

25:48 There's a lot of different pieces that go into that.

25:50 And so in previous jobs as a software developer, setting up your workstation takes a day or two.

25:57 So what Conda Build is really for is abstracting that away and taking advantage of Conda environments to make that setup be very seamless.

26:06 And so you list your requirements at build time and at runtime in that YAML file.

26:12 And then Conda Build goes and installs that stuff.

26:16 And it is also doing a lot of kind of housekeeping as far as downloading the source and making sure that it's clean and then putting it in the right place and putting your prompt in the right place and activating vcvarsall, for example.

26:32 And just taking care of all of those details for you so that you don't have to learn to do it yourself or don't have to put up with doing it yourself.

26:40 That's nice.

26:41 So if I can get my thing to build on one platform, how hard is it to get it to build on all the platforms?

26:47 It depends heavily on which package.

26:50 I mean, some packages support all platforms pretty well, pretty natively.

26:55 But for example, some packages that really depend very heavily on Unix tools take a little bit more handholding on Windows.

27:02 There's a lot of other build tools that make life easier.

27:05 So for example, CMake is something that you always get really happy when you see that somebody's using that because it makes it much easier to do cross-platform builds.

27:15 Whereas some other projects will say, oh, I've got a make file for Unix stuff, so Linux and Mac.

27:21 And then on Windows, I've got this Visual Studio 2010 solution file.

27:27 So if you're on Visual Studio 2008, you probably can't use it.

27:31 If you're on 2015, you have to update it.

27:33 And maybe it works.

27:34 Maybe it doesn't.

27:35 So that's the kind of thing where some projects are a lot easier than others.

27:41 I would add, though, that for Python in particular, the recipe is usually trivial.

27:47 There are usually about three files.

27:51 There's a meta.yaml that is kind of like setup.py, except we use setup.py.

27:55 And then your build script is mostly just python setup.py install.

28:00 And most of the time, you don't even include that.

28:04 If you look at most of the Python-based recipes on Conda Forge, there's just a single command to run that setup.py command in the meta.yaml file.
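
A pure-Python recipe of the kind described here can be sketched by rendering a minimal meta.yaml from a few values. Treat this as an illustration under assumptions: the package name, version, and URL are placeholders, the `render_meta_yaml` helper is invented for this example, and real Conda Forge recipes carry more fields (checksums, tests, license information) than shown.

```python
def render_meta_yaml(name, version, source_url, run_requirements):
    """Render a minimal meta.yaml whose build step is a single setup.py command."""
    lines = [
        "package:",
        f"  name: {name}",
        f'  version: "{version}"',
        "",
        "source:",
        f"  url: {source_url}",
        "",
        "build:",
        "  script: python setup.py install",
        "",
        "requirements:",
        "  build:",
        "    - python",
        "    - setuptools",
        "  run:",
        "    - python",
    ]
    # Extra runtime dependencies are appended under the run section.
    lines += [f"    - {requirement}" for requirement in run_requirements]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "mypkg" and the example.org URL are placeholders, not a real package.
    print(render_meta_yaml("mypkg", "1.0", "https://example.org/mypkg-1.0.tar.gz", ["numpy"]))
```

The `build: script:` entry is the single setup.py command mentioned above; listing requirements separately for build time and run time mirrors the split described earlier in the conversation.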

28:13 I see.

28:14 And that gets basically installed into that local Conda environment.

28:18 Then you just grab what you got and ship it?

28:20 Exactly.

28:21 So what Conda Build is doing is taking a snapshot of the files after it creates the environment.

28:25 And then another snapshot after it's done doing whatever you told it to do to install it.

28:31 And then those are the files that make up your package.
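
The before-and-after snapshot idea can be sketched in a few lines of Python. This is a simplified model, not Conda Build's implementation: it only detects newly added files in a temporary directory, and `fake_install` is a hypothetical stand-in for whatever the build script does.

```python
import os
import tempfile


def snapshot(prefix):
    """Return the set of file paths under prefix, relative to prefix."""
    found = set()
    for root, _dirs, files in os.walk(prefix):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), prefix))
    return found


def fake_install(prefix):
    """Stand-in for the build script: drops two files into the environment."""
    package_dir = os.path.join(prefix, "lib", "mypkg")
    os.makedirs(package_dir)
    for name in ("__init__.py", "core.py"):
        with open(os.path.join(package_dir, name), "w") as handle:
            handle.write("# placeholder\n")


def files_added_by_install(prefix, install):
    """Snapshot, run the install step, snapshot again, and return the new files."""
    before = snapshot(prefix)
    install(prefix)
    after = snapshot(prefix)
    return sorted(after - before)


def demo():
    with tempfile.TemporaryDirectory() as prefix:
        # A file that exists before the install and so is not part of the package.
        os.makedirs(os.path.join(prefix, "bin"))
        with open(os.path.join(prefix, "bin", "python"), "w") as handle:
            handle.write("")
        return files_added_by_install(prefix, fake_install)


if __name__ == "__main__":
    print(demo())
```

Only the two files written by the install step come back from `demo()`; the pre-existing `bin/python` file is in both snapshots and drops out of the difference.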

28:34 Okay.

28:34 This is excellent.

28:35 Yeah.

28:35 So it sounds like the Conda environments play a super important role.

28:40 And the fact that Conda can install the tool chain as well is really important in making this all work.

28:46 Yes.

28:47 It's not critical.

28:48 So the tool chain being a Conda package is actually something that we're working on.

28:52 You'll soon be able to install the compiler and the runtime libraries and all that stuff as dependencies.

28:59 You can currently, just not very many people do it because it's not supported all that well.

29:03 That's what we're moving towards our main way of doing things.

29:07 And what that'll mean is it's just going to be incredibly easy to, you're just going to be able to volunteer your machine as a build worker.

29:15 And as long as you have Conda build installed on it, it doesn't matter what else you have.

29:19 It'll take care of it.

29:20 Interesting.

29:20 So you mean like I could go and just say, hey, I'm willing to donate some of my cycles and disk and bandwidth to be a build machine?

29:27 Kind of like SETI@home or protein folding, those things?

29:31 Yes.

29:31 That is the dream.

29:33 And there's quite a few technical hurdles to get there, but it'll be really neat if we figure it out.

29:37 That sounds awesome.

29:38 Yeah, yeah.

29:39 Very cool.

29:39 Okay.

29:42 So what are some of the challenges around, you know, linking all of these other GitHub or not necessarily GitHub, Git repos into like one sort of super master feedstock repo?

29:55 You know, because you've got all the recipes, but they all either in the Conda or Conda Forge, like kind of link back to all these other places.

30:03 Phil, do you want to answer that or should I?

30:05 You're welcome to answer it.

30:07 Okay, good.

30:08 I think the hard part is how do you do the maintenance work?

30:11 Because it works fantastically well to pull those submodules into one folder just as a view or as a consolidated place to work from.

30:21 But then if you change any of the source code, Git submodules are just unwieldy at best for pushing those changes back to the parent repositories.

30:33 So what Phil has created is fantastically useful for doing the editing on each repository.

30:41 But if there were like, say, a lot of edits you wanted to apply across a lot of different repositories, that's the kind of change that would be kind of tedious and hard to do.

30:51 Yeah, absolutely.

30:52 So what's that tooling look like?

30:53 We don't have any idea yet.

30:55 We're still working on it.

30:57 Just waiting for Phil to work his magic.

30:59 Yeah, yeah.

31:00 Please, Phil, figure that out.

31:01 So on the Conda Forge front, we actually have a Heroku service that's running periodically to re-render all of our feedstocks whenever we make kind of fundamental template changes to how a feedstock should look.

31:15 And that service actually goes away, re-renders using the tools that we've built.

31:20 And then if there are any changes, pushes a new pull request to be merged by the recipe maintainers on GitHub.

31:29 So actually, it is tedious to have to make these changes to, like, we're up to 1,500 Git repos now.

31:37 It is tedious if you need to make those changes, but there are some tools that we've developed to simplify that if it's kind of a universal change that's needed.

31:44 Yeah.

31:44 To clarify what a re-rendering is, that is just, like, a change to the CI setup work.

31:52 And the re-rendering is to adapt the CI scripts to whatever the latest standard is.
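
The re-render loop described here (render from the latest template, compare with what is committed, and open a pull request only when something changed) might look roughly like the sketch below. It is a simplified illustration, not the actual Conda Forge service: the function names, the template text, and the feedstock contents are all hypothetical, and no real GitHub calls are made.

```python
def render_ci_config(template, feedstock_name):
    """Fill the shared template in for one feedstock."""
    return template.format(name=feedstock_name)


def feedstocks_needing_pr(template, feedstocks):
    """Return names whose committed CI config differs from a fresh render."""
    stale = []
    for name, committed_config in feedstocks.items():
        if render_ci_config(template, name) != committed_config:
            stale.append(name)
    return sorted(stale)


# Hypothetical template and committed configs, for illustration only.
new_template = "build: {name}\nci_version: 2\n"
feedstocks = {
    "numpy-feedstock": "build: numpy-feedstock\nci_version: 1\n",
    "dask-feedstock": "build: dask-feedstock\nci_version: 2\n",
}

# Only feedstocks listed here would get a pull request for maintainers to merge.
to_update = feedstocks_needing_pr(new_template, feedstocks)
print(to_update)
```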

31:57 I see.

31:58 Okay.

31:58 Yeah.

31:58 And what CI tools do you use?

32:00 That's AppVeyor and CircleCI and TravisCI.

32:05 Okay.

32:05 Yeah, yeah.

32:07 Very nice.

32:07 And when do you choose which, or do you use them all and sort of somehow use them in concert?

32:13 Sure.

32:14 That's something that Phil figured out, and he uses AppVeyor for Windows and CircleCI for Linux and TravisCI for Mac.

32:21 Phil, you want to explain why you made those choices?

32:24 I mean, the choices were pretty straightforward, really.

32:26 So, obviously, AppVeyor is pretty much the only game in town for Windows continuous integration, or certainly was at the time.

32:33 TravisCI is ubiquitous, and everyone knows it.

32:36 And at the time, when we first developed Conda Forge, you could either have very limited Mac builds or amazing Linux builds.

32:45 But you couldn't have both.

32:47 So, we ended up having to go down the route of picking Travis for Mac.

32:59 [inaudible]

33:31 Yeah.

33:31 That's really great of them.

33:32 Yeah.

33:33 Yeah.

33:33 Nice job.

33:35 Do you guys think Conda Forge, or the recipes, feedstock is what it's called, is the thing on GitHub with the most submodules?

33:46 Oh, good question.

33:47 Thanks.

33:48 Yeah.

33:50 We're up to 1,500 recipes.

33:52 So, about 1,500.

33:54 It's pretty high.

33:55 It's got to be in the top five if it's not number one.

33:58 I bet it's number one.

33:59 That's really interesting.

34:01 Really interesting.

34:03 So, can you talk about the growth and some of the challenges, Phil, of Conda Forge?

34:07 Sure.

34:08 So, I mean, we publicly announced Conda Forge less than 12 months ago.

34:12 And the uptake curve on it was crazy.

34:17 In the first few months, we'd grown to 100 contributors and kind of 300 recipes or something like that.

34:24 It really is quite amazing that the infrastructure scaled so well.

34:28 I mean, we designed the architecture to have the ability to make lots of Git repos.

34:34 But we had really no idea whether GitHub were going to limit the number of Git repos we could have under one organization.

34:40 You know, the continuous integration services, we had no idea whether they'd be able to cope with so many kind of registered Git repositories.

34:47 There were some real challenges that actually just fell out quite nicely.

34:52 Yeah.

34:53 It sounds like the tooling worked really well, actually.

34:55 Yeah.

34:56 As it became more and more mature, the bottleneck, it turns out, has not been the hardware.

35:01 Actually, the biggest bottleneck probably is me.

35:04 And just kind of developing the governance and the kind of delegated authorities to kind of make decisions without my intervention just takes time.

35:16 And it's actually kind of the fluffy thing that's much harder than the software for me.

35:20 There's no script for that, is there?

35:22 No.

35:22 There's no continuous integration service for that.

35:26 Hey, everyone.

35:26 Let me take just a quick moment and tell you about a new sponsor of the show, MongoDB University.

35:31 MongoDB is one of the fastest growing job skills on the market.

35:34 Long ago when I was getting into MongoDB, I took one of their free courses on Mongo University.

35:39 It was a great way to get up and running with my first app.

35:42 MongoDB University offers free seven-week courses on MongoDB designed to teach you everything you need to know about how to build a MongoDB-based app.

35:50 This course will cover basic installation, JSON, schema design, querying, inserting data, indexing, and working with the Python driver, of course.

35:58 After completing this course, you should have a good understanding how applications are built on top of MongoDB using Python.

36:05 Plus, you'll have a great foundation for preparing for the MongoDB developer certification exam.

36:09 I hope you can join me as a MongoDB University alumnus.

36:12 Sign up for the free seven-week course at talkpython.fm/Mongo.

36:18 Kale, let's talk a little bit about some of the projects under github.com/Conda.

36:23 Sure.

36:23 So there's a few that are interesting to me.

36:26 There's Conda and then there's Constructor.

36:28 What's the story of those?

36:30 Yep.

36:31 So there's Conda and Conda Build.

36:32 Those are the two big ones that we've been talking about.

36:35 There's Constructor.

36:36 And Constructor is actually a lot of what we use to build the single installer, the Anaconda Distribution installer, at least the .sh ones.

36:47 And so it creates a self-extracting binary: a single .sh or executable file that will unroll itself and install all those packages for you.

37:00 That's what Constructor is.

37:01 Some of the other ones, Conda Env is on there, but it's deprecated.

37:05 It's been pulled into the Conda repo now so that they can be tested together and shipped together and versioned together and stuff like that.

37:12 There's the Kapsel repo.

37:14 And Kapsel is a bit of an experiment for Continuum right now.

37:18 It's really focused more on managing data science projects, starting services that are dependencies for data science projects, like a Redis database or something like that.

37:30 So it takes all the ideas of Conda's package management, environment management, adds services, adds running processes to it, and then adds data to it as well.

37:41 So data shape, schema, stuff like that.

37:44 Wow.

37:44 That sounds pretty interesting.

37:45 So if I want to do something where I'm going to need a Jupyter Notebook server running, I'm going to need Redis, like you said, and a few other things, that could sort of do that all in one shot?

37:55 And then in a portable way so that you can move projects around from a local dev environment to something that is sort of running in production and against your production database.

38:06 Even Conda, one of the great things is it's completely OS agnostic, right?

38:11 So being able to port your projects across OSs and share your projects that way too.

38:17 Cool.

38:17 How does that help for reproducibility?

38:19 Like if I'm a scientist and I've written a paper, can I make a Kapsel thing and go, and this is the thing that will let you go and do it?

38:25 I can see that happening.

38:26 Yeah.

38:26 Yeah.

38:27 Very, very, very cool.

38:28 The starting point, if you want to examine my data yourself, here's all the tools you need to do it.

38:35 Run a command and you're in.

38:37 Okay.

38:38 Excellent.

38:39 And Phil, how about the Conda Forge Conda Smithy thing on GitHub that you guys have?

38:44 What's that?

38:45 Right.

38:45 So as I say, we've got like 1,500 recipes now that are sitting each in their own Git repository.

38:50 And the Git repository that holds the recipe is called the Feedstock.

38:54 And as you can imagine, there is duplication of things like the readme and the license and the continuous integration scripts.

39:02 And we didn't want to have that kind of duplication where we had to manage that manually.

39:08 So Conda Smithy really is the tool that is the cookie cutter, the templating engine that takes a recipe and turns it into this Git repository.

39:18 I see.

39:19 So if I owned a package and I wanted to get started, maybe I'd check out Conda Smithy to build what I'm going to start with?

39:25 Yeah.

39:25 Yeah.

39:25 You can render your recipe into a feedstock and see how that would look.

39:30 And actually, that's precisely what we do.

39:33 We turn recipes proposed to Conda Forge into feedstocks.

39:39 And the tooling in Conda Smithy also includes the ability to make the new GitHub repository.

39:44 It registers all the webhooks.

39:46 So you can kind of automatically register Travis, Circle, and AppVeyor continuous integration.

39:52 It just kind of packages everything neatly into one place so that we don't have to repeat ourselves 1,500 times.
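
The cookie-cutter idea behind Conda Smithy can be illustrated with a small Python sketch: shared templates for the readme, license, and CI script are filled in once per recipe, so nothing is maintained by hand across repositories. This is an assumption-laden illustration, not Conda Smithy's real implementation or API: `FEEDSTOCK_TEMPLATES`, `render_feedstock`, the file names, and the template text are all hypothetical.

```python
from string import Template

# Hypothetical shared templates; a real feedstock has more files than this.
FEEDSTOCK_TEMPLATES = {
    "README.md": Template("About $name\n\nFeedstock for the $name package.\n"),
    "LICENSE": Template("License text for the $name recipe goes here.\n"),
    ".travis.yml": Template("# CI script for $name (placeholder)\n"),
}


def render_feedstock(name, recipe_text):
    """Return a mapping of file path to contents for one feedstock."""
    files = {
        path: template.substitute(name=name)
        for path, template in FEEDSTOCK_TEMPLATES.items()
    }
    # The recipe itself is carried over unchanged.
    files["recipe/meta.yaml"] = recipe_text
    return files


feedstock = render_feedstock("mypkg", "package:\n  name: mypkg\n")
for path in sorted(feedstock):
    print(path)
```

Because every feedstock is generated from the same templates, a change to a template only has to be made once and can then be re-rendered everywhere.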

40:00 Yeah, absolutely.

40:01 And make it easier for people to contribute.

40:03 All right, guys.

40:04 Looks like we're running out of time.

40:05 I think we might have to leave it there for Conda.

40:08 But let me ask you each two questions.

40:11 I always ask at the end of the show.

40:12 Just there's three of you.

40:14 You go kind of quick.

40:15 First of all, there's over 90, I think 95,000 packages on PyPI and many of them on Conda Forge and within Anaconda.

40:24 Can you each give me maybe one of your favorite packages that's not necessarily popular but you think is really cool that people should learn about?

40:31 Phil, go first.

40:32 I'm going to pick a package which is extremely popular but it's so exciting.

40:36 I'm thrilled that it exists.

40:38 My favorite package is Dask and Dask Distributed.

40:41 I think it's just an amazing solution to a really big problem within the community.

40:47 Yeah, so this is like for parallelizing data pipeline type processing?

40:50 Right, exactly.

40:51 Yeah, and parallel array computation.

40:55 Just really exciting.

40:57 All right, Dask.

40:58 Dask, very cool.

40:59 Kale?

40:59 Yeah, the two packages that popped into my mind are Mahmoud Hashemi's Boltons.

41:04 I love that.

41:05 I use it all the time.

41:06 And then Matt Rocklin's toolz and cytoolz.

41:10 Both fairly low level, but both awesome libraries for writing general Python, and I'm using both all the time.

41:17 Oh, excellent.

41:18 Yeah, I've checked out Boltons.

41:19 Not the other one.

41:20 I'll have to have a look at that.

41:21 They're both, you know, both sound great.

41:22 And Michael?

41:23 pytest.

41:25 Yes, pytest.

41:27 To make sure all these automated builds actually have some sort of meaning, right?

41:30 Yeah.

41:30 And also just all the plugins that go with pytest.

41:33 Let it do incredible things in very expressive ways.

41:37 I'm very, very thankful for pytest.

41:39 Excellent.

41:39 All right, Michael, when you write some Python code, what editor do you open up?

41:42 Spacemacs.

41:43 Spacemacs.

41:44 All right on.

41:44 Kale?

41:46 For Python code in particular, it's PyCharm.

41:48 PyCharm.

41:48 Okay, cool.

41:49 And Phil?

41:50 I'm mostly a Vim user, even on Windows.

41:52 Okay.

41:52 Yeah, yeah.

41:53 Very nice.

41:53 And I guess the legitimate real Vim and the Unix tools are supposed to be there soon.

42:00 Are they already there?

42:00 I'm not sure.

42:01 But the Unix...

42:02 In Conda?

42:03 No, no.

42:04 In Windows 10, it's supposed to have the Ubuntu subsystem in it natively pretty soon.

42:10 Maybe that'll even help with some builds.

42:11 Who knows?

42:11 But, yeah, yeah.

42:13 Very cool.

42:13 Do we have Vim for Conda?

42:15 Has it been written?

42:16 Does anybody know?

42:16 I'm just curious.

42:17 I don't know.

42:18 I wrote Nano a long time ago.

42:21 Is it on Conda Forge?

42:22 Phil, do you have Vim compiled yet on Conda Forge?

42:25 Not that I know of.

42:26 But then, you know, I'm completely out of the loop.

42:28 Sounds like somebody should take that one on.

42:30 Awesome.

42:30 All right.

42:32 Any final call to actions?

42:33 Like, how can people contribute to this overall project that you guys are all working on?

42:36 Vim for Conda Forge, apparently.

42:38 That's step one.

42:39 Oh, I have one.

42:40 I released Conda 4.3.0 today.

42:44 It's a huge feature release.

42:46 It's almost 900 commits past the 4.2 feature release or feature branch.

42:53 It's released on.

42:55 We have a Conda Canary program.

42:57 So just like Chrome has a Canary program, I kind of ripped off the title.

43:00 Plus, there's some alliteration.

43:01 So Conda Canary.

43:02 And you just run conda config --add channels conda-canary.

43:07 You add the Conda Canary channel.

43:09 Check it out.

43:10 Use it.

43:11 Let us know what you think of it.

43:13 The changelog is extensive.

43:14 And we'll be releasing it, general availability, probably on the last day of 2016.

43:20 Oh, perfect.

43:21 A nice New Year's gift.

43:22 That's awesome.

43:22 So yeah, people check that out.

43:23 Anything else?

43:24 I'd say if anybody is missing a package on Conda that is on PyPI and it would make your life easier,

43:31 then please submit a recipe to Staged Recipes.

43:35 And the main thing I would emphasize about Conda Forge is the really incredible thing about it is the community.

43:41 There is such an amazing pool of expertise there on how to build software.

43:45 And I've learned a lot.

43:47 And anybody stands to learn a lot just by getting involved there.

43:50 Okay, excellent.

43:51 Yeah, absolutely.

43:52 Easy to contribute and please do so.

43:54 Any of you guys going to be at Anaconda Con in Austin in February?

43:58 Oh, yes.

43:58 Kale?

43:59 We'll be around.

44:00 Yep, absolutely.

44:00 Nice.

44:01 Phil, are you traveling like halfway around the world?

44:03 Unfortunately, I won't be there for Anaconda Con this year.

44:06 All right.

44:06 So if people make it, go say hi to Michael and Kale and we can tweet.

44:10 Tweet at Phil.

44:11 All right, guys.

44:12 Thanks so much for being on the show.

44:13 It's been great to talk about Conda.

44:14 I think it's a really great project overall that you guys are working on.

44:18 So thanks for that.

44:18 Thank you, Michael.

44:19 Thank you.

44:20 This has been another episode of Talk Python to Me.

44:24 Today's guests have been Phil Ellison, Kale Franz, and Michael Sarahan.

44:29 And this episode has been sponsored by Continuum Analytics and MongoDB University.

44:34 Thank you both for supporting the show.

44:36 Whether you want to hear the keynote by Ryan Curran from Forrester Research,

44:40 meet the guys behind Anaconda, or just mingle with high-end data scientists,

44:44 you need to find your way to Austin, Texas for Anaconda Con this February.

44:47 Start at talkpython.fm/acon.

44:51 Get the skills you need to build your Python apps on top of the most successful

44:55 and in-demand document database at MongoDB University.

44:58 Take a free class by visiting talkpython.fm/mongo.

45:02 Are you or a colleague trying to learn Python?

45:05 Have you tried books and videos that just left you bored by covering topics point by point?

45:10 Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course

45:16 to experience a more engaging way to learn Python.

45:19 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.

45:26 Be sure to subscribe to the show.

45:29 Open your favorite podcatcher and search for Python.

45:31 We should be right at the top.

45:32 You can also find the iTunes feed at /itunes, Google Play feed at /play,

45:38 and direct RSS feed at /rss on talkpython.fm.

45:42 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

45:47 Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music.

45:54 You can browse his tracks he has for sale on iTunes and listen to the full-length version of the theme song.

45:59 This is your host, Michael Kennedy.

46:01 Thanks so much for listening.

46:02 I really appreciate it.

46:03 Smix, let's get out of here.

46:10 I'll see you next time.

46:27 Bye.
