
#94: Guaranteed packages via Conda and Conda-Forge Transcript

Recorded on Tuesday, Dec 6, 2016.

00:00 Have you ever had trouble installing a package you wanted to use in your Python app? Likely it contained some odd dependency that required a compilation step, maybe even using an uncommon compiler like Fortran. Did you try this on Windows? How many times have you seen "cannot find vcvarsall.bat" before you had to take a walk? If this sounds familiar, you might want to check out the conda package manager, the Anaconda distribution, conda-forge, and conda-build. Together, these dramatically lower the bar for installing packages across all the platforms. This week, you'll meet Phil Elson, Kale Franz, and Michael Sarahan, who all work on various parts of this ecosystem. This is Talk Python to Me, Episode 94, recorded December 14, 2016.

00:48 in many senses of the word because I make these applications balls and use these words to make this music instructed to think when I'm coding another software design, in both cases, it's about design patterns, anyone can get the job done. It's the execution that matters. I have many interests.

01:07 Welcome to Talk Python to Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by MongoDB and Continuum Analytics. Thank them both for supporting the show by checking out what they have to offer during their segments. Kale, Michael, Phil, welcome to Talk Python, guys. It's great to have you here. Thanks, Michael.

01:39 Thank you.

01:41 I'm looking forward to talking about conda. It's gonna be really fun. There's a whole ecosystem growing up around it. And I think it's a little bit underappreciated, just from the people I talk to. And so I really want to spread the word about how cool conda is and what you can do with it. So we'll get to all that. But let's start with you guys. Story: how did you get into programming? Since there's three of you, let's kind of keep it a little bit short. But Kale, maybe, how did you get into programming and Python? 30 seconds? Yeah,

02:07 I, in grad school, I was writing a lot of MATLAB code. And MATLAB is proprietary. And this thing called NumPy came along, and I started using NumPy. And it kind of grew from there.

02:18 Oh, that's really cool. Now you're at the very center, the nucleus, of NumPy and all that kind of stuff. That's great. Michael, how about you?

02:27 Yeah, I started out with a web program to show my photos to my parents and family. And then grew into science stuff when I went to grad school, much like Kale. So

02:38 yeah, nice. What did you study in grad school?

02:40 Electron microscopy. Electron microscopy? Okay. Wow,

02:43 very cool. Lots of very small things. Yeah, it was that like image recognition and things like this.

02:50 Yeah. And also spectral processing. So all kinds of fun signal processing and unsupervised learning. Oh, okay. Very cool. Phil,

02:58 how about you? I started pretty young, and started with Visual Basic when I was about 14, and then at university I did quite a bit of Maple, which is kind of a mathematical language, pretty similar to SymPy, and then picked up Python when I was in my gap year in France. I've just loved the open source community, not least the scientific Python space. You know, that's excellent. So you were kind of bumming around Europe, and most people just hang out and chill. And you picked up Python. Very cool.

03:30 Yeah. Awesome. Okay, so let's get into conda first, and then we can get into conda-build, conda-forge, and the various things around it. So can you tell us what is conda?

03:42 conda is a package manager: install, run, and update your software. It does a lot more than just manage packages. It manages environments. So conda is an environment manager, and the concept of environments is much like Python virtualenvs, except it's system level, and it's Python agnostic. The environment part is language agnostic. So you can have environments for, I mean, if you want to build Ruby with it, or whatever types of environments you want, it can support it. So yeah, it's a package manager.

04:16 Very nice. It's kind of like pip in some of its role, but it does more, right. In some ways,

04:22 it does more, in some ways it does less. So conda, in the conda ecosystem, is not the canonical source of truth for Python packages; that's PyPI. And we don't install packages directly from PyPI. That's pip's job. We don't pretend to, but we are, in some ways, much more like a system package manager, so more like apt-get or yum or something like that. A lot of times with a system package manager, you will install Python, or you'll install some of your core dependencies: the harder things that pip will invoke a compiler for, and things will crash, or things like that. Wheels help now, but pip is still getting stuff done. So things that are a little bit lower level are where conda really excels. And then, so yeah, that's the trade-off in capabilities, I

05:09 guess. Sure. Let me see if I've got the origins correct: I'll tell you what I think, and you tell me whether I'm right or wrong. So I feel like conda came from sort of two places. It feels like it came from you guys wanting to make it easier for data scientists, or more correctly, people working with the data science and scientific tools like NumPy and so on, to actually start working with them, because they often had really freaky builds. So, like, they might depend on some sort of Fortran thing. And so in order to pip install it, you have to have, like, a Fortran compiler. Yeah. Which, I mean, we're already battling with "where's vcvarsall.bat" again on Windows, and now you're talking about Fortran compilers? Come on. So is that sort of where it came from? And it had to manage all these different things like Fortran, for example?

05:54 Yeah, that was the genesis, and conda came about at about the same time as wheels came about. So there's a little bit of two different approaches. But they've sort of diverged into, really, pip and wheels and things being Python specific and the source of truth for Python when there's not a more downstream distributor, whereas with conda, we do take care of all the hard parts. So NumPy and SciPy and all those dependencies, and then Python and all the runtimes, and stuff like

06:26 that. Yeah, one of the big differences is that you guys pre build the binary

06:34 pieces that we might need, right? Yeah, that's correct. So a lot of times people ship to production and absolutely do not want compilers on their production systems. And so we do ship binaries.

06:44 Yeah, I find that to be really excellent. Because, you know, if I go and get SQLAlchemy, or I think PyMongo, even some of these things that have C speedups at various places, often for, like, deserialization and serialization steps, and if I don't have the right setup, you know, maybe it's not going to work. Especially if I'm on Windows, for some reason that gets a little harder to do and gets neglected often, which makes it, like, doubly hard, right? And those are things you don't have to worry about if I conda install it, right? Because it's already been compiled somewhere else by you guys.

07:14 Not only do you not have to worry about that, but you can conda install Postgres or all of your other general dependencies. It's not just Python. We have a huge R ecosystem. There's plenty of other language interpreter ecosystems that people are building up around it. Okay. And I guess another point about conda versus a system package manager, like yum or apt-get, is that for your application code, you rarely want to be running the system Python. You always want to have an application-specific Python, and OS package managers don't make that as easy. You basically have to make your own RPM or something like that, but conda allows these isolated environments.

07:53 Yeah, it absolutely does. That's very cool. Very cool. Okay, so how does conda relate to Anaconda?

08:00 Sure, Anaconda is a distribution. There's an Anaconda distribution, that is a set of packages that we ship about every quarter. That is the easy way for scientists and engineers, data scientists, PhD students all over the world to install most of the core Python packages that they'll need to do that, and the package manager part of it is conda. And so you get conda. And then when you need to install other packages from any of our repositories, or anaconda.org, or anything like that, conda is the command to extend what you installed with the Anaconda distribution.

08:37 Okay, so Anaconda is like all that stuff, prepackaged, precompiled, like the major tools you want, and conda is the package manager for it. Nice. Phil, how about conda-forge? How does conda-forge relate to all this?

08:50 So if you see Anaconda as a nice bundled set of packages to install Python and all its dependencies, then conda-forge is essentially doing the same thing. Except Anaconda is managed by Continuum, and conda-forge is a community effort to collaboratively package this stuff.

09:10 There's so many packages out there, right? I mean, just like you said, conda is more than just Python. But just in Python, there's over 90,000 packages on PyPI, which is crazy, right? So you probably can't manage all of that yourself. Right? No. So conda-forge has to help crowdsource that, open source that a little bit. Yeah, exactly.

09:29 And it's not reasonable for us to expect Continuum to package all of this stuff. It just simply doesn't scale in that way. We needed a community effort to package some of the things that Continuum aren't necessarily ever going to package. At the end of the day, they're a business, and they supply their customers, and that doesn't necessarily cover everybody who has very specific packages that need to be shipped.

09:55 Yeah, absolutely. There's probably, very quickly, a long niche tail that sort of develops, right? There's probably 500 or a thousand packages that are super important. And then it becomes really niche quickly after that, right? Absolutely.

10:08 That's where conda-forge started, really. I'm the author of several niche packages, which are extremely powerful for the people that are starting. But by no means would you expect that to be packaged by a company who are there, ultimately, to sell a product to their customers. And that product may be freely available. But then, as a community, we don't have control of what's packaged,

10:34 and how it's packaged. Sure. Yeah, that makes a lot of sense. So I guess I didn't really ask what conda-forge is. Maybe give people a quick definition, so they know what we're talking about.

10:44 Okay, so conda-forge, in the same way. Anaconda is a bundle of precompiled conda packages. And conda-forge is exactly the same thing, in that it is a channel that you can enable using the conda package manager to install all of your favorite tools. And if your tools aren't available on that channel, then it's a community effort. And you're welcome to contribute your package to be bundled into the conda-forge community and make it widely available to conda users.
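As a rough illustration of "a channel that you can enable", the sketch below only builds the conda command lines involved and prints them; it does not run conda. The helper names add_channel_command and install_command are hypothetical, and numpy is just an example package.

```python
import shlex


def add_channel_command(channel):
    """Arguments that add a channel to the user's conda configuration."""
    return ["conda", "config", "--add", "channels", channel]


def install_command(package, channel=None):
    """Arguments that install a package, optionally from one specific channel."""
    args = ["conda", "install"]
    if channel is not None:
        args += ["--channel", channel]
    args.append(package)
    return args


if __name__ == "__main__":
    # Print the commands instead of executing them, so this runs without conda.
    for cmd in (
        add_channel_command("conda-forge"),
        install_command("numpy", channel="conda-forge"),
    ):
        print(shlex.join(cmd))
```

Enabling the channel once in the configuration and naming it per install are two routes to the same packages; which one suits a project is a judgment call.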

11:18 Right. Okay. Yeah, well, we'll definitely dig into that, that more and how you can do that in a little bit. But that's great. And so in order to deliver things in a binary form, basically, compiled for Windows compiled for Linux compiled for macOS, and things like that, you guys probably have some pretty interesting build infrastructure. Michael, you want to talk about that? Oh, sure.

11:42 Yes. So we have a lot of different build machines. And so one of the main advances that conda-forge has made is to take advantage of many of the free continuous integration services that are on the web. And so Phil figured out really, really clever ways to get around the different limits that those CIs present. So he thought, well, you know, it's not going to work to build an entire repository of recipes. So can I instead break up the one repository into one recipe per repository? And that was a really clever workaround that made it possible to use those CIs as infrastructure. So that's part of it. At Continuum, we just have kind of a standard build system where we try to maintain compatibility with older frameworks, because we support customers with old CentOS 5 and RHEL 5 systems. But other than that, it's the

12:40 same kind of idea. Okay, and you build it for... I can see how you would set up CI on Windows pretty easily. I can see it set up on Linux really easily. What about Mac?

12:50 So Mac is

12:53 got a line that laugh,

12:54 right? Yeah, well, it's a little bit more awkward. And the main reason why it's awkward is just Apple licensing. They don't let you run virtual machines on anything but Apple hardware. And so the number of people who are offering particularly free services with Apple builds is much more limited. That said, you can get it on both CircleCI and Travis CI. And so we use Travis CI for conda-forge's Mac builds.

13:18 Okay. Yeah, I mean, there are a few places, like there's a place called Mac Mini colo, where you can get a bunch of Mac minis and stick them in some data center, and they'll run them for you. But yeah, it definitely makes it harder. But you guys found some kind of cloud service that you can use; you don't have, like, a closet full of Macs.

13:34 At Continuum, we do have a closet full of Macs. Okay,

13:37 great. But I you know, I really think it's this, this binary distribution thing is really excellent. Because certainly when you're getting started, it's frustrating to deal with, you know, the single compile the single install. So just knowing that it's always going to work is great, but also the speed, right? You can install stuff much quicker, because you don't have to wait on a build.

13:58 Yes, certainly. So it's, you know, if you're a Linux nerd, it's kind of Gentoo versus Ubuntu, right? Yeah.

14:05 Yeah, very nice. So you have a lot of different places where code is running. We talked about the three major OSes. Do you have compatibility for others? You talked about older systems, for example. How do you keep that all straight?

14:19 Well, that's a really hard thing, actually. So the way that things are compatible varies by platform. For Mac, we kind of have to target an older platform, and then it's forward compatible. Windows is a weird story, because on Windows, the Python version is pretty strongly tied to a particular version of Visual C++. And so that's limiting in terms of what kind of code people can actually compile. So what version, do you know? Sure, yeah. So Python 2.7 requires Visual Studio 2008. And it's just kind of a custom, because that's what the upstream python.org build is. And so in order to maintain binary compatibility, it's a good idea for everybody to keep that matched up. But as a result of that, people can't really build C++11 software for Python 2.7 without mixing these runtimes. And when you mix runtimes, it might work, it might not; you're kind of asking for instability. No kidding. That's too bad, because C++ has actually had a bit of a renaissance. Right? Absolutely. And 2008 misses that renaissance. That's too old for it. Okay, interesting.

15:29 Interesting. Do any of you guys want to jump in on this? So, talking about some of the challenges for community packaging: if I'm getting binary code delivered to me, how do I trust it? Is that really different than trusting the source coming out of PyPI?

15:44 Sure, I'll chime in a little bit. I would say it's not any different from getting it from PyPI, except that if you get it from PyPI, you can at least inspect the source. Whereas if you've got a binary blob, you have to have a different level of trust. I think there's one extra level to that, because with PyPI, it is the original package maintainer that's uploading the package to PyPI, right? We're taking from PyPI, and we're downstream from that. So there is an extra level of people in between. On top of that, anybody can upload any package they want to anaconda.org. It's up to the user to decide whether they want to use it or not. And so there are some challenges there.

16:25 There's also a question of reproducibility in the sense that, if you're getting the source, you can compile it anywhere, right? But if you're getting a binary, and you want to go and change your machine, then you kind of, you're missing out on the ability to rerun that compile step,

16:40 where you choose. Yeah, so if you get a new machine, you're always gonna have to go back to install via conda. You can't just copy the files over and regenerated or something.

16:50 If you want different architectures, for sure. You need to go and refresh the appropriate artifact for your hardware.

16:56 Yeah. Okay. That said, do you guys do? You know, I had Travis Oliphant on one of the earlier shows. And I feel like he said that you guys did some verification that, at least is stuff that ships with Anaconda, like, mixes well together, that one package that depends on another, you know, those are compatible versions.

17:18 So right, right, we do a lot of that. And so we have our own kind of automated scripts that test our whole system together. And then conda-forge sort of does that in a more distributed way: when they build a package, they run the test suite. We also do that, and we run a few other consistency checks. But those are the kinds of things that let us make sure that everything is playing well together. The Anaconda distribution, we release it about quarterly, and it's extensively tested manually. We have a whole QA team that goes through that and makes sure that each of the packages, however many make up the full distribution, work together. So yeah, there's a lot of extra testing that goes into the distribution. Yeah, I

18:02 think that's really great. I think that's one of the big values: not just that you don't have to be able to compile stuff to install it, but that, taken as a whole, you can somewhat trust it, rather than here's 100 little pieces. Individually, they're probably fine. How are they when put together, right?

18:17 Yeah, the downside of that is when you install the Anaconda distribution, or conda install anaconda, you're explicitly pinning all of your packages to a specific version. And then when you want to update everything, people get confused: I was looking for the most recent version of Jupyter, and I said update all. Why didn't I get the most recent version of Jupyter? So there's

18:38 a little bit of a downside. That is a bit of a downside, I guess. I did notice that. One of the things I was playing around with was trying to use conda for my web apps. I noticed some of the web frameworks were farther out of date with what I got out of there than I guess I'd like. Can you talk about that? Whoever has the best info, I guess, on how things evolve? You know, suppose I've got some package put into conda-forge. How does the versioning on PyPI map into conda-forge? Yeah, I'll,

19:11 I'll try to field that. It's a manual process to update the recipe that creates the conda package. And so someone has to update the recipe on staged-recipes or on the feedstock on conda-forge, and then it's built out automatically by the CIs, which is great. The CIs save you a lot of effort. At Continuum, we don't have our CIs quite working yet. And so it's really a much more manual process to edit the recipe and then build out each of the different packages. With that level of effort, it's just a matter of first noticing that there is a new version available and then doing the work to package it up. Sure

19:51 in the defense of Anaconda versus conda-forge here, there is clearly a huge overlap between what Anaconda does and what conda-forge does. As someone who's not part of the Anaconda team, I just want to say how much of a great job the Anaconda team actually do in providing a really coherent, well put together set of packages that you can trust, that you can rely on, that's stable. And really, I guess that's the big selling point of Anaconda, and conda-forge really isn't trying to be in that space at all. What it wants to be is kind of the leaner, faster moving community effort. But there is no chance that conda-forge can be as well curated. It can't provide the identity that you get in Anaconda as a distribution. Anaconda is just an amazing resource to have in the community.

20:41 Yeah, and it ships quarterly. So it's probably, you know, if you're willing to wait an extra month, you'll be on the new version of a lot of things anyway. If I'm an author of a popular open source package that's either in Anaconda or conda-forge, what can I do to make sure that my latest possible version is there, given that, you know, I have to accept the release cycles and stuff? But what do I do? Is there a place I talk to you guys, or what's

21:06 the story? We both have places to submit pull requests to recipes. And so for the internal, for Anaconda, it's a repository called anaconda-recipes. And then for conda-forge, it's going to be the feedstock for whatever you want to improve. If you want to add it, that's staged-recipes. Absolutely. Okay, so

21:29 let's, let's maybe talk about the the recipes and feedstocks and stuff a little bit,

21:33 I have one more comment: I might have given the wrong impression about the distribution shipping quarterly. The distribution is like one installer, one big meta package that has all kinds of stuff in it. We're constantly updating, I mean, our default repositories, we're updating all the time, and you don't have to wait for the quarter to come around to get more recent packages. Okay. The Anaconda distribution and the big Anaconda meta package are updated once a quarter; those are the packages that get all the extra QA, and making sure that they all fit together

22:09 and stuff. I mean, so like the dmg I got from my Mac, that thing is updated quarterly. But you're right when I run a little green circle thing, the sort of overall environmental manager, that thing I can go and update stuff as they come. I remember doing that quite quite more frequently. That's

22:27 cool.

22:30 Let me take just a moment and tell you about a sponsor of this episode. AnacondaCON 2017 is the first conference for Open Data Science leaders around the world and the definitive gathering place for the Anaconda crew. Whether you're a new or long-standing member of the community, focused on business or technology, AnacondaCON will help you conquer your biggest data science challenges. Data science is a team sport. That's why they're offering you two tickets for the price of one to AnacondaCON 17, starting now until January 16. Register today at talkpython.fm/acon to take advantage of the spectacular savings. You'll get two tickets for the price of one, and you'll have the lovely tech-oriented city of Austin, Texas to share with your friend and all the top data scientists. Yeah, so let's talk about these recipes just a little bit. So let's focus on conda-forge for one minute. If you go there, and go to github.com/conda-forge, I think, you've got a couple of repos. One of them is feedstocks, and those are the accepted sources that you guys pull from to build packages, right? We have a repo

23:39 which pulls in, using git submodules, every single individual feedstock repo that lives in conda-forge. So the feedstock is the place where the recipe is stored canonically. And that has the continuous integration enabled. So whenever we make changes to that recipe, it goes away and builds it each time, and makes that artifact available on the conda-forge channel. Okay,

24:04 maybe tell us what is a recipe? You've got a couple pieces that make that up, right? Sure.

24:09 Mike, do you want to talk about recipes?

24:12 Yeah, a recipe is a YAML file, for one thing. And it's just kind of a standard way of expressing how you tell conda-build, or any other program that interprets that YAML file, where you go to get the source. What are your build options? What are the instructions that you carry out to actually build the source? And then one thing that we're adding in is, how do you package the source after you've built it? Because some of the time you want to package different parts of what you've built into different packages. I see, that makes sense. So basically, we go to GitHub, and there's a link to a submodule that gets pulled in for each one of these. If it's somewhere else other than Git, like Subversion, is it still

24:58 possible to stick it in, or does it have to be, like, copied over to a GitHub repo or something?

25:04 conda-forge is based around the GitHub stuff, but there's no intrinsic limit in conda-build as to where a recipe has to come

25:11 from. Right. Okay. Well, let's talk quickly about conda-build. Michael, you're also in charge of that, right? Or working closely with that?

25:18 Yes.

25:19 So what's the story there, like, if I want to take a package and build it with conda build, I've got to put together the recipe for it to work. But obviously, I have the sources, and then I get it set up. And I give it to you guys. And you ship it off to ci on various platforms to build out the various versions, or what happens?

25:38 Yeah, conda-build is really just a lot of orchestration of environment setup. And so building up a build environment is pretty hard. There's a lot of different pieces that go into that. And so in previous jobs, as a software developer, setting up your workstation takes a day or two. So what conda-build is really for is abstracting that away and taking advantage of conda environments to make that setup be very seamless. And so you list your requirements at build time and at run time in that YAML file. And then conda-build goes and installs that stuff. And it is also doing a lot of kind of housekeeping, as far as downloading the source and making sure that it's clean, and then putting it in the right place, and putting your prompt in the right place, and activating vcvarsall, for example. It's just taking care of all of those details for you so that you don't have to learn to do it yourself, or don't have to put up with doing it yourself.
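To make the build-time versus run-time requirement split concrete, here is a minimal sketch that models a recipe as a plain Python dict instead of parsing real YAML. The top-level keys mirror sections commonly found in a conda recipe file; the package examplepkg, its URL, its version, and the helper functions are all hypothetical.

```python
# Hypothetical recipe contents, written as a dict so no YAML parser is needed.
RECIPE = {
    "package": {"name": "examplepkg", "version": "1.0.0"},
    "source": {"url": "https://example.invalid/examplepkg-1.0.0.tar.gz"},
    "build": {"script": "python setup.py install"},
    "requirements": {
        "build": ["python", "setuptools"],  # needed while building the package
        "run": ["python", "numpy"],         # needed when the package is used
    },
}


def requirements_for(recipe, phase):
    """Return the requirement names listed for one phase ('build' or 'run')."""
    return list(recipe.get("requirements", {}).get(phase, []))


def build_only(recipe):
    """Requirements needed to build the package but not to run it."""
    run = set(requirements_for(recipe, "run"))
    return [name for name in requirements_for(recipe, "build") if name not in run]


if __name__ == "__main__":
    print("build:", requirements_for(RECIPE, "build"))
    print("run:", requirements_for(RECIPE, "run"))
    print("build only:", build_only(RECIPE))
```

Keeping the two lists separate is what lets a build tool install compilers and packaging tools for the build without forcing them onto everyone who later installs the package.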

26:40 That's nice. So if I can get my thing to build on one platform, how hard is it to get it to build on all the platforms,

26:47 It depends heavily on which package. I mean, some packages support all platforms pretty well, pretty natively. But, for example, some packages that really depend very heavily on Unix tools take a little bit more hand-holding on Windows. There's a lot of other build tools that make life easier. So for example, CMake is something where you always get really happy when you see that somebody's using it, because it makes it much easier to do cross-platform builds. Whereas some other projects will say, oh, I've got a Makefile for Unix stuff, so Linux and Mac, and then on Windows, I've got this Visual Studio 2010 solution file. So if you're on Visual Studio 2008, you probably can't use it; if you're on 2015, you have to update it. And maybe it works, maybe it doesn't. So that's the kind of thing where some projects are a lot easier than others. I would add, though, that for Python in particular, the recipe is usually trivial. There's usually about three files. There's a meta.yaml, that is kind of like setup.py, except we use setup.py. And then your build script is mostly just python setup.py install. And most of the time you don't even include that. If you look at most of the Python-based recipes on conda-forge, there's just a single command to run that setup.py command in the meta.yaml file. I see. And that gets basically installed into that local conda environment, then you just grab what you got and ship it? Exactly. So what conda-build is doing is taking a snapshot of the files after it creates the environment, and then another snapshot after it's done doing whatever you told it to do to install it. And then the files that are new between those make up your package. Okay,
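The before-and-after snapshot idea described above can be sketched in a few lines. This is only an illustration of the technique under simple assumptions, not conda-build's actual implementation: the temporary directory stands in for an environment prefix, and the files written into it are placeholders for an install step.

```python
import tempfile
from pathlib import Path


def snapshot(prefix):
    """Return the set of file paths under prefix, relative to it."""
    prefix = Path(prefix)
    return {p.relative_to(prefix).as_posix() for p in prefix.rglob("*") if p.is_file()}


def new_files(before, after):
    """Files present in the second snapshot but not in the first."""
    return sorted(after - before)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp)

        # Pretend the environment already contains an interpreter.
        (prefix / "bin").mkdir()
        (prefix / "bin" / "python").write_text("placeholder")
        before = snapshot(prefix)

        # Stand-in for the install step: it drops files into the environment.
        pkg_dir = prefix / "lib" / "site-packages" / "examplepkg"
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").write_text("")
        after = snapshot(prefix)

        return new_files(before, after)


if __name__ == "__main__":
    print(demo())
```

Because the package contents are defined as "whatever appeared", the install step can be any command at all, which fits the point that a Python recipe often needs nothing more than the setup.py install command.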

28:34 this is excellent. So it sounds like the conda environments play a super important role in the fact that conda can install the tool chain as well, is really important to making this all work.

28:46 Yes, it's not critical. So the tool chain being a conda package is actually something that we're working on. You'll soon be able to install the compiler and the runtime libraries and all that stuff as dependencies. You can currently; just not very many people do it, because it's not supported all that well. That's what we're moving towards as our main way of doing things. And what that'll mean is, it's just going to be incredibly easy: you're just going to be able to volunteer your machine as a build worker. And as long as you have conda-build installed on it, it doesn't matter what else you have, it'll take care of it.

29:20 Interesting. So you mean, like, I could go and just say, hey, I'm willing to donate some of my cycles and disk and bandwidth to be a build machine, kind of like SETI@home or protein folding, those things.

29:31 Yes, that is the dream. And there's quite a few technical hurdles to get there. But it'll be really neat if we figure it out. That sounds awesome. Yeah, very cool. Okay.

29:41 So what are some of the challenges around, you know, linking all of these other GitHub, or not necessarily GitHub, repos into, like, one sort of super master feedstock repo? You know, because you've got all the recipes, but they all, either Anaconda or conda-forge, kind of link back to all these other places.

30:03 So do you want to answer that, or should I? You're welcome to.

30:07 Okay,

30:08 Good. I think the hard part is how do you do the maintenance work, because it works fantastically well to pull those submodules into one folder just as a view, or as a consolidated place to work from. But then if you change any of the source code, git submodules are just unwieldy at best for pushing those changes back to the parent repositories. So what Phil has created is fantastically useful for doing the editing on each repository. But if there were, say, a lot of edits you wanted to apply across a lot of different repositories, that's the kind of change that would be kind of tedious and hard to do.

30:51 Yeah, absolutely. So what's that tooling look

30:53 like? We don't have any idea yet. We're still working on it.

30:57 Just waiting for Phil to work his magic.

30:59 Yeah, yeah, please, Phil, figure that out.

31:01 So on the conda-forge front, we actually have a Heroku service that's running periodically to re-render all of our feedstocks whenever we make kind of fundamental template changes to how a feedstock should look. And that service actually goes away, re-renders using the tools that we've built, and then, if there are any changes, pushes a new pull request to be merged by the recipe maintainers on GitHub. So actually, it is a bit tedious to have to make these changes to, like, we're up to 1500 Git repos now. It is tedious if you need to make those changes. But there are some tools that we've developed to simplify that if it's kind of a universal change that's needed. Yeah,
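A toy version of the re-rendering loop described above might look like the following: render each feedstock's CI configuration from the current template and flag the ones whose stored copy no longer matches. The template text, the feedstock records, and the function names are all hypothetical, and nothing here talks to GitHub; a real service would open a pull request where this sketch just returns names.

```python
from string import Template

# Hypothetical CI template; changing this text is the "fundamental template change".
CI_TEMPLATE = Template("language: generic\nscript: build $name for $platform\n")


def render(feedstock):
    """Render the CI configuration a feedstock should have under the current template."""
    return CI_TEMPLATE.substitute(name=feedstock["name"], platform=feedstock["platform"])


def feedstocks_needing_update(feedstocks):
    """Names of feedstocks whose stored CI config differs from a fresh render."""
    return [f["name"] for f in feedstocks if f["ci_config"] != render(f)]


FEEDSTOCKS = [
    {
        "name": "fresh-feedstock",
        "platform": "linux",
        "ci_config": "language: generic\nscript: build fresh-feedstock for linux\n",
    },
    {
        "name": "stale-feedstock",
        "platform": "osx",
        "ci_config": "language: generic\nscript: old build command\n",
    },
]

if __name__ == "__main__":
    # Each name printed here would correspond to one pull request for maintainers to merge.
    print(feedstocks_needing_update(FEEDSTOCKS))
```

Comparing rendered output against the stored file means untouched feedstocks generate no noise, which matters when the same change has to be considered across well over a thousand repositories.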

31:44 to clarify what a re rendering is, that is just like a change to the the CI setup work. And the re rendering is to adapt the CI scripts to whatever the latest standard is. I see. Okay. Yeah. And
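
As a rough illustration of the re-rendering idea described here, the sketch below regenerates the CI files for each feedstock from the current template and reports which feedstocks have changed and would therefore need a pull request. It is not the actual conda-forge service: the `Feedstock` class, the `render_ci_files` function, the file names, and the template contents are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Feedstock:
    """Hypothetical stand-in for one feedstock Git repository."""
    name: str
    ci_files: dict = field(default_factory=dict)  # path -> file content


def render_ci_files(name: str, template_version: int) -> dict:
    """Render CI scripts for a feedstock from a made-up template."""
    header = f"# rendered from template v{template_version}\n"
    return {
        "appveyor.yml": header + f"# Windows build for {name}\n",
        "circle.yml": header + f"# Linux build for {name}\n",
        ".travis.yml": header + f"# Mac build for {name}\n",
    }


def feedstocks_needing_pr(feedstocks, template_version: int) -> list:
    """Return the names of feedstocks whose CI files differ from a fresh render."""
    stale = []
    for feedstock in feedstocks:
        fresh = render_ci_files(feedstock.name, template_version)
        if fresh != feedstock.ci_files:
            # In a real service this is where a pull request would be opened.
            stale.append(feedstock.name)
    return stale


if __name__ == "__main__":
    repos = [
        Feedstock("numpy", render_ci_files("numpy", 2)),  # already current
        Feedstock("scipy", render_ci_files("scipy", 1)),  # rendered from old template
    ]
    print(feedstocks_needing_pr(repos, template_version=2))
```

The point of the design is that maintainers never edit the CI scripts by hand: only the template changes, and the comparison decides where a pull request is warranted.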

31:58 What CI tools do you use?

32:01 That's AppVeyor, CircleCI, and Travis CI.

32:05 Okay. Pretty nice. When do you choose which? Or do you use them all and somehow use them in concert?

32:13 Sure, that's something that Phil figured out. He uses AppVeyor for Windows, CircleCI for Linux, and Travis CI for Mac.

32:21 Phil, you want to explain why you made those choices?

32:24 I mean, the choices were pretty straightforward, really. So obviously, AppVeyor is pretty much the only game in town for Windows continuous integration, or certainly was at the time. Travis CI is ubiquitous, and everyone knows it. At the time when we first developed conda-forge, you could have either very limited Mac builds or amazing Linux builds; you couldn't have both. So we ended up having to go down the route of picking Travis for Mac. And then the alternative for Linux continuous integration, which based itself off of Docker containers, was CircleCI, and it was just a breeze to use. The Docker integration was amazing because it allowed us to set up the toolchain really quickly. And yeah, they've all been fantastic services. They kind of get a bit of a bad reputation for being slow sometimes, but every single one of those continuous integration services has upgraded conda-forge to their premium services for free, just as part of improving the ecosystem and supporting the conda-forge community. So, extremely grateful.
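
The platform split described here can be written down as a simple lookup. This is only a sketch of the mapping as stated in the conversation; the platform keys (`win`, `linux`, `osx`) and the function name are assumptions chosen for illustration, not identifiers taken from the conda-forge tooling.

```python
# Which CI service builds which platform, as described in the conversation.
# The platform keys are illustrative labels, not official identifiers.
CI_SERVICE_BY_PLATFORM = {
    "win": "AppVeyor",
    "linux": "CircleCI",
    "osx": "Travis CI",
}


def ci_service_for(platform: str) -> str:
    """Look up the CI service used for a platform, failing loudly if unknown."""
    try:
        return CI_SERVICE_BY_PLATFORM[platform]
    except KeyError:
        known = ", ".join(sorted(CI_SERVICE_BY_PLATFORM))
        raise ValueError(f"unknown platform {platform!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for name in ("win", "linux", "osx"):
        print(name, "->", ci_service_for(name))
```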

33:31 That's really great of them. Yeah. Nice job. Do you think the conda-forge recipes feedstock, is that what it's called, is the thing on GitHub with the most submodules?

33:47 Oh, good question.

33:49 Yeah, we're up to 1500 recipes. It's pretty high.

33:55 It's got to be in the top five. If it's not number one, I bet it's number one. That's really, really interesting. So can you talk about the growth and some of the challenges, Phil, of conda-forge?

34:07 Sure. So, I mean, we publicly announced conda-forge less than 12 months ago, and the uptake curve on it was crazy. In the first few months, we'd grown to 100 contributors and 300 recipes or something like that. It really is quite amazing that the infrastructure scaled so well. I mean, we designed the architecture to have the ability to make lots of Git repos, but we had really no idea whether GitHub were going to limit the number of Git repos we could have under the one organization. And the continuous integration services, we had no idea whether they'd be able to cope with so many registered Git repositories. There were some real challenges that actually just fell out quite nicely. Yeah.

34:53 It sounds like the tooling works really well, actually.

34:55 Yeah, that's becoming more and more mature. The bottleneck, it turns out, has not been the hardware. Actually, the biggest bottleneck probably is me. Just developing the governance and the kind of delegated authority to make decisions without my intervention takes time, and it's actually the kind of fluffy thing that's much harder than software.

35:20 There's no script for that, is there? No,

35:22 there's no continuous integration service for that.

35:26 Hey, everyone, let me take just a quick moment and tell you about a new sponsor of the show, MongoDB University. MongoDB is one of the fastest growing job skills on the market. Long ago, when I was getting into MongoDB, I took one of their free courses at MongoDB University. It was a great way to get up and running with my first app. MongoDB University offers free seven-week courses on MongoDB designed to teach you everything you need to know about how to build a MongoDB-based app. This course will cover basic installation, JSON schema design, querying, inserting data, indexing, and working with the Python driver. After completing this course, you should have a good understanding of how applications are built on top of MongoDB using Python. Plus, you'll have a great foundation for preparing for the MongoDB developer certification exam. I hope you can join me as a MongoDB University alumnus. Sign up for the free seven-week course at talk python.fm slash Mongo. Kale, let's talk a little bit about some of the projects under github.com slash conda. Sure. So there's a few that are interesting to me. There's conda, and then there's constructor. What's the story of those?

36:30 Yep. So there's conda and conda-build; those are the two big ones that we've been talking about. There's constructor, and constructor is actually a lot of what we use to build the single installer, the Anaconda distribution installer, at least the .sh ones. It creates a self-extracting binary in a single .sh or executable file that will unroll itself and install all those packages for you. That's what constructor is. Some of the other ones: conda install is on there, but it's deprecated. It's been pulled into the conda repo now so that they can be tested together, shipped together, versioned together, and stuff like that. There's a kapsel repo, and kapsel is a bit of an experiment for Continuum right now. It's really focused more on managing data science projects, starting services that are dependencies for data science projects, like a Redis database or something like that. So it takes all the ideas of conda's package management and environment management, adds services as running processes to it, and then adds data to it as well. So data shape, schema, stuff like that.

37:44 Wow, that sounds pretty interesting. So if I want to do something where I'm going to need a Jupyter notebook server running, I'm going to need Redis, like you said, and a few other things like that, I could sort of do that all in one shot?

37:55 And in a portable way, so that you can move projects around from, like, a local dev environment to something that is running in production and against your production database, even. And with conda, one of the great things is it's completely OS agnostic, right? So being able to port your projects across OSes and share your projects that way too.

38:16 Cool. And how does that help for reproducibility? Like, if I'm a scientist and I've written a paper, could I make a kapsel thing and say, this is the thing that will let you go and do it?

38:25 I could see that happening. Yeah. Yeah, very, very cool. So, the starting point: if you want to examine my data yourself, here's all the tools you need to do it. Run the command and you're in. Okay,

38:38 Excellent. And Phil, how about the conda-forge conda-smithy thing on GitHub that you guys have? What's that?

38:45 Right. So as I say, we've got like 1500 recipes now that are each sitting in their own Git repository, and the Git repository that holds the recipe is the feedstock. As you can imagine, there is duplication of things like the README, the license, and the continuous integration scripts, and we didn't want to have that kind of duplication where we had to manage it manually. So conda-smithy really is the tool that is the cookie cutter, the templating engine, that takes a recipe and turns it into this Git repository.

39:19 I see. So if I owned a package and I wanted to get started, maybe I check out conda-smithy to build what I'm going to start with?

39:25 You can render your recipe into a feedstock and see how that would look. And actually, that's precisely what we do to turn recipes proposed to conda-forge into feedstocks. The tooling in conda-smithy also includes the ability to make the new GitHub repository. It registers all the web hooks, so you can automatically register Travis, CircleCI, and AppVeyor continuous integration. It just packages everything neatly into one place so that we don't have to repeat ourselves 1500 times.
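
To make the "cookie cutter" idea concrete, here is a minimal sketch of rendering a recipe into the set of files a feedstock repository would contain, so the README, license, and CI scripts are generated rather than maintained by hand. It uses only Python's standard `string.Template` and is not conda-smithy's actual implementation: the template texts, file layout, and the `render_feedstock` function are assumptions made for illustration.

```python
from string import Template

# Hypothetical boilerplate templates shared by every feedstock.
FEEDSTOCK_TEMPLATES = {
    "README.md": Template("About ${name}\n\nFeedstock for ${name} ${version}.\n"),
    "LICENSE": Template("License text for the ${name} feedstock tooling.\n"),
    "appveyor.yml": Template("# Windows build for ${name}\n"),
    "circle.yml": Template("# Linux build for ${name}\n"),
    ".travis.yml": Template("# Mac build for ${name}\n"),
}


def render_feedstock(name: str, version: str, recipe_text: str) -> dict:
    """Return a mapping of file path -> content for a rendered feedstock."""
    values = {"name": name, "version": version}
    files = {
        path: template.substitute(values)
        for path, template in FEEDSTOCK_TEMPLATES.items()
    }
    # The recipe itself is copied through unchanged; only boilerplate is generated.
    files["recipe/meta.yaml"] = recipe_text
    return files


if __name__ == "__main__":
    rendered = render_feedstock(
        "example-pkg", "1.0", "package:\n  name: example-pkg\n  version: '1.0'\n"
    )
    for path in sorted(rendered):
        print(path)
```

Because every boilerplate file comes from a shared template, a change to one template can be applied to all feedstocks by rendering again rather than by editing each repository.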

39:59 Yeah, absolutely, and make it easier for people to contribute. All right, guys, looks like we're running out of time. I think we might have to leave it there for conda. But let me ask you each two questions I always ask at the end of the show; since there's three of you, I'll let you go kind of quick. First of all, there's over 90, I think 95,000, packages on PyPI, and many of them on conda-forge and within Anaconda. Can you give me maybe one of your favorite packages, not necessarily popular, but one you think is really cool that people should learn about? Phil, go first.

40:29 A package which is extremely popular, but it's so exciting. I'm thrilled that it exists. My favorite package is Dask, and dask.distributed. It's just an amazing solution to a really big problem within the community. Yeah, so

40:47 this is like for parallelizing data pipeline type processing. Right?

40:51 Exactly. Yeah. And parallel array computation. Just really exciting.

40:57 Dask. That's pretty cool. Kale?

40:59 Yeah, the two packages that popped into my mind are Mahmoud Hashemi's boltons, I love that, I use it all the time, and then Matt Rocklin's toolz and cytoolz. Both fairly low level, but awesome, awesome libraries for writing general Python, and I'm using both all the time. Oh, excellent. Yeah,

41:18 I've checked out boltons, not the other one. I'll have to have a look at that. They both sound great. And Michael?

41:23 pytest.

41:24 kit. Yes,

41:26 pytest, to make sure all these automated builds actually have some sort of meaning. Right?

41:30 Yeah. And also just all the plugins that go with pytest let it do incredible things in very expressive ways. I'm very, very thankful for pytest.

41:39 Excellent. All right, Michael, when you write some Python code, what editor do you open up? Spacemacs. Spacemacs.

41:44 All right. And Kale? For Python code in particular, it's

41:47 PyCharm. PyCharm. Okay, cool. And Phil?

41:50 I'm mostly a vim user, even on Windows. Okay.

41:52 Yeah, very nice. And I guess the legitimate, real Vim and the Unix tools are supposed to be there soon. Are they already there? I'm not sure. But, you know, Windows 10 is supposed to have the Ubuntu subsystem in it natively pretty soon. Maybe it'll even help with some builds, who knows? But yeah, very cool.

42:13 Do we have it for conda? Has it been written? Does anybody know? I'm just curious. I don't know. I wrote nano a long time ago. But how does

42:21 it feel? Do you have Vim compiled for Cygwin? No, no. But then, you know, I'm completely out of the loop. So

42:28 Sounds like somebody should take that one on. Awesome. All right, any final calls to action? Like, how can people contribute to this overall project that you guys are all working on?

42:36 Vim for conda-forge, apparently; that's step one. I have one: I released conda 4.3.0 today. It's a huge feature release. It's almost 900 commits past the 4.2 feature release, or feature branch, that it's released on. We have a conda canary program. So just like Chrome has a canary program, I kind of ripped off the title, plus there's some alliteration. So, conda canary: you just run conda config --add channels conda-canary to add the conda-canary channel,

43:09 check it out.

43:10 Use it, let us know what you think of it. The changelog is extensive, and we'll be releasing it to general availability probably on the last day of 2016.

43:20 Perfect, a nice New Year's gift. That's awesome. So yeah, people, check that out. Anything else?

43:24 I'd say if anybody is missing a package on conda that is on PyPI, and it would make your life easier, then please submit a recipe to staged-recipes. And the main thing I would emphasize about conda-forge is that the really incredible thing about it is the community. There is such an amazing pool of expertise there on how to build software. I've learned a lot, and anybody stands to learn a lot just by getting involved there. Okay, excellent. Yeah,

43:51 absolutely, it's easy to contribute, and please do so. Are any of you guys going to be at AnacondaCON in Austin in February?

43:58 Oh, yes. Kale will be around. Yep, absolutely. Nice.

44:01 Phil, are you traveling like halfway around the world?

44:03 Unfortunately, I won't be there for AnacondaCON this year.

44:06 All right, so people who make it, go say hi to Mike and Kale, and we can tweet to Phil. All right, guys, thanks so much for being on the show. It's been great to talk about conda. I think it's a really great project overall that you guys are working on. So thanks for that. Thank you, Michael.

44:19 Thank you.

44:21 This has been another episode of Talk Python to Me. Today's guests have been Phil Elson, Kale Franz, and Michael Sarahan. And this episode has been sponsored by Continuum Analytics and MongoDB University. Thank you both for supporting the show. Whether you want to hear the keynote by Ryan Curran from Forrester Research, meet the guys behind Anaconda, or just mingle with high-end data scientists, you need to find your way to Austin, Texas for AnacondaCON this February. Start at talk python.fm slash econ. Get the skills you need to build your Python apps on top of the most successful and in-demand document database at MongoDB University. Take a free class by visiting talk python.fm slash Mongo. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well, check out my online course, Python Jumpstart by Building 10 Apps, at talk python.fm slash course, to experience a more engaging way to learn Python. And if you're looking for something a little more advanced, try my Write Pythonic Code course at talk python.fm slash pythonic. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python; we should be right at the top. You can also find the iTunes feed at slash iTunes, the Google Play feed at slash play, and the direct RSS feed at slash RSS on talk python.fm. Our theme music is Developers, Developers, Developers by Cory Smith, who goes by Smixx. Cory just recently started selling his tracks on iTunes, so I recommend you check it out at talk python.fm slash music. You can browse the tracks he has for sale on iTunes and listen to the full-length version of the theme song. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Smixx.

46:05 Let's get out of here.

46:07 standing with my boys

46:10 having been sleeping. I've been using lots of rest. pass the mic back
