#338: Using cibuildwheel to manage the scikit-HEP packages Transcript
00:00 How do you build and maintain a complex suite of Python packages?
00:03 Of course, you want to put them on PyPI.
00:06 The best format there is as a wheel.
00:08 This means that when developers use your code, it comes straight down and requires no local tooling to install and use.
00:14 But if you have complex dependencies, such as C or Fortran, then you have a big challenge.
00:20 How do you automatically compile and test against Linux, macOS, that's Intel and Apple Silicon, Windows, 32 and 64-bit,
00:30 and so on?
00:30 That's the problem solved by cibuildwheel.
00:33 On this episode, you'll meet Henry Schreiner.
00:36 He's developing tools for the next era of the Large Hadron Collider and is an admin of Scikit-HEP.
00:42 Of course, cibuildwheel is central to that process.
00:46 This is Talk Python to Me, episode 338, recorded October 14th, 2021.
00:52 Welcome to Talk Python to Me, a weekly podcast on Python.
01:08 This is your host, Michael Kennedy.
01:09 Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm.
01:15 And follow the show on Twitter via @talkpython.
01:19 We've started streaming most of our episodes live on YouTube.
01:22 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.
01:30 Hey there, I have some exciting news to share before we jump into the interview.
01:34 We have a new course over at Talk Python.
01:36 HTMX + Flask: Modern Python Web Apps, Hold the JavaScript.
01:40 HTMX is one of the hottest properties in web development today, and for good reason.
01:45 You might even remember all the stuff we talked about with Carson Gross back on episode 321.
01:50 HTMX, along with the libraries and techniques we introduce in our new course,
01:55 will have you writing the best Python web apps you've ever written.
01:58 Clean, fast, and interactive, all without that front-end overhead.
02:01 If you're a Python web developer that has wanted to build more dynamic, interactive apps,
02:06 but don't want to or can't write a significant portion of your app in rich front-end JavaScript
02:11 frameworks, you'll absolutely love HTMX.
02:14 Check it out over at talkpython.fm/htmx, or just click the link in your podcast player's show notes.
02:20 Now let's get on to that interview.
02:22 Henry, welcome to Talk Python to Me.
02:25 Thank you.
02:26 Yeah, it's great to have you here.
02:27 I'm always fascinated with cutting-edge physics with maybe both ends of physics, right?
02:34 I'm really fascinated with astrophysics and the super large, and then also the very, very small.
02:39 And we're going to probably tend a little bit towards the smaller, high-energy things this time around,
02:45 but so much fun to talk about this stuff and how it intersects Python.
02:48 Some of the smallest things you can measure and some of the largest amounts of data you can get out.
02:52 Yeah, the data story is actually really, really crazy, and we're going to talk a bit about that.
02:58 So neat, so much stuff.
03:00 We used to think that atoms were as small as things could get, right?
03:04 I remember learning that in elementary school.
03:05 There are these things called atoms.
03:07 They combine to form compounds and stuff, and that's as small as it gets.
03:11 And yeah, not so much, right?
03:13 Yeah, that was sort of what atom was supposed to mean.
03:15 Exactly, the smallest bit, but nope.
03:19 But that name got used up, so there we are.
03:21 All right, well, before we get into all that stuff, though, let's start with your story.
03:25 How did you get into programming in Python?
03:26 Well, I started with a little bit of programming that my dad taught me.
03:31 He was a physicist.
03:33 And I remember it was C++ and sort of taught the way you would teach Java,
03:37 you know, all objects and classes.
03:39 Yeah.
03:39 Just a little bit.
03:41 And then when I started at college, then I wanted to take classes, and I took a couple classes again in C++.
03:48 But I just really loved objects and classes.
03:51 Unfortunately, the courses didn't actually cover that much, but the book did.
03:55 So I really got into that.
03:57 And then for Python, actually, right when I started college, I started using this program called Blender.
04:02 Oh, yeah.
04:03 Blender.
04:03 I've heard of Blender.
04:04 It's like a 3D animation tool, like Maya or something like that, right?
04:08 And it's very Python-friendly, right?
04:10 Yes.
04:11 It has a built-in Python interpreter.
04:13 So I knew it had this built-in language called Python, so that made me really want to learn Python.
04:16 And then when I went to an REU, a research experience for undergraduates
04:22 at Northwestern University in Chicago.
04:25 And when I was there, we had this cluster that we were working on.
04:29 This was in solid-state physics, material physics.
04:32 And we would launch these simulations on the cluster.
04:36 And so I started using Python, and I was able to write a program that would go out,
04:42 and it would create a bunch of threads, and it would watch all of the cluster,
04:46 all the nodes in the cluster.
04:48 And as soon as one became available, it would take it.
04:49 So I could just, my simulation would just take the entire cluster.
04:52 After a few hours, I would have everything.
04:54 So at the end of that, everybody hated me, and everybody wanted my scripts.
04:58 Exactly.
04:59 They're like, this is horrible.
05:01 I can't believe you did that to me, but I'll completely forgive you if you just give it to me and only to me,
05:06 because I need that power.
05:07 Yeah, that's fantastic.
05:09 How neat.
05:10 So I think that is one of the cool things about Python, right, is that it has this quick prototyping approachability.
05:18 They're like, I'm just going to take over a huge hardware, right?
05:22 Like a huge cluster of servers, but it itself doesn't have to be like intense programming.
05:26 It can be like this elegant little bit of code, right?
05:28 You can sort of do things that, normally I think the programming gets in the way more,
05:32 but Python tends to stay out.
05:34 It looks more like pseudocode.
05:35 So you can do more and learn more, and eventually you can go do it in C++ or something.
05:40 Yeah.
05:41 Yeah, absolutely.
05:42 Great way to start.
05:43 Or maybe not.
05:44 Sometimes you do need to go do it in some other language, and sometimes you don't.
05:48 I think the stuff at CERN and LHC has an interesting exchange between C++
05:54 and maybe some more Python and whatnot, so that'll be fun to talk about.
05:59 Yeah.
05:59 We've been C++ originally, but Python is really showing up in a lot more places,
06:04 and there's been a lot of movement in that direction.
06:07 And there have been some really interesting things that have come out.
06:09 A lot of interesting things have come out of the LHC, computing-wise as well as physics.
06:13 Awesome.
06:14 Yeah.
06:15 As a computing bit of infrastructure, there's a ton going on there.
06:18 And as physics, it's kind of the center of the particle physics world, right?
06:22 So it's got those two parallel things generating, all sorts of cool stuff.
06:27 I want to go back to just really quickly to, you know, you talked about your dad teaching a little programming.
06:31 If people are out there and they're the dad, they want to teach their kids a little bit of programming,
06:36 I want to give a shout out to CodeCombat.com.
06:38 Such a cool place.
06:39 My daughter just yesterday was like, hey, dad, I want to do a little Python.
06:42 Remember that game that taught me programming?
06:45 Like, yeah, yeah, sure.
06:46 So she logged in and started playing, and basically solved a dungeon interactively by writing Python.
06:52 And it's such an approachable way, but it's not the draggy-droppy fake stuff.
06:55 You write real Python, which I think is cool to introduce kids that way.
06:59 So anyway, shout out to them.
07:01 I had them on the podcast before, but it's cool to see kids like taking to it in that way, right?
07:05 Whereas you say it like, you could write a terminal app.
07:07 They're like, I don't want to do that.
07:08 But solve a dungeon.
07:10 Yeah, they could do that.
07:11 Yeah.
07:11 I've actually played with a couple of those.
07:12 They're actually really fun just to play.
07:13 Yeah, they are.
07:14 Exactly.
07:15 I did like 40 dungeons along with my daughter.
07:17 It was very cool.
07:18 How about now?
07:19 What do you do now?
07:19 So I work in a lot of different areas and I jump around a lot.
07:24 So I do a mix of coding.
07:26 I do some work on websites because they just needed maintenance and somehow I got volunteered.
07:33 And some writing.
07:35 So less coding than I would like, but I definitely do get to do it, which is fun.
07:40 Yeah.
07:40 And this is at CERN or at your university or where is this?
07:44 So now I'm at Princeton University and I'm part of a local group of RSEs, research software engineers.
07:52 And I'm also part of IRIS-HEP, which we'll talk about a little bit.
07:57 But that's sort of a very spread out group.
08:00 Some of us are at CERN, a few are in some other places, a few at Fermilab.
08:06 And high-energy physicists are just used to working remote.
08:09 The pandemic wasn't that big of a change for us.
08:11 We were already doing all our meetings remote.
08:12 We just eventually changed from Vidyo to Zoom.
08:15 But other than that, it was pretty much the same.
08:17 Exactly.
08:17 It was real similar for me as well.
08:19 That's interesting.
08:20 Fermilab, that's in Chicago, outside Chicago, right?
08:23 Yes.
08:23 Is that still going?
08:24 I got the sense that that was shutting down.
08:26 They're big in neutrino physics.
08:27 So they do a lot of neutrino things there.
08:30 And then they're also very active just in the particle physics space.
08:34 So you may be at Fermilab, but working on CERN data.
08:37 I see.
08:37 Okay.
08:38 Interesting.
08:38 Yeah.
08:39 I got to tour that place a little bit and it's a really neat place.
08:42 It is.
08:43 CERN's a neat place too.
08:44 I would love to tour CERN, but it wasn't 20 minutes down the street from where I happened to be.
08:50 So I didn't make it there.
08:51 Sadly, I hope to get back there someday.
08:53 All right.
08:54 Well, let's talk about sort of the scikit-hep side of things and how you got into
09:02 maintaining all of these packages.
09:05 So you found yourself in this place where you're working on tools that help other people build packages
09:10 for the physicists and data scientists and so on, right?
09:14 So where'd that all start?
09:16 So with maintenance itself, the first thing I started maintaining was a package called Plumbum back in 2015.
09:23 And at that point, I was starting to submit some PRs and the author came to me and said,
09:30 I would like to have somebody do the releases.
09:32 I need a release manager.
09:34 I don't have time.
09:35 And I said, sure, I'd be happy to do it.
09:36 And it was exciting for me because it was the first package or like real package I got to join.
09:42 And so I think on the page, it might even still have the original news item
09:47 when it says, welcome to me.
09:48 But-
09:49 Nice.
09:50 So that was the first thing I started maintaining.
09:52 And then I was working on a high-energy physics tool called GooFit when I became a postdoc.
10:00 And I worked on sort of really renovating that.
10:03 It started out as code written by physicists.
10:06 And I worked on making it actually installable and packaged nicely and worked with a student to add Python bindings to it,
10:14 things like that.
10:14 And as part of that, I wrote a C++ package, CLI11.
10:18 It was the first package I actually wrote and then maintained.
10:22 And it's actually in C++.
10:24 And that was written for GooFit, but now — I think it's done pretty well on its own.
10:30 Nice.
10:31 What's that one do?
10:32 Microsoft Terminal uses it.
10:32 Yeah.
10:33 Microsoft Terminal uses it?
10:35 Mm-hmm.
10:35 Oh, nice.
10:36 Yeah, I'm a big fan of Microsoft Terminal.
10:37 I've for a while now kind of shied away from working on Windows because the terminal experience
10:44 has been really crummy.
10:45 You know, the cmd.exe command prompt style is just like, oh, why is it so painful?
10:50 And people who work in that all day, they might not see it as painful.
10:53 But if you get to work in something like a macOS terminal or even to not quite the same degree,
10:59 but still in like a Linux one, then all of a sudden, yeah, it kind of gets there.
11:02 But I'm kind of warming up to it again with Windows Terminal.
11:06 Yeah, iTerm is one of the reasons I really moved to Mac — because I loved iTerm.
11:12 And then Windows Terminal is amazing.
11:14 Now it's a great, great team working on it, including the fact that they used my parser.
11:18 But it's actually quite nice.
11:22 The only problem I have in Windows 10 is it's really hard to get the thing
11:24 to show up instead of cmd prompt.
11:28 Yeah.
11:28 But Windows 11, I think it's supposed to be the only one.
11:31 Yeah.
11:31 I definitely think it's included now, which is great.
11:34 So CLI11, this is a C++11 command line parser, right?
11:39 Like Click or argparse or something like that, but for C++, right?
11:42 Yes.
11:43 It was designed off of the Pumbum command line parser.
11:46 Pumbum is sort of a toolkit and it has several different things.
11:48 I wish those things had been pulled out because I think on their own, they might have
11:51 maybe even been popular on their own.
11:55 it has a really nice parser, but it was sort of designed off of that and off click.
11:58 It has some similarities to the both of those.
12:01 Yeah.
12:02 I think probably that's a challenge.
12:04 I mean, we're going to get into Scikit-HEP with a whole bunch of these different packages,
12:08 but finding the right granularity of what is a self-contained unit that you want to share with people
12:14 or versus things like pulling out a command line parser rather than some other library, right?
12:19 This is a careful balance.
12:21 It's a bit challenging.
12:23 I think in Python, there's a really strong emphasis on having individual,
12:28 separate pieces and packages, partially because it has
12:32 a really good packaging system.
12:34 And being able to take things, have just pieces and be able to swap out one
12:39 that you don't like is really, really nice.
12:41 And that's one of the things — we'll talk about the PyPA as well —
12:44 that they focus on: small individual packages
12:48 that each do a job, versus an all-in-one tool like Poetry.
12:51 Yeah.
12:52 Well, you'll have to do some checking or some fact-checking, balancing, modernizing for me.
12:58 I did professional C++ development for a couple of years and I really enjoyed it
13:03 until there were better options.
13:05 And then I'm like, why am I still doing this?
13:07 I would go work on those.
13:09 But one of the things that struck me as a big difference to that world is basically
13:15 the number of libraries you use, the granularity of the libraries you use,
13:19 you know, the relative acceptance of things like pip and the ease of using another library,
13:25 right?
13:25 In C++, you've got the header and you've got the linked file and you've got the DLL
13:31 and there's like all sorts of stuff that can like get out of sync and go crazy
13:35 and like make weird crashes.
13:37 Your app just goes away and that's not great.
13:39 Is that still true?
13:40 I feel like that that difference is one of the things that allows for people
13:44 to make these smaller composable pieces in Python.
13:47 I think that has a lot to do with it.
13:49 What has happened in C++ is there's sort of a rise of a lot of header-only libraries
13:54 and these libraries are a lot easier to just drop into your project because all you do
14:00 is you put in the headers and there's no, you don't have to deal with a lot of the
14:05 original issues.
14:07 So a lot of these small standalone libraries are header-only and one of the next
14:11 things that I picked up as a maintainer was pybind11 — and I've sort of
14:17 been in that space sort of between C++ and Python for quite a bit.
14:21 I kind of like being in that area, joining the two.
14:26 I get a sense from listening to the things that you've worked on previously
14:29 and things like this that you're interested in connecting and enabling, like piecing together,
14:34 like here's my script that's going to pull together the compute on this cluster
14:38 or here's this library that pulls together Python and C++ and so on.
14:41 Yes, making different things work together and combining things like C++ and Python
14:46 or combining different packages in Python and piecing together a solution.
14:49 I think that's one of Python's strengths versus something like MATLAB.
14:53 I spent quite a bit of time in MATLAB early on and got to move a lot of stuff
14:57 over to Python.
14:58 Right on, that's awesome.
14:59 It was really nice.
15:00 We didn't have to have a license and things like that.
15:02 I know, it's so expensive and then you get the, what are they called, toolkits,
15:07 the add-on toolkits and they're like, each toolkit is the price of another $1,000 a year
15:12 or $2,000 a year.
15:13 It's ridiculous.
15:14 So I know of CFFI, which is a way for Python and C to get clicked together
15:21 in a simple way.
15:23 How's Pybind 11 fit into that?
15:27 This is seamless interoperability between C++11 and Python.
15:30 How are they different?
15:32 So, CFFI — I teach, like, a little short course where I go through some of the different
15:38 binding tools, and it usually ends with me saying pybind11 is my favorite.
15:41 Yeah, cool.
15:43 Give us an overview of what the options are and stuff.
15:45 CFFI is closer to ctypes.
15:47 It's more focused on C versus C++, and it's actually the one I've used
15:53 the least.
15:54 I was just talking with the CFFI developer recently, but I've used it
15:59 the least of those. I think it basically parses your C headers and then sort of
16:06 automates a lot of what you would have to manually do with ctypes, where you have to
16:09 specify what symbol you want to call and what the arguments are and what the return
16:14 type is and if one of those things is wrong you get a seg fault and that sort of thing.
16:17 Whereas pybind11 — this is about building extension modules.
16:21 And the interesting thing about it is that it's written
16:26 in pure C++.
16:26 The other tools out there — so, Cython can do this; it's not what it was designed for,
16:31 but it immediately became popular for doing it. Cython takes Python-like code — it's its own language —
16:37 and transpiles it into C
16:42 or C++ (there's a toggle you can change), and then
16:47 once you're there you can call C or C++. But it's extremely verbose, you repeat yourself,
16:53 and you have to learn another language,
16:54 this weird combined Python thing. And just thinking in Cython is difficult,
16:58 because you have to think about: am I in Python, or am I in Cython
17:03 that's going to be bound to Python, or am I in Cython that's going straight to C
17:07 or C++, or am I just in C++ or C? But I have actually used it.
17:11 It's a lot of layers there, yeah.
17:12 But pybind11 is just C++, and it's basically like the C API
17:19 for Python, but as a C++ API.
17:21 It's quite natural and you don't have to learn a new language.
17:25 It uses some fairly advanced C++ but that's it.
17:28 You're learning something useful anyway.
17:29 Right.
17:29 So do you do some sort of like template type thing and then say I'm going to expose
17:34 this class to Python or something like that and then it figures out, does it write
17:38 the Python code or what is it?
17:40 Is it writing — does it build, like, .so files, or what do you do here?
17:45 It compiles into the C API calls, and then that compiles into a .so,
17:50 and there's no separate step like Cython or SWIG or these other tools,
17:55 because it's just C++.
17:56 You compile it like you do any other C++, but internally it's actually using
18:01 the CPython API, or PyPy's wrapper for it, and the API looks a lot like Python —
18:06 the names are similar.
18:07 You just do a def to define a function, you give it the name, and then you just
18:11 pass it the pointer to the underlying thing.
18:14 It can figure out things like types and stuff like that for you.
18:16 Give it a doc string if you want.
18:18 Give the arguments names.
18:19 You can make it as Pythonic as you want.
18:20 It's verbose but it's not overly verbose.
18:23 Yeah, that's really neat.
18:25 Nice.
18:25 And for people who haven't used those kind of outputs, basically, it's just import
18:30 module name whether it's a .py file or it's a .so file.
18:36 PyTorch, for example — if you've used any of those things, you've been importing
18:43 some pybind11 code.
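For readers curious about the packaging side of this, here is a minimal, hedged sketch of a setup.py that builds a pybind11 extension with the helpers pybind11 ships. The module name "example" and the source path "src/example.cpp" are hypothetical placeholders, not anything from the show.

```python
# Minimal sketch: building a pybind11 extension module with setuptools.
# "example" and "src/example.cpp" are hypothetical placeholders.
from pybind11.setup_helpers import Pybind11Extension, build_ext
from setuptools import setup

ext_modules = [
    # Compiles src/example.cpp into an importable extension named "example"
    Pybind11Extension("example", ["src/example.cpp"]),
]

setup(
    name="example",
    version="0.1.0",
    ext_modules=ext_modules,
    # pybind11's build_ext helper picks appropriate C++ standard flags
    cmdclass={"build_ext": build_ext},
)
```

After a build, the result is imported like any other module: `import example`.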
18:44 So let's talk a little bit about Scikit-HEP.
18:48 This is one of the projects that has a lot of these packages inside of it,
18:54 and your library cibuildwheel is one of the things that is used to maintain
19:03 and build a lot of those packages because I'm sure they have a lot of interesting
19:06 and oddball dependencies, right?
19:08 I mean, C++ is kind of standard, but there's probably others as well, right?
19:12 It is.
19:14 So one thing that is kind of somewhat unique to HEP is that we are very heavily invested
19:19 in C++.
19:19 So it's usually either you're going to see Python or you're going to see some sort
19:23 of C++ package of some sort.
19:25 I mean, it could be varies in size there, but it's mostly C++ or Python.
19:30 We really haven't used other languages much for the past early 90s or so.
19:35 Is that inertia or is that by choice?
19:39 You know, why is that?
19:40 I think it's partially the community is a fairly cohesive community.
19:46 We're really used to sort of working together.
19:48 The experiments themselves are often, you know, might be a thousand or several thousand
19:54 physicists working on a single experiment.
19:55 And we have been fairly good about sort of meeting together and sort of deciding
20:01 the direction that we want to go in and it's sort of sticking to that.
20:04 So for C++, it was heavily ROOT, which is a giant C++ framework.
20:11 And it's got everything in it.
20:13 And that was C++ and that's what everybody used.
20:15 So root is the library.
20:17 If I was going to write code that would run and interact with like the grid computing
20:23 or the data access and all that kind of stuff at LHC, I would use this root library
20:28 if I was doing that in C++, right?
20:30 Yes.
20:30 You might be using interpreted C++, which is something we invented.
20:33 Oh, okay.
20:35 This is interesting.
20:37 Is this something people can use?
20:38 Oh, yes.
20:39 We actually — so CINT was the original interpreter, and then it got replaced by
20:44 Cling, which is built on LLVM.
20:47 And I think recently it was merged into mainline LLVM as Clang-Repl, I think it's
20:54 called, but it's sort of a lightweight version.
20:56 Yeah, it's a C++ interpreter.
20:59 You can actually get xeus-cling, which I think QuantStack has — they package
21:06 it as well, I think — xeus-cling.
21:08 Okay, yeah, very interesting.
21:09 It's not, C++ really wasn't designed for a notebook though.
21:13 It does work, but you can't rerun a cell often because you can't redefine things.
21:18 Python is just really natural in a notebook and C++ is not.
21:21 Yeah, especially if you change the type, you compile it as an int and then you're
21:25 like, ah, that should be a string.
21:26 Yeah, that's not going to be a string.
21:27 It's compiled.
21:28 Yeah, interesting.
21:29 So it seems to me like the community at CERN has decided, look, we need some
21:34 low-level stuff and there's some crazy low-level things that happen over there.
21:38 People can check out a video, maybe I'll mention a little bit later.
21:41 But for that use, they've sort of gravitated towards C++, and then for the other aspects,
21:47 it sounds like Python is what everyone agreed to.
21:50 It's like, hey, we want to visualize this, we want to do some notebook stuff, we want
21:53 to piece things together, something like that, right?
21:56 It's certainly moving that way.
21:58 They definitely have sort of agreed that Python should be a first-class language and
22:05 join C++.
22:05 That was decided a few years ago.
22:07 And I think that's been a great step in the right direction because what was
22:11 happening, people were coming in with Python knowledge.
22:14 They wanted to use Pandas.
22:15 I came in that way as well.
22:17 Pandas and Numba and all these tools were really, really nice.
22:21 And we were basically just having to write them all ourselves in C++.
22:25 ROOT has a data frame, but why not just use Python, which is what people know
22:31 anyway?
22:32 Pandas exists.
22:33 There's a ton of people already doing the work maintaining it for us.
22:36 ROOT literally has a string class.
22:39 Literally, they do everything.
22:42 So the idea, and this is sort of the idea behind Scikit-HEP was to build
22:47 this collection of packages that would just fill in the missing pieces, the things that
22:53 energy physicists were used to and needed.
22:55 And some of them are general and were just gaps in the data science ecosystem,
23:00 and some things are very specific, high energy physics.
23:03 Scikit-HEP actually sort of originated as a single package.
23:08 It sort of looked like root red at first, and it was invented by someone called
23:15 Eduardo Rodrigues, who was actually in my office at CERN — we were office mates.
23:19 But he did something I think really brilliant when he did this, and that is
23:23 he created an organization called Scikit-HEP around it, and then he went out and
23:27 spoke with people and got some of the other Python packages that existed
23:30 at the time to join Scikit-HEP, moved them over and started building a collection of
23:35 some of the most popular Python packages at the time.
23:38 And I thought that was great, and I really wanted Scikit-HEP to become a
23:43 collection of tools, separate tools, and for the Scikit-HEP package to just
23:47 be sort of a meta package that just grabbed all the rest.
23:50 And that's actually kind of where it is now.
23:51 Right.
23:52 I can pip install Scikit-HEP.
23:53 Is that right?
23:54 You can, and mostly, other than a few little things that are still in there
23:57 that never got pulled out, that will mostly just install our most popular,
24:01 maybe 15 or so of our most popular packages.
24:06 Yeah, so it probably doesn't really do anything other than, say, it depends
24:10 on those packages or something like that, right?
24:12 And then by virtue of installing it, it'll grab all the pieces.
24:15 Yeah, yeah, that's a really cool idea and I like it.
24:18 So maybe one of the things I thought would be fun is to go through some of
24:22 the packages there to give people a sense of what's in here.
24:25 Some of these are pretty particular and I don't think would find broad use outside of
24:30 CERN.
24:31 For example, conda-forge ROOT.
24:33 It sounds like that's about building ROOT so I can install it as a dependency
24:37 or something like that, right?
24:39 Building ROOT is horrible, and you actually now can get it as part of a conda package,
24:45 which is just way better than anything that was available for attaching it to a
24:49 specific version of Python because it has to compile against a very specific version of
24:54 Python but that's what it does.
24:56 So unless you want something in ROOT — that one's very HEP-specific.
25:00 Yeah, absolutely.
25:01 Some more general ones — probably our very first package that I
25:07 think was really popular among high-energy physicists, that we actually produced, was
25:13 Uproot, which was just a pure Python package — so you didn't have to install ROOT — that
25:19 read ROOT files.
25:19 Again, very specific for somebody who was in high energy physics but you
25:25 could actually read a ROOT file and get your data without installing ROOT, and
25:29 that was a game changer.
25:31 So now you can actually install ROOT slightly more easily, but normally it's a
25:35 multi-hour compile. It's
25:38 gotten better, but it's still a bit of a beast to compile, especially for Python.
25:40 Yeah, that does sound like a beast.
25:41 Oh my gosh.
25:42 And now you can just read in your files.
25:44 Basically, Jim Pivarski just taught Python to understand and decode the ROOT
25:50 file structure — and it can actually write now too, but originally just reading.
25:54 But that actually was really...
25:56 So this is like if I want to do, if I want to create a notebook and maybe
25:59 visualize some of the data but I don't really need access to anything else, I shouldn't
26:03 depend on this beast of almost its own operating system type of thing.
26:08 Yeah, we were very close to being able to use all the data science tools in
26:12 Python, pandas, things like that.
26:13 For most data worked fine.
26:15 You just had to get the data.
26:17 And I mean, I've done this too, where I had one special install of Python and
26:23 ROOT together that I'd worked several hours on, and it sat somewhere, and I would convert
26:27 data with it.
26:27 I'd move it to HDF5, and then I would do all the rest of the analysis in a Python that
26:32 didn't have it, because then I could do virtual environments and all that — reading that HDF5
26:36 format, right?
26:37 Mm-hmm.
26:37 Yeah.
26:38 Right, okay.
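As a rough sketch of what reading a ROOT file with Uproot looks like in practice — the file name "events.root", the tree name "Events", and the branch names are hypothetical, not anything from the show:

```python
import uproot  # pure Python reader for ROOT files, no ROOT installation needed

# Hypothetical file, tree, and branch names.
with uproot.open("events.root") as f:
    tree = f["Events"]
    # Pull selected branches out as Awkward Arrays (library="np" or "pd" also work)
    arrays = tree.arrays(["pt", "eta"], library="ak")
    print(arrays["pt"])
```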
26:39 The first package we had that was really popular on its own was Awkward Array.
26:44 Yeah.
26:44 Awkward Arrays.
26:45 I definitely heard about this one, yeah.
26:47 Yeah, that was originally part of Uproot — it sort of grew out of Uproot.
26:51 When you're reading ROOT files, you end up with these jagged arrays.
26:55 So that's an array that is not rectangular.
26:58 So at least one dimension is jagged.
27:01 It depends on the data, and this shows up in all sorts of places, not just particle
27:07 collisions — though it obviously shows up in lots of places in particle collisions, like how many hits
27:10 got triggered in the detector.
27:12 That's a variable length list.
27:13 How many tracks are in an event?
27:14 You know, that's a variable length list and can be a variable length list of
27:18 structured data.
27:18 And storing that compactly, the same way you would with NumPy, was one thing — you can do that with
27:25 Arrow, and there are some other things that do this — but Awkward Array
27:28 also gives you NumPy-like indexing and data manipulation.
27:33 And that was the sort of the breakthrough thing here.
27:36 It's like numpy.
27:38 The original one was built on top of numpy.
27:40 The new one actually has some pybind11-compiled bits and pieces, but it makes working
27:47 with that data really nice.
27:47 In fact, Jim Pivarski has now got a grant to expand this to — I don't remember the
27:53 number of different disciplines that he's working with, but lots of different areas;
27:57 genomics and things like that all have use cases — and he's adding things like
28:02 complex numbers, things that weren't originally needed by high-energy physicists, but
28:05 make it widely useful.
28:07 Almost an evangelism, like dev evangelism type of role, right?
28:11 Go talk to the other groups and say, hey, we think you should be using this.
28:16 What is it missing for you to really love it?
28:18 Something like that, right?
28:19 How interesting.
28:20 Yeah.
28:20 So, yeah.
28:22 Yeah.
28:22 So looking at the Awkward Array page here, it says for a similar problem 10 million times
28:28 larger than the example given above — and the one above is not totally simple,
28:32 so that's pretty crazy —
28:33 it says the Awkward Array one-liner takes 4.6 seconds to run and uses 2 gigs of
28:40 memory.
28:40 The equivalent Python lists and dictionaries take over two minutes and use
28:45 10 times as much memory, 22 gigs.
28:47 So, yeah, that's a pretty appealing value proposition there.
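Here is a tiny hedged sketch of the jagged-array idea being described, with made-up numbers standing in for real event data:

```python
import awkward as ak

# A jagged array: a variable-length list per event (e.g., hits per collision).
events = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

print(ak.num(events))   # [3, 0, 2] -- counts per event
print(events[0])        # [1.1, 2.2, 3.3]
print(events[:, :1])    # NumPy-style slicing, even across the jagged dimension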
28:50 Yeah.
28:51 And it supports Numba.
28:53 Jim works very closely with the Numba team and really is one of the experts on the
28:58 Numba internals.
28:59 So, yeah, it has full Numba support now and he's working on adding Dask.
29:04 He's working with Anaconda on this grant and then working with adding GPU support.
29:09 Very cool.
29:10 Maybe not everyone out there knows what Numba is.
29:12 Maybe give us a quick elevator pitch on Numba.
29:15 Yeah.
29:15 I hear it makes Python code fast, right?
29:18 Yeah, it's a just-in-time compiler and it takes Python.
29:24 It actually takes the bytecode — it
29:31 parses the bytecode and turns it into LLVM.
29:34 So it works a lot like Julia except instead of a new language, it's actually reading Python
29:39 bytecode, which is challenging because the Python bytecode is not something that stays
29:44 static or is supposed to be a public detail.
29:47 Yeah, there's no public promises about consistency of bytecode across versions because
29:54 they play with that all the time to try to speed up things and they add bytecodes and
29:58 they try to do little optimizations.
30:00 Yeah, so every Python release breaks Numba.
30:02 So they just know Numba won't support the next Python release right away, and it
30:06 usually takes a month or two.
30:07 But it's very impressive though.
30:11 The speedups — you do get full, sort of C-like speedups for something that looks
30:16 just like Python.
30:16 It compiles really fast for a small problem and it's as fast as anything
30:22 else you can do.
30:23 I've tried lots of these various programming problems and you just about can't
30:29 beat Numba.
30:29 It actually knows what your architecture is since it's just in time compiling.
30:33 So it can do things that are an advantage over, say, C, right?
30:37 It can look exactly at what your platform is and your machine architecture
30:41 and say we're going to target, you know, I see your CPU supports this special vectorized thing
30:46 or whatever and it's going to build that in, right?
30:47 And then what Jim does with Awkward — and what we've done with some other things, like Vector —
30:51 is this too:
30:52 you can control what LLVM constructs any given bit of Python turns into, because
31:00 you can control that compile phase.
31:02 That's incredibly powerful, because you can say — and it doesn't have to be the same thing, but
31:06 obviously you want it to behave the same way —
31:08 you can say, if you see this structure, this is what it turns into
31:12 in LLVM IR, which then gets compiled
31:19 into your native machine language.
31:21 Interesting.
31:22 Assembly.
31:22 So if you have like a certain data structure that you know can be well represented or gets
31:27 packed up in a certain way to be super efficient you can control that?
31:30 Yeah, you can say, well, this operation on this data structure, this is what
31:35 it should do, and then that turns into LLVM, and maybe it can
31:38 get vectorized or things like that for you.
31:41 Yeah, yeah.
31:42 That's super neat.
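A minimal sketch of the just-in-time idea with Numba — not tied to any specific code from the show:

```python
import numpy as np
from numba import njit

@njit  # compile this function to machine code for the local CPU on first call
def total(x):
    s = 0.0
    for v in x:   # a plain Python loop, but it runs at compiled speed
        s += v
    return s

data = np.random.rand(1_000_000)
total(data)   # first call triggers compilation; later calls are fast
```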
31:43 Another package in the list that I've got to talk about, because just the name and the graphic are
31:47 fantastic, is Aghast.
31:49 What is Aghast?
31:51 It's got, like, this The Scream —
31:53 I forgot who the artist of that was — but The Scream sort of look as part of the logo is good.
31:59 About half of the logos — I did about half and Jim did about half, and then
32:04 others are from the individual package authors.
32:07 Aghast — so this is sort of part of the histogramming area, which is the
32:12 area I work in in Scikit-HEP.
32:13 But Jim actually wrote Aghast, and the idea was that it would convert between
32:17 histogram representations.
32:19 I think it came up because Jim got tired of writing histogram libraries.
32:22 I think he's written at least five.
32:24 Yeah, one of the things I got the sense of by looking through all the Scikit-HEP stuff —
32:29 there's a lot of histogram stuff happening over there.
32:32 Histograms are sort of the area that I was in, and it ended up coming in in
32:36 several pieces.
32:37 But I think one of the important things, actually — and I think Aghast may not really
32:41 matter, it may get archived at some point — because instead of translating between
32:47 different representations of histograms in memory, what you can do is define
32:52 a static typing protocol, which can be checked by mypy, that describes what an object needs
33:01 to be called a histogram.
33:01 And so I've defined that as a package called UHI, Universal Histogram Interface.
33:06 And anything that implements UHI — it can be fully checked by mypy — will then be able
33:11 to take any object from any library that implements UHI.
33:17 And so all the libraries we have that produce histograms — so Uproot, when it reads a ROOT
33:22 histogram, or hist and boost-histogram, when they produce histograms — they
33:26 don't need to depend on each other.
33:28 They don't even depend on UHI; that's just a static dependency at mypy time.
33:32 And then they can be plotted with mplhep, or they can be printed to the terminal with
33:39 histoprint, and there are no dependencies there.
33:42 One doesn't need the other.
33:43 And that's sort of making Aghast somewhat unneeded, because now it really doesn't matter.
33:48 You don't have to convert between two because they both just work.
33:51 They work on the same underlying structure basically, right?
33:54 They work through the same interface.
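To make the protocol idea concrete, here is a hedged sketch of the general pattern. This is not the real UHI definition — just the shape of a mypy-checkable static protocol where producers and consumers never import each other:

```python
from typing import Any, Optional, Protocol

# A sketch of the protocol idea behind UHI -- not its actual definitions.
class HistogramLike(Protocol):
    def values(self) -> Any: ...               # bin contents
    def variances(self) -> Optional[Any]: ...  # optional uncertainties
    @property
    def axes(self) -> Any: ...                 # axis / bin-edge information

def describe(h: HistogramLike) -> None:
    # A consumer (plotting, terminal printing) accepts anything matching the
    # protocol; mypy checks it statically, with no runtime dependency on the
    # library that produced the histogram.
    print(h.values(), list(h.axes))
```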
33:57 Right.
33:58 Yeah.
33:58 So Aghast is a way to work with different histogramming libraries — it's kind of the
34:04 intermediary there.
34:06 It's like an abstraction layer on that.
34:08 Okay.
34:08 What are some other ones?
34:10 Yeah.
34:11 What are some other ones we should kind of give a shout out to?
34:14 We talked about GooFit, which is there.
34:16 It's an affiliated package.
34:17 It's not part of scikit-hep, but it has.
34:19 So we developed this idea of an affiliated package for things that didn't need to be moved
34:24 in, but had at least one scikit-hep developer working with them.
34:30 At least that's my definition.
34:31 I was never able to actually get the rest to agree to exactly that definition, but that's my
34:36 working definition.
34:37 And so that's why pybind11 gets listed there.
34:39 It's an affiliated package because we share a developer, me, with the pybind11 library.
34:45 And we sort of have a say in how that is developed.
34:50 And most importantly, if we have somebody come into scikit-hep, we want them to use pybind11
34:54 over the other tools because that one we have a lot of experience with.
34:58 Very cool.
35:02 Another one I thought was interesting is hepunits.
35:02 So this idea of representing units like the standard units, they're not enough for
35:08 us.
35:08 We have our own kind of things like molarity and stuff, but also luminosity
35:14 and other stuff, right?
35:16 Yeah.
35:16 Different experiments can differ a bit.
35:20 So there's a sort of a standard that got built up for units.
35:23 And so this just sort of puts that together: the unit that we've sort of decided
35:30 should be the standard unit is one, and the rest are different scale factors.
35:33 It's a very tiny little library.
35:35 It was the first one to be fully statically typed because it was tiny.
35:38 That's easy to do.
35:39 It was like, because mypy infers constants, there was like two functions or something
35:43 and then it was done.
35:44 Yeah.
35:45 Probably a lot of floats.
35:46 Mm-hmm.
35:47 But that's sort of what it is.
35:50 And the idea is that the rest of the libraries will adhere to
35:55 that system of units.
35:57 So then if you use this with the values they give you, you can have nice,
36:02 human-readable units and be sure of your units.
36:05 Yeah.
36:05 That's really neat.
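To illustrate the scale-factor convention being described — this is a sketch of the idea only, not the actual hepunits API or values of its constants:

```python
# Sketch of the system-of-units idea: pick base units equal to 1.0 and express
# everything else as a multiple. (Illustrative only; not hepunits itself.)
MeV = 1.0
GeV = 1000.0 * MeV
keV = 0.001 * MeV

electron_mass = 0.511 * MeV   # store values in the common system...
print(electron_mass / keV)    # ...divide by a unit to read it back out: 511.0
```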
36:06 Have you heard of pint?
36:08 Are you familiar with this one?
36:09 Yes, I love pint.
36:10 Oh gosh, I think pint is interesting as well.
36:13 It carries the units through — and I do use Pint some — but it actually gives you a quantity out,
36:19 or a NumPy quantity.
36:21 Whereas hepunits just stays out of the way; it's a way to be more
36:26 clear in your code, but it's not enforced.
36:27 Pint is enforced, which I like, but it can also slow things down.
36:31 These are not actual plain numbers anymore.
36:34 So you pay for it.
36:35 Yeah.
36:35 So it's going to add a ton of overhead, right?
36:36 But pint's interesting because you can do things like three times meter plus four times centimeter
36:41 and you end up with 3.04 meters.
36:44 Yeah.
36:45 Those are actually real quantities.
36:46 They're actually a different object, which is the good thing about it, but it's
36:50 also the reason that then it's not going to talk to say a C library that
36:52 expects a regular number or something as well.
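Michael's example, written out with Pint as a short sketch:

```python
import pint

ureg = pint.UnitRegistry()

length = 3 * ureg.meter + 4 * ureg.centimeter
print(length)            # 3.04 meter -- a real Quantity object, not a plain float
print(length.to("cm"))   # 304.0 centimeter
```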
36:55 Sure.
36:55 Okay.
36:56 Maybe one or two more and then we'll probably be out of time for these.
37:00 What else should people maybe pay attention to that they could generally find
37:03 useful over here?
37:04 You mentioned vector.
37:05 It's a little bit newer, but it's certainly for general physics.
37:08 I think it's useful because it's a library for 2D, 3D and relativistic vectors.
37:15 And there aren't really, it's a very common sort of learning example you see,
37:20 but there aren't really very many libraries that do this, that actually have,
37:23 if you want to take the magnitude of a vector in 3D space, there just isn't a
37:28 nice library for that.
37:29 So we wrote vector to do that.
37:31 And vectors is supported by awkward.
37:34 It has an awkward backend.
37:35 It has a number backend, a numpy backend, and then plain object backend.
37:39 Eventually we might work on more.
37:41 And it even has a number awkward.
37:42 So you can, you can use a, a vector inside an awkward array inside a number jet
37:48 compiled loop and still take magnitudes and do stuff like that.
37:51 That's really cool.
37:52 That integration there.
37:53 Yeah.
37:54 vectors because we have a lot of those in physics.
37:56 Sure.
37:56 And you can do things like ask if one vector is close to another vector, and
38:01 things like that, even in different coordinates — it looks like one in polar coordinates and
38:05 one in, you know, Cartesian or something like that.
38:08 It has different coordinate systems, and it actually stores the vector in
38:12 that representation.
38:13 So you don't waste memory or something.
38:15 If that's the representation you have — that was a feature from ROOT
38:18 that we wanted to make sure we got.
38:20 And it also has sort of the idea of momenta too, and stuff for the
38:23 relativistic side.
38:24 We end up with a lot of that.
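A hedged sketch of what that looks like with the Vector library; the vector.obj constructor and the .mag / .rho / .phi properties below are assumed from the docs rather than taken from the show:

```python
import vector

# Assumed API: vector.obj builds a single vector object from named components.
v3 = vector.obj(x=1.0, y=2.0, z=3.0)
print(v3.mag)          # 3D magnitude

# The same vector can be read back in other coordinate systems,
# e.g. cylindrical components.
print(v3.rho, v3.phi)
```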
38:26 And then maybe just to mention — since we mentioned the histogramming stuff, and that's
38:29 the area, those are the ones that I really work on —
38:31 the ones I specifically work on that are general purpose:
38:34 boost-histogram is a wrapper for the C++ Boost.Histogram library.
38:38 Boost is the sort of the big C++ library, just one step below the standard library.
38:45 And right at the time I was starting at Princeton, I met the author of Boost.Histogram,
38:50 who's from physics, and he was in the process, I believe, of getting
38:55 this accepted into boost.
38:57 And it got accepted after that.
38:58 But one of the things that he decided to do is pull out his initial Python bindings that
39:03 were written in Boost.Python, which is actually very similar to pybind11, but requires Boost
39:10 instead of requiring nothing.
39:11 But the design is intentionally very similar.
39:13 And so I proposed that I would work on boost-histogram and write the
39:20 Python bindings for it inside Scikit-HEP.
39:22 And that would be sort of the main project I started on when I started at Princeton.
39:26 And that's, you know, that's what I did.
39:29 boost-histogram is an extremely powerful histogramming library.
39:32 So it's a histogram as an object, rather than like NumPy, where there's a
39:36 histogram function and you give it an array and then it spits a couple of arrays back out at
39:40 you.
39:40 You now have to manage those.
39:44 They don't have any special meaning.
39:45 Whereas boost histogram, histograms really are much more natural as an object, just like a
39:49 data frame is more natural as an object where you tie that information together.
39:52 A histogram's really natural that way, where you still have the information about what the data
39:56 actually was on the axes.
39:58 If you have labels, you want to keep those attached to that data.
40:03 And you may need to fill again, which is one of the main things that high-energy physicists really wanted,
40:07 because we tend to fill histograms and then keep filling them or rebinning them
40:11 or doing operations on them.
40:12 And you can do all those very naturally.
40:14 And boost-histogram is the actual C++ library wrapped with pybind11 — and
40:20 I actually got involved in cibuildwheel because of boost-histogram, because one of the
40:24 things I wanted was to just make sure it worked everywhere.
40:26 And it obviously requires C++.
40:27 It requires compilation.
40:30 And then hist is a nice wrapper on top of that that just makes it a lot more friendly
40:33 to use, because the original Boost.Histogram author, Hans Dembinski, wants
40:38 to keep it quite pure and clean.
40:40 So hist is the more natural one.
40:43 And even if you're not in HEP, I think that's still the more natural one to use.
40:45 Yeah.
40:46 It's got .plot and plotting built in.
40:48 Right, right.
40:48 There's a lot of people who do, who use histograms across all sorts of disciplines.
40:52 So that would definitely be one of those that is generally useful.
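A short example of the histogram-as-an-object idea with boost-histogram, using made-up data:

```python
import boost_histogram as bh

# A 1D histogram object: 10 regular bins from 0 to 1.
h = bh.Histogram(bh.axis.Regular(10, 0.0, 1.0))

h.fill([0.15, 0.25, 0.25, 0.95])   # fill it...
h.fill([0.55, 0.55])               # ...and keep filling the same object later

print(h.view())          # bin counts as a NumPy array
print(h.axes[0].edges)   # the axis information stays attached to the data
```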
40:56 All right.
40:56 So I think that brings us to cibuildwheel.
40:59 Let's talk a bit about that.
41:02 And I mean, maybe the place to start here is: you want wheels, right?
41:06 The first sentence describing it is: Python wheels are great. Building them across
41:10 Mac, Linux, Windows, on multiple versions of Python —
41:13 not so much.
41:14 So no description.
41:15 Yeah, exactly.
41:17 Well, wheels are good.
41:19 There's times when there are no wheels and things install slower.
41:23 They might not install at all.
41:24 It's generally a bad thing if you don't have a wheel, but they're not easy
41:29 to make.
41:29 Right.
41:29 So tell us, what is a wheel? And then let's talk about why building them across all these
41:33 platforms — this cross product along with versions of Python and whatnot —
41:38 is a mess.
41:39 When you distribute Python, you have several options.
41:41 The most common one — and most packages have at least an sdist — which is just basically a
41:46 tarball of the source.
41:48 Right.
41:49 When you pip install it, basically — you're missing some things or adding some
41:52 things, but right —
41:53 otherwise it mostly just unzips.
41:54 Yeah.
41:54 It unzips your source and puts it somewhere.
41:56 Python will find it.
41:57 And then that's that.
41:58 Yeah.
41:58 So it runs your build system —
42:00 setuptools, traditionally; that's become a lot more powerful recently — but it
42:04 has to run the build system to figure out, what do you do with it?
42:07 This is just a bunch of files.
42:08 And then it puts it together in a particular structure on your computer.
42:14 And so a wheel is a package where everything is already in place.
42:19 So it's already in a particular structure.
42:21 Pip knows the structure, and all it has to do for a pure Python wheel — one that does not
42:26 have any binary pieces in it —
42:30 is grab the contents inside and dump them, following a specific set of rules, into places in your
42:38 site-packages.
42:39 Right.
42:39 So then you now have something installed.
42:40 There's no setup.py in your wheel.
42:45 There's no pyproject.toml.
42:47 Those sorts of things are not in the wheel.
42:47 The wheel's already there.
42:48 It can't run arbitrary code.
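One way to see that a wheel really is just files laid out in a fixed structure: it's a zip archive you can list. The filename below is a hypothetical example:

```python
import zipfile

# A wheel is a zip archive with a standard layout; this filename is hypothetical.
with zipfile.ZipFile("example-1.0-py3-none-any.whl") as whl:
    for name in whl.namelist():
        print(name)   # package files plus the *.dist-info metadata directory
```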
42:51 Yeah, exactly.
42:51 That was one of the points I was going to make is one of the things that can be scary about
42:55 installing packages is just by virtue of installing them, you're running arbitrary code because
43:01 often that is execute, you know, Python space, set up py space, you know, install or something
43:08 like that.
43:08 And like, whatever that thing does, that's what happens when you pip install.
43:11 Right.
43:12 But not with wheels, as you said, it comes down in a binary blob and just like, boom, here it is.
43:16 Obviously the thinking is we have this package delivered to a million computers.
43:21 Why do we need to have every million computer run all the steps?
43:24 Why don't we just run it once and then go here?
43:26 And then also that saves you a ton of time.
43:28 Right.
43:29 Like, I just installed uWSGI and it took, I don't know, 30 to 45 seconds to
43:34 install because it didn't have a wheel.
43:36 So it sat there and it just grinded away compiling it, you know?
43:39 Yeah.
43:40 So there's two possibilities.
43:41 For a pure Python package, a wheel is still superior, beyond not running arbitrary code.
43:47 Pip will actually go ahead and compile all your .pyc files —
43:51 it goes ahead and makes the bytecode for all of those.
43:55 If it's an sdist, a tarball, it doesn't do that —
43:58 if it doesn't pass through the wheel stage, anyway.
44:00 And then every time you open the file — the first time, it's going to have
44:05 to make that bytecode.
44:06 So it'll be a little slower the first time you open it.
44:08 There's a variety of reasons.
44:10 I think it's pythonwheels.com, something like that,
44:13 that describes why you should use wheels.
44:16 Maybe that's not it, but I think it is.
44:18 Yes.
44:19 Python wheels.
44:19 So they have like a list of advantages there, but.
44:23 Yeah.
44:23 They also have a little checklist.
44:25 It says, how are we doing for the top 360 packages?
44:30 And apparently 342 of them have wheels.
44:32 And it shows you, for the popular packages, which ones have them — like Click does, but future doesn't, for example — and so
44:38 on.
44:39 So.
44:39 Future's been there for a long time.
44:41 Yeah.
44:42 But, but yeah, so wheels are really good.
44:45 And they actually replaced an older mechanism that was trying to do something somewhat similar called
44:50 eggs, but I avoid talking about those.
44:53 I don't really understand.
44:54 Let it live in the past.
44:55 Let it live in the past.
44:56 Wheels are also a great way to go if you have compilation that happens.
45:01 So if you compile some code as part of your build, then
45:07 that of course is much slower —
45:08 like with the example you just gave.
45:11 Yeah.
45:11 It's like it was doing GCC or something forever.
45:13 And if you don't have a compiler, it won't even work.
45:15 Right.
45:15 Exactly.
45:15 You have to have some setup, at least a little setup.
45:18 You have to have a compiler setup at the very moment.
45:20 Right.
45:20 How many Windows users have seen "cannot find vcvarsall.bat"?
45:24 Right.
45:25 Like what is this?
45:26 I don't want this.
45:26 In windows you have to be in a, in the environment or you have to have the, the right script sourced.
45:30 Yes.
45:30 So wheels also can contain binary components, like .so's and things.
45:37 And they have a tag as part of their name.
45:39 There's a very special naming scheme for wheels, and the tag is stored in
45:43 the wheel too.
45:44 And it can tell you what Python version they're good for, what platforms they
45:51 are supported on.
45:52 They have a build number, and then the Python part is actually in two
45:57 pieces:
45:57 there's the ABI and the interpreter.
46:00 Yeah.
46:01 You can see there's some huge long name that with a bunch of underscores
46:04 separating it.
46:05 And basically, when you try to install it, sorry, go ahead.
46:09 I was saying it's also one of the reasons that names are normalized.
46:11 There's no difference between a dash and underscore.
46:13 It's because that special wheel name has dashes in it.
46:16 So the package name at that point in the, in the file name has to be underscores.
46:20 Yeah.
46:21 And so basically when you pip install, it says it, it builds up that, that
46:24 name and says, do you have this as a binary?
46:27 Give it to me, right?
46:27 Something like this.
46:28 Yeah.
46:29 It knows how to pick out the, it looks for the right one.
46:31 If it finds a binary, it'll just download it depending slightly on the system and how new your pip is.
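If you're curious which tags your own interpreter will accept, the packaging library can list them, in the priority order pip uses; the exact output depends on your platform:

```python
# Requires the third-party "packaging" library (pip install packaging).
from packaging.tags import sys_tags

# Tags this interpreter accepts, most specific first; pip matches wheel
# filenames against these. Output is platform dependent, e.g. something
# like cp310-cp310-manylinux_2_17_x86_64 on Linux.
for tag in list(sys_tags())[:5]:
    print(tag)
```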
46:36 Right.
46:37 And this is one of the main innovations ideas or philosophies behind Conda and, Anaconda,
46:44 right?
46:44 It's like, let's just take that and make sure that we build all of these
46:47 things in a really clear way.
46:48 And then sort of package up the testing and compilation, and distribute all
46:54 that together.
46:54 Right.
46:55 Yes.
46:55 This is very similar to this.
46:57 This came, I think, I'm pretty sure it came after Conda.
46:59 I think they were still on eggs when Conda was invented, and then sort of building up wheels was
47:05 challenging.
47:05 Building a wheel was challenging.
47:08 Yeah — cibuildwheel has really changed that.
47:10 If you want a pure Python one, it's really easy.
47:12 Today, you should be using the build tool, which I'm a
47:15 maintainer of as well.
47:17 But build just builds an sdist for you, or it builds a wheel.
47:22 And so you would say something like "python setup.py bdist_wheel" or something like that.
47:28 And then boom. I shouldn't be doing that anymore.
47:29 Please don't.
47:30 But that is how you do it.
47:31 Yeah.
47:31 How would I do it?
47:32 Tell me the right way.
47:33 The best.
47:34 Well, you could do "pip install build" and then "python -m build", and that will build both an
47:42 sdist and a wheel, and it'll build the wheel from the sdist.
47:45 If you use pipx, which I would recommend, then you can just say "pipx run build" and you don't have to do
47:50 anything — that'll download build into a virtual environment for you.
47:54 It'll do it.
47:54 And then eventually it'll throw away the virtual environment, after a
47:57 week.
47:57 Interesting.
47:57 Okay.
47:58 So we could just use the build.
48:00 We should be using the build.
48:01 You should be using the build tool.
48:03 Especially for an sdist.
48:04 There's a big benefit to this.
48:06 And that is, it will use your pyproject.toml.
48:10 And if you say you require NumPy — like you're using the
48:15 NumPy headers, the C headers — then when
48:19 it's building the sdist, it will make the PEP 517 virtual environment.
48:25 It'll install NumPy, anything that's in your build requires in your pyproject.toml.
48:30 And then it will run the setup.py inside that environment.
48:35 So you can now import numpy directly in there,
48:37 and it'll work even when you're building an sdist.
48:40 If you do "python setup.py sdist" or that kind of setup.py stuff, you can't do that, because
48:45 you're literally running Python, giving it a setup.py that imports numpy —
48:49 now it's broken.
48:50 Right.
48:51 Nothing triggers that call to the pyproject.toml to see what you need. For a wheel,
49:00 the best way to do it used to be with pip —
49:01 or the original way to do it was with pip wheel —
49:04 because pip has to be able to build wheels in order to install things.
49:08 That got added to pip before build existed.
49:13 But now the best way to do it would be with build --wheel.
49:16 And that's actually, it's doing the right thing.
49:17 It's actually trying to build the wheel you want.
49:19 Whereas pip wheel is actually just building a wheelhouse.
49:23 So if you depend on numpy and numpy doesn't have wheels, which they did better with Python 3.10.
49:28 So I'm not going to complain about, about numpy for Python 3.10, but for 3.9, they didn't
49:33 have wheels for a while.
49:33 So it'll build the wheels there and it'll build your wheels and it'll dump them all in the
49:37 wheelhouse or whatever the output is.
49:39 So you'll get, you'll be building numpy wheels, which you definitely don't want to try to upload.
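A quick sketch of the difference, using only the default options these tools document:

    pip wheel . -w wheelhouse/     # builds your wheel plus wheels for any dependencies that don't have them
    python -m build --wheel        # builds only your project's wheel, into dist/

So with pip wheel you can end up with third-party wheels sitting next to yours, and you have to be careful not to upload those.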
49:43 Yeah.
49:43 Yeah, definitely not.
49:44 All right.
49:45 Well, that's, that's really cool.
49:46 And I definitely learned something.
49:47 I will start using build instead of doing it the other way.
49:51 You can now delete your setup.py too.
49:53 Yeah.
49:53 That's the big thing, right?
49:54 So you don't have to run that kind of stuff, right?
49:56 Yeah.
49:57 They're trying to move away from running any commands through setup.py, because you don't
50:01 even need one anymore.
50:02 And you can't control that environment.
50:04 It's very much an internal detail.
50:06 Like wrapping up this segment of the conversation, we want a wheel because that's best.
50:12 It installs without requiring the compiler tools on our system.
50:15 It installs faster.
50:16 It's built just for our platform.
50:19 The challenge is when you become a maintainer, you've got to solve this matrix of different
50:26 Python versions that are supported and different platforms.
50:28 Like for example, there's macOS Intel, there's macOS M1, Apple Silicon.
50:33 There's multiple versions of Windows.
50:35 There's different versions of Linux, right?
50:38 Like ARM Linux versus AMD64 Linux.
50:41 Yeah.
50:42 And now musllinux versus the other Linux varieties.
50:45 Yeah.
50:46 So one of the challenges with a wheel is making it distributable.
50:51 So if you just go out and you build a wheel and then you try to give it to someone else,
50:54 it may not work.
50:54 Certainly on Linux, pretty much, if you do that, it just won't
51:00 work,
51:00 because the systems are going to be different.
51:02 On macOS, it'll only work on the version you compiled it on and not anything older.
51:07 And you'll even see people trying to compile on macOS 10.14 because
51:13 they want their wheels to work in as many places as possible.
51:16 Well, you can use the latest one.
51:17 There's ways to fix that.
51:19 Well, exactly.
51:20 It's fine.
51:20 The jankiest, like, I've got a Mac mini from 2009.
51:24 We're building on that thing,
51:25 because it will work for most people.
51:27 Right.
51:27 I think that's how they actually build the official Python binaries.
51:30 Interesting.
51:31 I'm not sure.
51:32 But then Apple went and, like last year around this time, they threw a big spanner in the
51:37 works and said, you know what?
51:38 We're going to completely switch to ARM and our own silicon.
51:41 And you've got to compile for something different now.
51:43 Yeah.
51:44 And cross compiling has always been a challenge.
51:46 yeah.
51:47 And then Windows is actually the easiest of all of them.
51:49 You're most likely on Windows to be able to compile something that you can give to someone
51:53 else.
51:53 That is one of the things that Microsoft has been really pretty good
51:57 at, backwards compatibility.
51:58 I guess it holds them back in other ways, but yeah, typically you can run an app from 20 years ago
52:03 and it'll still run.
52:04 Yeah.
52:04 And there are a few caveats, but not many, at least compared to the other
52:09 systems.
52:09 Apple's really good, but you do have
52:14 to set your minimum version, and you have to get a Python that had that minimum version set when it was compiled.
52:19 But if you do that, it works really well.
52:21 So what I actually started with in scikit-HEP: I was
52:26 building boost-histogram, which needed to be able to run anywhere.
52:29 That was something I absolutely wanted.
52:30 It had to be pip install boost-histogram and it just worked, no matter what.
52:33 And also we had several other compiled packages at the time, several we had inherited.
52:37 And iminuit was compiled and that was quite popular.
52:41 We had a couple of specific ones and a couple more that ended up
52:45 becoming interested in that.
52:47 In fact, during this sort of period is when Awkward started compiling pieces.
52:51 And so what I started with was building my own system to do this.
52:55 It was called azure-wheel-helpers, which, as you can guess by the name,
53:00 was basically a set of Azure DevOps scripts.
53:04 It was right after Azure Pipelines had come out.
53:05 And I wrote a series of blog posts on this and described the exact process,
53:09 and sort of the things I'd found out about how you build a compatible wheel.
53:13 On macOS, you have to make sure you get the most compatible CPython, from
53:20 python.org itself.
53:21 You can't use brew or something like that, because those are
53:25 going to be compiled for whatever system they were targeting.
53:27 And on Linux, you have to run the manylinux system and you should run auditwheel,
53:33 and actually on macOS you should run delocate,
53:35 delocate-wheel, although I might be getting the name wrong; I think it's delocate-wheel.
53:39 So there's this series of things that you have to do.
53:41 And I started maintaining this multi-hundred-line set of scripts to do this.
53:47 And I was also being limited by Azure at the time.
53:51 They didn't have all the templates and stuff they have now.
53:53 So everything had to be managed through git subtree, because it couldn't be a separate
53:57 repository.
53:58 And then when Jim started working on Awkward, he went and just rewrote the
54:03 whole thing, because
54:05 he wanted it to look simpler for him, and took a couple of things out that were needed, and
54:09 suddenly it was two separate things.
54:11 Now I had to help maintain that.
54:13 So when Python 3.8 or whatever it was came out, now I had a completely different
54:17 set of changes I had to make for that one.
54:18 And it was really just not working out.
54:21 It was not very easy to maintain.
54:22 And I was watching cibuildwheel.
54:25 It was this package,
54:27 a Python package that would do this.
54:30 And it didn't matter what CI system you were on, because it was written in Python.
54:34 And it followed nice Python principles for good package design and had unit tests and all that sort of stuff.
54:40 So it looked really good.
54:41 There were a couple of things that it was missing.
54:43 I came in and made PRs for the things that I'd come up with that it didn't have.
54:47 And they got accepted.
54:48 And there was a shared maintainer between pybind11 and cibuildwheel as well.
54:52 I think that's one of the reasons that I had heard about it.
54:55 I was really watching it.
54:55 And I finally decided just to make the switch.
54:57 And at some point a little later, I actually became a maintainer of cibuildwheel.
55:02 But I think I started doing the switch before that. It made it really easy, once I was a maintainer, to say, oh, this is a package that, you know, we have some control over.
55:09 It's okay.
55:09 Let's just.
55:10 Right.
55:10 Your package can choose to depend upon this,
55:13 because we have a say.
55:14 It just took out all of that maintenance.
55:16 And now Dependabot does all the maintenance for us.
55:20 It does the version bumps for the pinned cibuildwheel.
55:23 And that's it.
55:23 Nice.
55:23 So if I want to accomplish, if I'm a package developer owner, and I want to share that package with everybody,
55:32 we've already determined we would ideally want to have a wheel, but getting that wheel is hard.
55:37 So cibuildwheel will let you integrate it, as the name indicates, into your continuous integration.
55:43 And one of those steps of CI could be build the wheel, right?
55:46 It pretty much reduces it down to that: there's a step in your CI that says, you know, run cibuildwheel.
55:54 And cibuildwheel is designed to really integrate nicely with the build matrix.
55:59 So for a fairly simple package, or for many packages, you can really just give Mac, Windows, and Linux the same job.
56:06 In GitHub Actions,
56:07 it's easy to do the same job,
56:09 and then you run cibuildwheel.
56:11 And that's about it.
56:13 It just goes through all the different versions of Python that are supported
56:16 and makes a wheel for each.
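A minimal sketch of what that kind of GitHub Actions job can look like; the action versions and the cibuildwheel pin here are illustrative, not taken from any particular project:

    jobs:
      build_wheels:
        strategy:
          matrix:
            os: [ubuntu-latest, windows-latest, macos-latest]
        runs-on: ${{ matrix.os }}
        steps:
          - uses: actions/checkout@v2
          # one step builds every supported Python wheel for this OS
          - run: pipx run cibuildwheel==2.3.1
          - uses: actions/upload-artifact@v2
            with:
              path: wheelhouse/*.whl

The same three-line build step runs on each OS in the matrix; only the runner changes.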
56:21 And in fact, it even has one feature that was really nice, that I'd always struggled with a bit: testing.
56:27 So if you give it a test command, it will even take your package and install it in a new environment,
56:32 in a different directory
56:33 that's not related to your build at all,
56:35 and make sure it works and passes whatever test you give it.
56:38 And it'll do that across the platforms?
56:40 Like a macOS test and a Windows test?
56:42 Yeah.
56:43 For each.
56:43 cibuildwheel really just sees the platform it's sitting on, because it's inside the build matrix.
56:47 And so it's run for each.
56:49 And yeah, for each one
56:52 it will run that test.
56:55 And the simplest test is just echo,
56:57 and that will just make sure it installs,
56:58 because it won't try to install your wheel unless there's something in that test command.
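For example, the test hook can live in pyproject.toml; this is just a sketch of the documented options, with a hypothetical tests directory:

    [tool.cibuildwheel]
    test-requires = "pytest"
    test-command = "pytest {project}/tests"

cibuildwheel installs the freshly built wheel plus pytest into a clean environment and runs that command for every wheel it produces.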
57:03 Even that's useful sometimes.
57:04 Even that's broken sometimes because of NumPy not supporting one of those things in that matrix.
57:09 Yeah.
57:09 It can't install the dependencies.
57:11 So that step fails or something.
57:12 So it says it currently supports GitHub Actions, Azure Pipelines, which, I don't know how long those are going to be two separate things.
57:20 Maybe they'll always be separate, but with Microsoft owning GitHub, it's like, they were saying do stuff in Azure Pipelines,
57:26 and then they're kind of moving.
57:27 Like, yeah, I think they're similar.
57:28 The runners are the same.
57:29 They actually have the same environments.
57:31 So I think they'll exist just as two different interfaces, probably.
57:35 And Azure is not so tied to GitHub, and it has more of an enterprise focus.
57:39 Yeah, for sure.
57:40 It definitely has a different focus.
57:41 GitHub Actions was just a rewrite, and a better rewrite in most cases, of it.
57:44 I got to learn.
57:45 Yeah.
57:46 I think GitHub Actions came second.
57:47 All right.
57:48 So then Travis CI, AppVeyor, CircleCI, and GitLab CI, at least all of those, right?
57:53 At least those are the ones we test on.
57:56 And then it runs locally.
57:58 There are some limitations to running it locally.
58:00 Any system that has Docker can target
58:06 Linux; you can just ask it to build Linux wheels.
58:08 You can actually run it from, like, my Mac or from Windows,
58:11 I assume from a Windows machine.
58:12 I tried Windows with Docker, and on Windows
58:15 it does install to a standard location, C:\cibuildwheel,
58:20 but other than that, it's safe to run it there.
58:22 On macOS, it will install to your macOS system.
58:25 It installs system versions of Python.
58:27 So that's something we haven't solved yet.
58:29 Might be able to someday.
58:30 So it's not a good idea unless you really are okay with installing every version of Python
58:34 that ever existed onto your system.
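A sketch of the local-run case that's considered safe, using just the documented platform flag and assuming Docker is running:

    pipx run cibuildwheel --platform linux    # builds the Linux wheels inside Docker, output lands in wheelhouse/

Running with --platform macos or --platform windows locally is what installs those extra Pythons onto your machine.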
58:37 Maybe get a VM of your Mac.
58:40 The python.org Python.
58:41 Yeah.
58:42 Yeah.
58:42 I mean, it's somewhat safe.
58:43 If you're on Windows, you could use the Windows Subsystem for Linux, WSL, as well,
58:50 in addition to Docker, I suspect.
58:51 Although I haven't tried that.
58:53 manylinux has to run in Docker, so I'm sure as long as you can launch Docker... the thing
58:57 that you have to be able to do is launch Docker, because you have to use the manylinux
59:02 Docker images, or you should use those or a derivative of them.
59:05 There are lots of rules about exactly what can be in the environment and things like that,
59:09 and the PyPA maintains that.
59:12 One thing that also helps is that the main manylinux maintainer is also a cibuildwheel
59:17 maintainer.
59:18 So that's one reason those things fit well together.
59:22 Features tend to match and come out at the same time.
59:25 Like musllinux, which is a big thing recently.
59:27 It's not actually in a released version of cibuildwheel yet.
59:30 What is musllinux?
59:31 So a normal Linux is based on glibc, and that's actually what controls it.
59:37 It's one of two things that controls manylinux:
59:39 can you download the binary wheel, or do you have to build?
59:43 If you have an old version of pip — they had to teach pip about each
59:48 version of manylinux.
59:48 That was a mess.
59:50 So they eventually switched to a standard numbering system
59:53 that is your glibc number.
59:55 And now pip
59:56 doesn't need to be taught.
59:56 The current pip will be able to install a future manylinux as long as your system's glibc is new enough.
01:00:00 But that was a big problem.
01:00:02 So pip 9 can only install manylinux1.
01:00:05 It can't install manylinux2010,
01:00:06 even if your glibc is fine for it.
01:00:08 So the other thing is the glibc version, and manylinux1 was based on
01:00:14 CentOS 5, Red Hat 5.
01:00:16 manylinux2010 was CentOS 6, manylinux2014 was CentOS 7.
01:00:22 And then they switched to Debian, because of CentOS sort of switching to the
01:00:26 Stream model.
01:00:27 So manylinux_2_24 is glibc 2.24,
01:00:32 and that's Debian 9, or something like that.
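If you want to see which of these tags your own pip will accept, newer pips can report it; the flag is marked experimental, so the output format may vary:

    python -m pip debug --verbose    # prints the compatible tags, e.g. manylinux2014_x86_64, manylinux_2_24_x86_64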
01:00:36 And so, but that's glibc based.
01:00:38 There are distributions that are not glibc based, most notably Alpine, a very widely used
01:00:43 one.
01:00:44 It's this tiny, tiny little Docker image.
01:00:47 It's a really fun distribution to use if you're on Docker; it actually sounds fun to
01:00:51 install, but I've never tried it without Docker. It's these five-megabyte Docker
01:00:56 wheels, or, Docker doesn't do wheels.
01:00:58 Docker images, Docker images.
01:01:00 Yeah.
01:01:00 But that doesn't use glibc.
01:01:02 That uses musl.
01:01:03 And so musllinux will run on Alpine.
01:01:07 Okay.
01:01:07 Got it.
01:01:08 So if you're building for the platform Alpine and similar ones, right.
01:01:12 So anything.
01:01:14 Yeah.
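As a hedged illustration of how that shows up in practice (the project name and version here are made up), it's the platform tag in the wheel file name that distinguishes the two families:

    example_pkg-1.0-cp310-cp310-manylinux2014_x86_64.whl   # glibc-based distros: Debian, Ubuntu, CentOS, ...
    example_pkg-1.0-cp310-cp310-musllinux_1_1_x86_64.whl   # musl-based distros like Alpine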
01:01:15 And you said I can run this locally as well.
01:01:17 I know I would use it in CI, because I've got that matrix of all the versions
01:01:22 of CPython and PyPy, and then all the platforms, and I want to check as many
01:01:28 of those boxes as possible to put wheels in.
01:01:30 Right.
01:01:31 yeah.
01:01:31 Suppose I'm on my Mac and I want to make use of this to fill in, maybe do some testing,
01:01:37 at least on some of these columns.
01:01:38 Like, how do I do that?
01:01:39 What's the benefit there?
01:01:41 Well, I can tell you a case where it happened.
01:01:42 So we were shipping CMake, and the scikit-build organization ran out of Travis credits
01:01:50 while they were being built.
01:01:52 We hadn't switched them over to being emulated builds on GitHub Actions yet.
01:01:57 And it just ran out.
01:01:58 We couldn't build them.
01:01:59 And one of them had been missed, and we also weren't waiting to upload.
01:02:02 So we had uploaded everything, but we had one set, or maybe it was all of the emulated
01:02:07 builds.
01:02:07 I think it was one set that didn't work.
01:02:09 And so we wanted to go ahead and upload those missing wheels.
01:02:13 And I tried, but I couldn't actually get emulation,
01:02:16 Docker QEMU emulation.
01:02:20 What?
01:02:20 I couldn't get that working on my Mac.
01:02:22 So the manylinux maintainer used his Linux machine, and he had QEMU emulation
01:02:29 on it, and he built the emulated images, a few hours, but he just built them locally
01:02:34 and then uploaded, filled in the missing wheels.
01:02:36 So if I'm maintaining a package, I've got some package I'm putting on PyPI, and I want
01:02:43 to test it.
01:02:44 Does it make sense to do it locally, or does it just make sense to put it on
01:02:47 some CI system?
01:02:49 For cibuildwheel, usually I do some local testing, but I'm also developing cibuildwheel.
01:02:54 But, you know, usually it's probably fine to do this just in your CI, and
01:02:59 usually you don't want to run the full thing
01:03:00 every time. Usually you have your regular unit tests, but cibuildwheel is going to be
01:03:04 a lot slower, because it's going through and it's making each set of wheels and launching
01:03:08 Docker images and things like that,
01:03:09 and it's installing Python each time for macOS and Windows.
01:03:13 So usually, unless you have a fairly quick build — I've seen some
01:03:17 people just run cibuildwheel as part of their test suite —
01:03:19 but usually you just run it, say, right before release.
01:03:22 I usually do it once before the release and then on the release.
01:03:25 Right.
01:03:26 Exactly.
01:03:26 Okay.
01:03:26 That makes sense.
01:03:27 Cause it's a pretty heavyweight type of operation.
01:03:30 So when I look at all these different platforms, I see macOS Intel, macOS Apple Silicon, different
01:03:35 versions of Windows.
01:03:37 And then I think about CI systems, you know, what CI systems can I use that support all these
01:03:43 things?
01:03:43 Like, does GitHub Actions support both versions of macOS, for example, plus Windows?
01:03:48 GitHub Actions is by far our most popular platform.
01:03:52 It switched very quickly.
01:03:53 It used to be Travis.
01:03:54 Travis was a challenge because they didn't do Windows very well.
01:03:56 They still don't do Windows very well.
01:03:57 And it's a challenge for us, because we actually can't run our macOS tests on them anymore,
01:04:02 because once we joined the PyPA, the billing became an issue and we basically just
01:04:07 lost macOS runs for it.
01:04:09 But Circle, I think, Azure, and GitHub Actions, I think they do all three.
01:04:16 And you can always split things up.
01:04:18 You could do Travis for the Linux and then AppVeyor for Windows.
01:04:22 You can do it that way.
01:04:24 One of the big things that I had developed for cibuildwheel was the pyproject.toml,
01:04:29 or any TOML, configuration for cibuildwheel.
01:04:34 That way you can get your cibuildwheel configuration out of your YAML files.
01:04:40 That way it works locally,
01:04:41 which is one of the things I was after, but also you can just write it once and
01:04:45 then run on several different systems.
01:04:47 Like, you might like the fact that Travis is, I think, the only one that does the
01:04:51 native strange architectures.
01:04:53 You have to emulate them other places, which is a lot slower, five times slower or something.
01:04:58 Yeah.
01:04:58 So kind of split that up: keep the definition in one place and then create maybe multiple
01:05:02 CI jobs.
01:05:04 Your CI scripts end up really simple.
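A sketch of that kind of configuration, using a few of cibuildwheel's documented options; the selectors here are made up for illustration:

    [tool.cibuildwheel]
    build = "cp37-* cp38-* cp39-* cp310-*"
    skip = "*-win32"
    manylinux-x86_64-image = "manylinux2014"

Because it lives in pyproject.toml rather than in a CI YAML file, the same settings apply whether the build runs locally, on GitHub Actions, on Travis, or anywhere else.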
01:05:05 Yeah.
01:05:05 Yeah.
01:05:06 Yeah.
01:05:07 Very cool.
01:05:07 The example script is just a few lines.
01:05:09 It does not take much to do this, compared
01:05:12 to, oh yeah,
01:05:12 the hundreds of lines it used to take.
01:05:14 Yeah, sure.
01:05:15 And I didn't even scroll down here.
01:05:16 You've got a nice grid on github.com/pypa/cibuildwheel that shows what's supported on GitHub Actions,
01:05:22 what's supported on Azure Pipelines,
01:05:24 and so on.
01:05:25 It's not everything, right?
01:05:26 CircleCI doesn't do this one.
01:05:27 No, but yeah,
01:05:29 AppVeyor, Travis, Azure, and GitHub do.
01:05:34 It does do macOS, but we can't test it.
01:05:36 Theoretically, it does it.
01:05:38 Gotcha.
01:05:39 And then, yeah, I wonder about the M1, the Apple Silicon ARM versions, versus the Intel versions.
01:05:46 I don't know how well that's permeated into the world yet,
01:05:49 but the fact that they have Mac at all is kind of impressive.
01:05:52 Nobody has an M1 runner yet.
01:05:53 There are a few places now, I think, where you can purchase time on one, but no hosted runners.
01:05:59 I mean, last I checked, on GitHub Actions you couldn't even run one yourself on an M1.
01:06:03 That may have changed.
01:06:05 I don't know.
01:06:06 That was a while back.
01:06:07 Yeah.
01:06:07 I mean, there are some crazy, places out there.
01:06:10 I think there's one called Mac mini Colo.
01:06:13 I think that's what it's called.
01:06:14 Let me see if that's, yeah, I think that's it.
01:06:17 Yeah.
01:06:17 So you can go to these places like Mac mini Colo and get a whole bunch
01:06:24 of Mac minis and put them into this crazy data center.
01:06:28 But you know, that's not the same as, I upload a text file into GitHub that says run on Azure
01:06:35 or on GitHub Actions,
01:06:36 and then that's the end of it.
01:06:37 Right.
01:06:37 You probably have to set up your whole build system onto a set of minis.
01:06:42 And that doesn't sound very practical for most people.
01:06:45 Ideally, what you could do is, I mean, you just need one mini and then you set up
01:06:49 a GitHub Actions self-hosted runner, a locally hosted runner.
01:06:53 And other systems do that too; GitLab CI was big on that.
01:06:57 You can do anything on GitLab CI.
01:06:59 We just haven't tested that, because they don't have those publicly.
01:07:01 But if you have your own, you can do that.
01:07:04 I know somebody who does this, basically with ROOT: has a Mac mini
01:07:09 and runs the M1 builds on that.
01:07:11 So you could do that.
01:07:13 And I have a Mac mini, and the lead developer of cibuildwheel
01:07:15 also has a Mac mini or the M1.
01:07:18 He has an M1 something.
01:07:20 I don't know.
01:07:20 I have a Mac mini.
01:07:21 Mine is an M1.
01:07:22 That's what I'm talking to you right now on.
01:07:24 It's a fantastic little machine.
01:07:25 Yeah.
01:07:25 It's very impressive.
01:07:27 I love the way boost-histogram ran on it.
01:07:28 It was fast.
01:07:29 I have a 16-inch, almost maxed-out MacBook and the Mac mini M1,
01:07:34 and it was faster on boost-histogram than this thing.
01:07:36 Wow.
01:07:36 Yeah.
01:07:37 I have a maxed-out 15-inch, a little bit older, a couple of years, but I just don't
01:07:42 touch that thing unless I literally need it as a laptop because I want to be somewhere else.
01:07:45 But yeah, I'm definitely not drawn to it.
01:07:47 So you could probably set up one of these minis for 700 bucks and then tie it in.
01:07:52 But that's, again, not as easy as, you know, just clicking the public free option that works,
01:07:56 but still, it's within the realm of possibility.
01:07:59 Yeah.
01:08:00 And Apple has actually helped out several, like, I know Homebrew and a few others they've
01:08:04 helped out, by giving them either Mac minis or something that they
01:08:10 could build with.
01:08:10 So I believe Homebrew actually builds on real
01:08:17 M1s.
01:08:17 I know it does, because the builds are super fast.
01:08:20 I remember that.
01:08:20 Like, it builds ROOT in like 20 minutes, the ROOT recipe, because I maintain that,
01:08:24 and the normal one takes about an hour, even running on multiple cores. It's like
01:08:29 three times faster.
01:08:30 It's done in 20 minutes.
01:08:31 I just thought something was wrong when I first saw that.
01:08:33 That's it?
01:08:34 How could it be done?
01:08:35 Something broke.
01:08:36 What broke?
01:08:36 Interesting.
01:08:37 All right, Henry, we're getting really short on time, a little bit over, but it's been a
01:08:40 fun conversation.
01:08:41 How about you give us a look at the future?
01:08:42 Where are things going
01:08:43 with all this stuff?
01:08:45 The next thing I'm interested in being involved with is scikit-build, which
01:08:50 is a package that currently sort of augments setuptools, but hopefully will eventually
01:08:57 sort of replace setuptools as the thing that you build with.
01:09:01 And it will call out to CMake.
01:09:02 So you basically just write a CMake file.
01:09:06 And this could wrap an existing package, or maybe you need some of the other things that
01:09:11 CMake has.
01:09:11 And this will then let you build that as a regular Python package.
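A rough sketch of what that looks like with the current scikit-build; the package name is hypothetical, and a CMakeLists.txt describing the compiled pieces is assumed to sit next to this file:

    # setup.py -- scikit-build's setup() wraps setuptools and drives CMake for you
    from skbuild import setup

    setup(
        name="example_pkg",
        version="0.1.0",
        packages=["example_pkg"],
    )

With scikit-build (plus cmake and ninja) listed in the pyproject.toml build requirements, python -m build then works as usual, with CMake doing the actual compilation.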
01:09:16 In fact, recently somebody sort of put it together with cibuildwheel.
01:09:19 Yeah.
01:09:20 It's like the built-in CMake example, and they built LLVM and pulled out just the
01:09:25 clang-format tool and made wheels out of that.
01:09:28 And now you can just do pip install clang-format.
01:09:30 It's one to two megabytes.
01:09:32 It works on all systems, including Apple Silicon and things.
01:09:34 I just tried it on Apple Silicon yesterday, and it's a pip install.
01:09:37 Now you can clang-format C++ code.
01:09:39 And that's just, you know, mind-blowing.
01:09:41 You can add it to pre-commit.
01:09:42 Pre-commit.ci, it runs in it too.
01:09:44 I mean, I'd been fighting for about a week to reduce the size of the clang-format
01:09:48 recipe from 600 megabytes to just under 250.
01:09:52 That was the maximum for pre-commit.ci.
01:09:54 And now you can pip install it, about a megabyte for Linux, that sort of thing.
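For instance, hooking those wheels into pre-commit looks roughly like this; the rev shown is a guess and should be pinned to whatever release you actually want:

    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/pre-commit/mirrors-clang-format
        rev: v13.0.0
        hooks:
          - id: clang-format

Because the hook resolves to a small PyPI wheel rather than a full LLVM build, pre-commit.ci can run it within its size limits.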
01:09:59 And I think that would be a really great thing to work on.
01:10:04 It's been around since 2014, but it needs some serious work.
01:10:08 And so I'm currently actually working on writing a grant to try to get funded to just work
01:10:13 on basically the scikit-build system, and looking for interesting science
01:10:17 use cases that would be interested in adapting it, or switching an existing build
01:10:23 system over to it,
01:10:24 or taking something that has never been available from Python and making it available.
01:10:30 And yes, ROOT might be one
01:10:32 scikit-build package.
01:10:33 I'm looking for a wide variety.
01:10:35 Yeah.
01:10:35 How neat.
01:10:36 The scikit-build package is fundamentally just the glue between setuptools, the Python module, and
01:10:40 CMake.
01:10:40 Yeah.
01:10:41 So it's a real way to take some of these things that were based on CMake and sort of expose
01:10:45 them to Python.
01:10:46 Yeah.
01:10:46 So you can just have a CMake package that does all the CMake things well, you know,
01:10:51 like finding different libraries, and I'm a big CMake person.
01:10:55 High-energy physics uses it very heavily.
01:10:57 Most C++ does.
01:10:58 It's about 60%,
01:10:59 I think, of all build systems that are CMake based now,
01:11:03 going from Kitware's numbers, but they make CMake.
01:11:06 But I think it's very powerful.
01:11:10 It can be used for things like that,
01:11:11 and it will really open up a much easier, more natural way to build C++ and C and
01:11:18 Fortran and things like that,
01:11:19 and CUDA, than what is currently available.
01:11:20 distutils is going away in Python 3.12.
01:11:23 Setuptools is not really designed to build C++ packages;
01:11:29 it was really just a hack on top of distutils, which was built to build just Python itself.
01:11:34 So, well, scikit-build sounds like the perfect tool to apply to the science space, because
01:11:40 there's so many of these weird compiled things that are challenging to, you know, install and
01:11:45 deploy and share and so on.
01:11:47 So making that easier.
01:11:48 Sounds good.
01:11:48 All right.
01:11:49 Well, I think we're probably going to need to leave it there just for the sake of time,
01:11:53 but it's been awesome to talk about all the internals of supporting scikit-HEP,
01:11:59 and people should check out cibuildwheel.
01:12:02 You know, if you're maintaining a package, either publicly or just internally
01:12:06 for your organization, it looks like it'd be a big help.
01:12:08 Yeah.
01:12:09 If it's got any sort of binary build in it.
01:12:11 Yes.
01:12:11 Yeah, absolutely.
01:12:12 If not, build is fine.
01:12:13 Yeah.
01:12:14 Right.
01:12:14 And I learned about build, which is good to know.
01:12:16 All right.
01:12:17 So before you get out of here, Henry, let me ask you the two final questions.
01:12:20 If you're going to write some code, I mean, like Python code, what editor would you use?
01:12:25 Depends on how much. It'll either be vi
01:12:27 if it's a very small amount;
01:12:29 if it's a really large project, let's say it takes several days, then I'll use
01:12:34 PyCharm.
01:12:35 And then I've really started using VS Code quite a bit.
01:12:38 And that's sort of expanding to fill in all the middle ground and kind of eating in on both
01:12:42 of the edges.
01:12:43 Yeah.
01:12:44 Yeah.
01:12:45 There's some interesting stuff going on there.
01:12:46 Good choice.
01:12:46 But all with the vi mode or plugins added, of course.
01:12:51 And then, notable PyPI package.
01:12:53 I mean, we probably talked about 20 already.
01:12:55 If you want to just give a shout out to one of those, that's fine.
01:12:57 Or if you got a new idea.
01:12:58 I'm going to go with one that's unlikely to get mentioned otherwise, but I'm really
01:13:03 excited by it.
01:13:04 I think the developer is quite new, but what he's actually
01:13:09 done as far as the actual package has been nice.
01:13:12 It needs some nice touches.
01:13:14 And that is plotext, P-L-O-T-T-E-X-T.
01:13:19 And I'm really excited about that because the actual plots it makes are
01:13:23 really, really nice.
01:13:24 And they're plotted to the terminal, and it can integrate with Rich.
01:13:28 And of course, I'm interested in it because I want to integrate it with Textual.
01:13:32 I want to see a Textual app that combines this with
01:13:38 file browsers and things like that.
01:13:41 It'd be incredible.
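A tiny sketch of the kind of terminal plot being described, using plotext's basic API; the data here is made up:

    import plotext as plt

    # draw a simple line chart directly in the terminal
    plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
    plt.title("quick terminal plot")
    plt.show()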
01:13:42 Yeah.
01:13:42 So you can do things like the terminal, for example.
01:13:45 Yeah.
01:13:45 So you could, like, cruise around your files, use your ROOT I/O integration, pull
01:13:51 these things up here and, you know, put the plot right on the screen.
01:13:53 Right.
01:13:53 But in the terminal.
01:13:54 Okay.
01:13:55 Yeah.
01:13:55 This is really cool.
01:13:56 I had no idea.
01:13:56 And this is based on Rich,
01:13:57 you say?
01:13:58 It can integrate with Rich.
01:13:59 It integrates with Rich.
01:14:00 Okay.
01:14:00 Got it.
01:14:01 Yeah.
01:14:01 So as soon as I saw it, I started trying to make sure the two people were talking to each
01:14:04 other.
01:14:05 Will and the person who is developing this.
01:14:08 Yeah, exactly.
01:14:08 All right.
01:14:09 These things work together.
01:14:10 That's very cool.
01:14:11 They seem like they should, right?
01:14:12 They're in the same general zone.
01:14:14 Yeah.
01:14:14 And they do now.
01:14:15 The, you had, there had to be some communication back and forth as far as what size the plots
01:14:19 were.
01:14:19 Right.
01:14:20 This should, this should work in it.
01:14:21 A good recommendation.
01:14:22 definitely one I had not learned about.
01:14:24 So I'm sure people will enjoy that.
01:14:25 All right, Henry, final call to action.
01:14:27 People want to do more with wheels, CI build wheel, or maybe some of the other stuff we talked
01:14:31 about.
01:14:31 What do you tell them?
01:14:32 Look through — I think one of the best places to go is the Scikit-HEP developer pages,
01:14:36 even if you have no interest in Scikit-HEP tools or HEP at all.
01:14:40 That sort of shows you how all these things integrate together really well,
01:14:43 and it has nice documentation.
01:14:46 Of course, cibuildwheel's own documentation is nice.
01:14:48 And the PyPA — a lot of the PyPA projects have gotten good documentation, as well as packaging.python.org.
01:14:54 We've updated that quite a bit
01:14:55 to reflect some of these things,
01:14:56 but I really like the Scikit-HEP
01:15:01 developer pages.
01:15:03 I mean, I'm biased because I wrote most of them.
01:15:04 Nice.
01:15:06 Yeah.
01:15:06 I'll link to those.
01:15:07 And I'll, I'll try to link to pretty much everything else that we spoke to as well.
01:15:11 So people can check out the podcast player show notes to find all that stuff.
01:15:14 I guess one final thing that we didn't call out that I think is worth pointing out is that cibuildwheel
01:15:18 is under the PyPA, the Python Packaging Authority.
01:15:21 So it gives it some officialness, I guess you could say.
01:15:24 Yes.
01:15:24 That happened after I joined. One of the first things I wanted to do was, I thought this
01:15:28 should really be in the PyPA.
01:15:30 And I was sort of pushing for that, and the other developers were fine with that.
01:15:34 And so we brought it up, and I actually joined the PyPA just before that by becoming
01:15:39 a member of build.
01:15:40 So I got to vote on cibuildwheel coming in, but it was a very enthusiastic vote, even
01:15:44 without my vote.
01:15:45 And pipx joined right at the same time too.
01:15:48 So it was an exciting time.
01:15:50 pipx is a great library.
01:15:51 I really like the way pipx works.
01:15:53 It's a great tool.
01:15:54 All right, Henry, thank you for being here.
01:15:56 It's been great.
01:15:57 Thanks for all the insight on all these internals around building and installing Python packages.
01:16:01 There's also a lot more on my blog,
01:16:03 iscinumpy.gitlab.io,
01:16:05 so that's also a place to look; it links to all those other things.
01:16:09 Thanks again for being here.
01:16:10 Yeah.
01:16:11 See ya.
01:16:11 Thanks for having me.
01:16:11 Yeah.
01:16:12 You bet.
01:16:12 This has been another episode of Talk Python to Me.
01:16:16 Our guest on this episode was Henry Schreiner, and it's brought to you by us over at Talk
01:16:20 Python Training, and the transcripts were brought to you by AssemblyAI.
01:16:24 Do you need a great automatic speech-to-text API?
01:16:27 Get human-level accuracy in just a few lines of code.
01:16:30 Visit talkpython.fm/assemblyai.
01:16:32 Want to level up your Python?
01:16:34 We have one of the largest catalogs of Python video courses over at talk Python.
01:16:38 Our content ranges from true beginners to deeply advanced topics like memory and async.
01:16:43 And best of all, there's not a subscription in sight.
01:16:46 Check it out for yourself at training.talkpython.fm.
01:16:49 Be sure to subscribe to the show.
01:16:51 Open your favorite podcast app and search for Python.
01:16:54 We should be right at the top.
01:16:55 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the
01:17:01 direct RSS feed at /rss on talkpython.fm.
01:17:05 We're live streaming most of our recordings these days.
01:17:08 If you want to be part of the show and have your comments featured on the air, be sure to
01:17:12 subscribe to our YouTube channel at talkpython.fm/youtube.
01:17:16 This is your host, Michael Kennedy.
01:17:17 Thanks so much for listening.
01:17:19 I really appreciate it.
01:17:20 Now get out there and write some Python code.
01:17:22 I'll see you next time.