
Using cibuildwheel to manage the scikit-HEP packages

Episode #338, published Sun, Oct 17, 2021, recorded Thu, Oct 14, 2021

How do you build and maintain a complex suite of Python packages? Of course, you want to put them on PyPI. The best format there is a wheel. This means that when developers use your code, it comes straight down and requires no local tooling to install and use.

But if you have compiled components, such as C or Fortran code, then you have a big challenge. How do you automatically compile and test against Linux, macOS (Intel and Apple Silicon), Windows, and so on? That's the problem cibuildwheel is solving.

On this episode, you'll meet Henry Schreiner. He is developing tools for the next era of the Large Hadron Collider (LHC) and is an admin of Scikit-HEP. Of course, cibuildwheel is central to this process.

Watch this episode on YouTube
Watch the live stream version

Episode Deep Dive

Guest Introduction and Background

Henry Schreiner is a seasoned Python and C++ developer working at Princeton University, focusing on High-Energy Physics (HEP) and research software engineering. He is a core maintainer of the Scikit-HEP library suite, contributes to various open-source projects including PyBind11, and works on next-generation tools for the Large Hadron Collider data analysis stack. Henry’s work bridges C++ and Python, emphasizing tools that make scientific computing more accessible and efficient. He is also a maintainer of cibuildwheel, a project that simplifies building and distributing Python wheels across platforms and Python versions.

What to Know If You're New to Python

Here are a few essentials to help you navigate this episode more effectively:

  • Python Installation: Make sure you have a standard CPython installation, ideally version 3.7+.
  • Package Management: Familiarize yourself with pip install and virtual environments so you can isolate your projects.
  • Understanding Binary Extensions: Many scientific or performance-focused Python packages have C or C++ components. Knowing that they need specialized building steps (compilers, etc.) will help clarify references in this episode.
  • Documentation: Check out the official Python Docs for an in-depth guide on packaging and distribution basics.

Key Points and Takeaways

  1. cibuildwheel: A Solution for Cross-Platform Wheels. cibuildwheel greatly simplifies the process of building, testing, and distributing Python wheels across multiple operating systems and Python versions. It manages complex tasks like setting up specific Python environments and ensuring a consistent, fully tested wheel for Linux, macOS (Intel and Apple Silicon), and Windows (see the short tag-listing sketch after this list).
  2. Scikit-HEP: A Suite of Particle Physics Tools in Python. Scikit-HEP is a collection of libraries designed to support high-energy physics workflows with Python. It includes specialized utilities for histogramming, file reading, vector manipulations, and more, bringing modern Python practices to the HEP community.
  3. Awkward Array for Irregular Data Structures. Awkward Array provides a NumPy-like interface for handling nested or variable-length data, such as lists of lists with differing lengths. It integrates with Numba for just-in-time compilation and supports large-scale data common in HEP and beyond (e.g., genomics).
  4. Boost Histogram and Hist: Advanced Histogramming in Python. Boost Histogram (a C++ library with Python bindings) and Hist (a more Pythonic API on top of it) enable efficient histogram creation, manipulation, and rebinning. They bring object-oriented capabilities to histogram data, making it easier to store, label, and operate on multi-dimensional bins.
  5. PyBind11: Seamless C++ and Python Interoperability. PyBind11 is a powerful header-only library for exposing C++ code to Python, making it straightforward to build extension modules without learning a new “binding language.” This approach simplifies bridging performance-intensive routines with Python’s high-level features.
  6. cibuildwheel Workflow and Testing. cibuildwheel not only compiles wheels but can also test them in isolated environments. This ensures a clean separation between build and test, preventing hidden dependencies from leaking into your final distributions.
  7. Minimizing Complexity with Docker on Linux Builds. Linux wheels are typically built inside Docker images that follow the manylinux policy. By standardizing the build environment, developers ensure compatibility across various Linux distributions while avoiding system-specific configuration issues.
  8. Apple Silicon (M1) and Cross-Architecture Builds. With Apple Silicon, teams face an additional dimension of complexity. Tools like cibuildwheel aim to unify building Intel-based macOS wheels and arm64-based Apple Silicon wheels, though native CI runners for M1 remain limited as of this conversation.
  9. Scikit-Build: A Next Step for C++/Python Projects. Scikit-Build leverages CMake to build complex, multi-language Python packages. Henry mentioned how modernizing scikit-build could reduce friction when combining C++ libraries, CUDA code, or advanced build flows within Python packages.
  10. The Power of Wheel Distribution for Scientific Computing. Throughout the conversation, Henry underscored the efficiency gained by distributing wheels for scientific tools. Whether for HPC, data analysis, or advanced computing, wheels ensure that installation is streamlined, reproducible, and free of compiler hassles.
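To make the cross-platform problem in points 1, 7, and 8 concrete, here is a small sketch (using the third-party packaging library, not cibuildwheel itself) that lists the compatibility tags the running interpreter accepts. A compiled project needs a wheel matching one of these tags for every OS, architecture, and Python version it wants to support, which is the matrix cibuildwheel automates.

    import itertools

    from packaging.tags import sys_tags

    # The most specific tags this interpreter will accept, for example
    # cp311-cp311-manylinux_2_17_x86_64 on Linux, or cp311-cp311-macosx_11_0_arm64
    # on an Apple Silicon Mac. A wheel must match one of these tags to be installable.
    for tag in itertools.islice(sys_tags(), 5):
        print(tag)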

Interesting Quotes and Stories

  • On distributing HPC packages: Henry highlighted how previously, “you’d have a special Python install that took hours to set up,” but now with wheels and tools like cibuildwheel, “you can just pip install and everything works.”
  • On bridging Python and C++: “I like to be in that space between the two,” Henry said, emphasizing how modern Pythonic approaches can make C++ functionality more accessible to data scientists.

Key Definitions and Terms

  • Wheel: A built, ready-to-install distribution format for Python packages (.whl files).
  • HEP: High-Energy Physics, focusing on subatomic particles and large-scale experiments like the Large Hadron Collider.
  • Numba: A JIT compiler that optimizes numerical Python code for faster execution.
  • manylinux: A packaging policy, with accompanying Docker build images, that lets compiled wheels work across many different Linux distributions.
  • Awkward Array: A library that supports nested, variable-length arrays in a NumPy-like interface.

Learning Resources

Below are a few resources to explore or refine your Python packaging and scientific computing expertise.

Overall Takeaway

The conversation underscores the power of modern Python packaging and the scientific stack. Tools like cibuildwheel, PyBind11, and Scikit-HEP empower developers and researchers to create cross-platform, high-performance solutions without sacrificing the convenience of Python’s ecosystem. By embracing these frameworks and best practices, teams can streamline data analysis, deliver consistent user experiences, and accelerate innovation in both academic and enterprise settings.

Henry on Twitter: @HenrySchreiner3
Henry's website: iscinumpy.gitlab.io

Large Hadron Collider (LHC): home.cern
cibuildwheel: github.com
plumbum package: plumbum.readthedocs.io
boost-histogram: github.com
vector: github.com
hepunits: github.com
awkward arrays: github.com
Numba: numba.pydata.org
uproot4: github.com
scikit-hep developer: scikit-hep.org
pypa: pypa.io
CLI11: github.com
pybind11: github.com
cling: root.cern
Pint: pint.readthedocs.io
Python Wheels site: pythonwheels.com
Build package: pypa-build.readthedocs.io
Mac Mini Colo: macminicolo.net
scikit-build: github.com
plotext: pypi.org
Code Combat: codecombat.com
clang format wheel: github.com
cibuildwheel examples: cibuildwheel.readthedocs.io
Cling in LLVM: root.cern

New htmx course: talkpython.fm/htmx
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript


00:00 How do you build and maintain a complex suite of Python packages?

00:03 Of course, you want to put them on PyPI.

00:06 The best format there is as a wheel.

00:08 This means that when developers use your code, it comes straight down and requires no local tooling to install and use.

00:14 But if you have complex dependencies, such as C or Fortran, then you have a big challenge.

00:20 How do you automatically compile and test against Linux, macOS, that's Intel and Apple Silicon, Windows, 32 and 64-bit,

00:30 and so on?

00:30 That's the problem solved by cibuildwheel.

00:33 On this episode, you'll meet Henry Schreiner.

00:36 He's developing tools for the next era of the Large Hadron Collider and is an admin of Scikit-HEP.

00:42 Of course, cibuildwheel is central to that process.

00:46 This is Talk Python to Me, episode 338, recorded October 14th, 2021.

00:52 Welcome to Talk Python to Me, a weekly podcast on Python.

01:08 This is your host, Michael Kennedy.

01:09 Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm.

01:15 And follow the show on Twitter via at Talk Python.

01:19 We've started streaming most of our episodes live on YouTube.

01:22 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:30 Hey there, I have some exciting news to share before we jump into the interview.

01:34 We have a new course over at Talk Python.

01:36 HTMX plus Flask, modern Python web apps, hold the JavaScript.

01:40 HTMX is one of the hottest properties in web development today, and for good reason.

01:45 You might even remember all the stuff we talked about with Carson Gross back on episode 321.

01:50 HTMX, along with the libraries and techniques we introduced in our new course,

01:55 will have you writing the best Python web apps you've ever written.

01:58 Clean, fast, and interactive, all without that front-end overhead.

02:01 If you're a Python web developer that has wanted to build more dynamic, interactive apps,

02:06 but don't want to or can't write a significant portion of your app in rich front-end JavaScript

02:11 frameworks, you'll absolutely love HTMX.

02:14 Check it out over at talkpython.fm/htmx, or just click the link in your podcast player's show notes.

02:20 Now let's get on to that interview.

02:22 Henry, welcome to Talk Python to Me.

02:25 Thank you.

02:26 Yeah, it's great to have you here.

02:27 I'm always fascinated with cutting-edge physics with maybe both ends of physics, right?

02:34 I'm really fascinated with astrophysics and the super large, and then also the very, very small.

02:39 And we're going to probably tend a little bit towards the smaller, high-energy things this time around,

02:45 but so much fun to talk about this stuff and how it intersects Python.

02:48 Some of the smallest things you can measure and some of the largest amounts of data you can get out.

02:52 Yeah, the data story is actually really, really crazy, and we're going to talk a bit about that.

02:58 So neat, so much stuff.

03:00 We used to think that atoms were as small as things could get, right?

03:04 I remember learning that in elementary school.

03:05 There are these things called atoms.

03:07 They combine to form compounds and stuff, and that's as small as it gets.

03:11 And yeah, not so much, right?

03:13 Yeah, that was sort of what atom was supposed to mean.

03:15 Exactly, the smallest bit, but nope.

03:19 But that name got used up, so there we are.

03:21 All right, well, before we get into all that stuff, though, let's start with your story.

03:25 How did you get into programming in Python?

03:26 Well, I started with a little bit of programming that my dad taught me.

03:31 He was a physicist.

03:33 And I remember it was C++ and sort of taught the way you would teach Java,

03:37 you know, all objects and classes.

03:39 Yeah.

03:39 Just a little bit.

03:41 And then when I started at college, then I wanted to take classes, and I took a couple classes again in C++.

03:48 But I just really loved objects and classes.

03:51 Unfortunately, the courses didn't actually cover that much, but the book did.

03:55 So I really got into that.

03:57 And then for Python, actually, right when I started college, I started using this program called Blender.

04:02 Oh, yeah.

04:03 Blender.

04:03 I've heard of Blender.

04:04 It's like a 3D animation tool, like Maya or something like that, right?

04:08 And it's very Python-friendly, right?

04:10 Yes.

04:11 It has a built-in Python interpreter.

04:13 So I knew it had this built-in language called Python, so that made me really want to learn Python.

04:16 And then when I went to an REU, a research experience for undergraduates

04:22 at Northwestern University in Chicago.

04:25 And when I was there, we had this cluster that we were working on.

04:29 This was in solid-state physics, material physics.

04:32 And we would launch these simulations on the cluster.

04:36 And so I started using Python, and I was able to write a program that would go out,

04:42 and it would create a bunch of threads, and it would watch all of the cluster,

04:46 all the nodes in the cluster.

04:48 And as soon as one became available, it would take it.

04:49 So I could just, my simulation would just take the entire cluster.

04:52 After a few hours, I would have everything.

04:54 So at the end of that, everybody hated me, and everybody wanted my scripts.

04:58 Exactly.

04:59 They're like, this is horrible.

05:01 I can't believe you did that to me, but I'll completely forgive you if you just give it to me and only to me,

05:06 because I need that power.

05:07 Yeah, that's fantastic.

05:09 How neat.

05:10 So I think that is one of the cool things about Python, right, is that it has this quick prototyping approachability.

05:18 They're like, I'm just going to take over a huge hardware, right?

05:22 Like a huge cluster of servers, but it itself doesn't have to be like intense programming.

05:26 It can be like this elegant little bit of code, right?

05:28 You can sort of do things that, normally I think the programming gets in the way more,

05:32 but Python tends to stay out.

05:34 It looks more like pseudocode.

05:35 So you can do more and learn more, and eventually you can go do it in C++ or something.

05:40 Yeah.

05:41 Yeah, absolutely.

05:42 Great way to start.

05:43 Or maybe not.

05:44 Sometimes you do need to go do it in some other language, and sometimes you don't.

05:48 I think the stuff at CERN and LHC has an interesting exchange between C++

05:54 and maybe some more Python and whatnot, so that'll be fun to talk about.

05:59 Yeah.

05:59 We've been C++ originally, but Python is really showing up in a lot more places,

06:04 and there's been a lot of movement in that direction.

06:07 And there have been some really interesting things that have come out.

06:09 A lot of interesting things have come out of the LHC, computing-wise as well as physics.

06:13 Awesome.

06:14 Yeah.

06:15 As a computing bit of infrastructure, there's a ton going on there.

06:18 And as physics, it's kind of the center of the particle physics world, right?

06:22 So it's got those two parallel things generating, all sorts of cool stuff.

06:27 I want to go back to just really quickly to, you know, you talked about your dad teaching a little programming.

06:31 If people are out there and they're the dad, they want to teach their kids a little bit of programming,

06:36 I want to give a shout out to CodeCombat.com.

06:38 Such a cool place.

06:39 My daughter just yesterday was like, hey, dad, I want to do a little Python.

06:42 Remember that game that taught me programming?

06:45 Like, yeah, yeah, sure.

06:46 So she's like, she logged in and started playing and basically solve a dungeon interactively by writing Python.

06:52 And it's such an approachable way, but it's not the like draggy, dropy, fake stuff.

06:55 You write real Python, which I think is cool to introduce kids that way.

06:59 So anyway, shout out to them.

07:01 I had them on the podcast before, but it's cool to see kids like taking to it in that way, right?

07:05 Whereas you say it like, you could write a terminal app.

07:07 They're like, I don't want to do that.

07:08 But solve a dungeon.

07:10 Yeah, they could do that.

07:11 Yeah.

07:11 I've actually played with a couple of those.

07:12 They're actually really fun just to play.

07:13 Yeah, they are.

07:14 Exactly.

07:15 I did like 40 dungeons along with my daughter.

07:17 It was very cool.

07:18 How about now?

07:19 What do you do now?

07:19 So I work in a lot of different areas and I jump around a lot.

07:24 So I do a mix of coding.

07:26 I do some work on websites because they just needed maintenance and somehow I got volunteered.

07:33 And some writing.

07:35 So less coding than I would like, but I definitely do get to do it, which is fun.

07:40 Yeah.

07:40 And this is at CERN or at your university or where is this?

07:44 So now I'm at Princeton University and I'm part of a local group of RSEs, research software engineers.

07:52 And I'm also part of IRIS-HEP, which we'll talk about a little bit.

07:57 But that's sort of a very spread out group.

08:00 Some of us are at CERN, a few are in some other places, a few at Fermilab.

08:06 And energy physicists are just used to working remote.

08:09 The pandemic wasn't that big of a change for us.

08:11 We were already doing all our meetings remote.

08:12 We just eventually changed from Vidyo to Zoom.

08:15 But other than that, it was pretty much the same.

08:17 Exactly.

08:17 It was real similar for me as well.

08:19 That's interesting.

08:20 Fermilab, that's in Chicago, outside Chicago, right?

08:23 Yes.

08:23 Is that still going?

08:24 I got the sense that that was shutting down.

08:26 They're big in neutrino physics.

08:27 So they do a lot of neutrino things there.

08:30 And then they're also very active just in the particle physics space.

08:34 So you may be at Fermilab, but working on CERN data.

08:37 I see.

08:37 Okay.

08:38 Interesting.

08:38 Yeah.

08:39 I got to tour that place a little bit and it's a really neat place.

08:42 It is.

08:43 CERN's a neat place too.

08:44 I would love to tour CERN, but it wasn't 20 minutes down the street from where I happened to be.

08:50 So I didn't make it there.

08:51 Sadly, I hope to get back there someday.

08:53 All right.

08:54 Well, let's talk about sort of the scikit-hep side of things and how you got into

09:02 maintaining all of these packages.

09:05 So you found yourself in this place where you're working on tools that help other people build packages

09:10 for the physicists and data scientists and so on, right?

09:14 So where'd that all start?

09:16 So with maintenance itself, the first thing I started maintaining was a package called Plumbum back in 2015.

09:23 And at that point, I was starting to submit some PRs and the author came to me and said,

09:30 I would like to have somebody do the releases.

09:32 I need a release manager.

09:34 I don't have time.

09:35 And I said, sure, I'd be happy to do it.

09:36 And it was exciting for me because it was the first package or like real package I got to join.

09:42 And so I think on the page, it might even still have the original news item

09:47 when it says, welcome to me.

09:48 But-

09:49 Nice.

09:50 So that was the first thing I started maintaining.

09:52 And then I was working on a high-energy physics tool called GooFit when I became a postdoc.

10:00 And I worked on sort of really renovating that.

10:03 It started out as a code written by physicists.

10:06 And I worked on making it actually installable and packaged nicely and worked with a student to add Python bindings to it,

10:14 things like that.

10:14 And as part of that, I wrote a C++ package, CLI 11.

10:18 It was just a first package I actually wrote and then maintained.

10:22 And it's actually in C++.

10:24 And that was written for Goofit, but now it's a fairly, I think it's done pretty well on its own.

10:30 Nice.

10:31 What's that one do?

10:32 Microsoft Terminal uses it.

10:32 Yeah.

10:33 Microsoft Terminal uses it?

10:35 Mm-hmm.

10:35 Oh, nice.

10:36 Yeah, I'm a big fan of Microsoft Terminal.

10:37 I've for a while now kind of shied away from working on Windows because the terminal experience

10:44 has been really crummy.

10:45 You know, the cmd.exe command prompt style is just like, oh, why is it so painful?

10:50 And people who work in that all day, they might not see it as painful.

10:53 But if you get to work in something like a macOS terminal or even to not quite the same degree,

10:59 but still in like a Linux one, then all of a sudden, yeah, it kind of gets there.

11:02 But I'm kind of warming up to it again with Windows Terminal.

11:06 Yeah, the Xterm is one of the reasons I use, I really moved to Mac because I loved Xterm.

11:12 And then Windows Terminal is amazing.

11:14 Now it's a great, great team working on it, including the fact that they used my parser.

11:18 But it's actually quite nice.

11:22 The only problem I have in Windows 10 is it's really hard to get the thing

11:24 to show up instead of cmd prompt.

11:28 Yeah.

11:28 But Windows 11, I think it's supposed to be the only one.

11:31 Yeah.

11:31 I definitely think it's included now, which is great.

11:34 So CLI 11, this is a C++ 11 command line parser, right?

11:39 Like click or arg parse or something like that, but for C++, right?

11:42 Yes.

11:43 It was designed off of the Plumbum command line parser.

11:46 Plumbum is sort of a toolkit and it has several different things.

11:48 I wish those things had been pulled out because I think on their own, they might have

11:51 maybe even been popular on their own.

11:55 it has a really nice parser, but it was sort of designed off of that and off click.

11:58 It has some similarities to the both of those.

12:01 Yeah.

12:02 I think probably that's a challenge.

12:04 I mean, we're going to get into Scikit-HEP with a whole bunch of these different packages,

12:08 but finding the right granularity of what is a self-contained unit that you want to share with people

12:14 or versus things like pulling out a command line parser rather than some other library, right?

12:19 This is a careful balance.

12:21 It's a bit challenging.

12:23 I think in Python, there's a really strong emphasis to having the individual

12:28 separate pieces and packages, especially in Python, partially because it has

12:32 a really good packaging system.

12:34 And being able to take things, have just pieces and be able to swap out one

12:39 that you don't like is really, really nice.

12:41 And that's one of the things we'll talk about the PyPA as well.

12:44 And that's one of the things that they focus on is small individual packages

12:48 that each do a job, versus an all-in-one like Poetry.

12:51 Yeah.

12:52 Well, you'll have to do some checking or some fact-checking, balancing, modernizing for me.

12:58 I did professional C++ development for a couple of years and I really enjoyed it

13:03 until there were better options.

13:05 And then I'm like, why am I still doing this?

13:07 I would go work on those.

13:09 But one of the things that struck me as a big difference to that world is basically

13:15 the number of libraries you use, the granularity of the libraries you use,

13:19 you know, the relative acceptance of things like pip and the ease of using another library,

13:25 right?

13:25 In C++, you've got the header and you've got the linked file and you've got the DLL

13:31 and there's like all sorts of stuff that can like get out of sync and go crazy

13:35 and like make weird crashes.

13:37 Your app just goes away and that's not great.

13:39 Is that still true?

13:40 I feel like that that difference is one of the things that allows for people

13:44 to make these smaller composable pieces in Python.

13:47 I think that has a lot to do with it.

13:49 What has happened in C++ is there's sort of a rise of a lot of header-only libraries

13:54 and these libraries are a lot easier to just drop into your project because all you do

14:00 is you put in the headers and there's no, you don't have to deal with a lot of the

14:05 original issues.

14:07 So a lot of these small standalone libraries are header-only and one of the next

14:11 things that I picked up as a maintainer was Pybind 11, which, and I've sort of

14:17 been in that space sort of between C++ and Python for quite a bit.

14:21 I kind of like being in that area, joining the two.

14:26 I get a sense from listening to the things that you've worked on previously

14:29 and things like this that you're interested in connecting and enabling, like piecing together,

14:34 like here's my script that's going to pull together the compute on this cluster

14:38 or here's this library that pulls together Python and C++ and so on.

14:41 Yes, making different things work together and combining things like C++ and Python

14:46 or combining different packages in Python and piecing together a solution.

14:49 I think that's one of Python's strengths versus something like MATLAB.

14:53 I spent quite a bit of time in MATLAB early on and got to move a lot of stuff

14:57 over to Python.

14:58 Right on, that's awesome.

14:59 It was really nice.

15:00 We didn't have to have a license and things like that.

15:02 I know, it's so expensive and then you get the, what are they called, toolkits,

15:07 the add-on toolkits and they're like, each toolkit is the price of another $1,000 a year

15:12 or $2,000 a year.

15:13 It's ridiculous.

15:14 So I know of CFFI, which is a way for Python and C to get clicked together

15:21 in a simple way.

15:23 How's Pybind 11 fit into that?

15:27 This is seamless interoperability between C++11 and Python.

15:30 How are they different?

15:32 So CFFI, I teach like a little short course where I can go through some of the different

15:38 binding tools and it usually ends with me saying Pybind 11 is my favorite.

15:41 Yeah, cool.

15:43 Give us an overview of what the options are and stuff.

15:45 CFFI is closer to C types.

15:47 It's more of, it's focused on C versus C++ and it's actually the one I've used

15:53 the least.

15:54 I was just helping, we're just talking with the CFFI developer but I've used it

15:59 the least of those but I think it basically parses your C headers and then sort of

16:06 automates a lot of what you would have to manually do with C types or you have to

16:09 specify what symbol you want to call and what the arguments are and what the return

16:14 type is and if one of those things is wrong you get a seg fault and that sort of thing.

16:17 Whereas Pybind 11, this is about building modules, extension modules.

16:21 So, and it's, and it's, the interesting thing about this is that it's written

16:26 in pure C++.

16:26 The other tools out there, so Cython can do this, it's not what it was designed for

16:31 but it immediately became popular for doing this because Cython turned code,

16:37 Python, Python-like code is a new language into, it transpiled it into C

16:42 or C++ that had a toggle you could change, has a toggle you can change and then

16:47 when you're there you can now call C or C++ but it's extremely verbose and you repeat yourself

16:53 and you have to learn another language.

16:54 This weird combined Python thing and just thinking in Cython is difficult

16:58 because you have to think about well am I in Python or am I in Cython that can,

17:03 that's going to be bound to Python or am I in Cython that's just going straight to C,

17:07 C++ or am I just in C++ or C but I've actually used it.

17:11 It's a lot of layers there, yeah.

17:12 But Pybind 11 is just C++ and it's just, it's basically, it's like the, C API

17:19 for Python but a C++ API.

17:21 It's quite, it's quite natural and you don't have to learn a new language.

17:25 It uses some fairly advanced C++ but that's it.

17:28 You're learning something useful anyway.

17:29 Right.

17:29 So do you do some sort of like template type thing and then say I'm going to expose

17:34 this class to Python or something like that and then it figures out, does it write

17:38 the Python code or what is it?

17:40 It's writing the, build like .so files or what do you do here?

17:45 It, it compiles into the C API calls and then that would compile into a .so

17:50 and there's no separate step like Cython or Swig or these, or these other tools

17:55 because it's just C++.

17:56 You compile it like you do any other C++ but it's actually internally using

18:01 the CPython API or PyPy's wrapper for it and the language looks a lot like Python

18:06 but the names are similar.

18:07 You just do a def to define a function and you give it the name and then you just

18:11 pass it the pointer to the, the underlying thing.

18:14 It can figure out things like types and stuff like that for you.

18:16 Give it a doc string if you want.

18:18 Give the arguments names.

18:19 You can make it as Pythonic as you want.

18:20 It's verbose but it's not overly verbose.

18:23 Yeah, that's really neat.

18:25 Nice.

18:25 And for people who haven't used those kind of outputs, basically, it's just import

18:30 module name whether it's a .py file or it's a .so file.

18:36 PyTorch, if you've used any of those things, you have, you've been importing

18:43 some PyBind11 code.
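(For anyone who hasn't imported a compiled extension before, a tiny sketch of the Python side; the module name fastmath and its add function are hypothetical stand-ins for any pybind11-built module.)

    # Hypothetical: "fastmath" stands in for any pybind11-built extension module,
    # a compiled .so/.pyd file sitting in site-packages. Using it is identical to
    # using a pure-Python module.
    import fastmath

    print(fastmath.add(2, 3))  # calls straight into the compiled C++ function
    help(fastmath.add)         # shows the docstring and argument names set in the bindings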

18:44 So let's talk a little bit about Scikit-Hep.

18:48 This is one of the projects that it has a lot of these packages inside of it

18:54 and your library cibuildwheel is one of the things that is used to maintain

19:03 and build a lot of those packages because I'm sure they have a lot of interesting

19:06 and oddball dependencies, right?

19:08 I mean, C++ is kind of standard, but there's probably others as well, right?

19:12 It is.

19:14 So one thing that is kind of somewhat unique to HEP is that we are very heavily invested

19:19 in C++.

19:19 So it's usually either you're going to see Python or you're going to see some sort

19:23 of C++ package of some sort.

19:25 I mean, it could be varies in size there, but it's mostly C++ or Python.

19:30 We really haven't used other languages much since the early 90s or so.

19:35 Is that inertia or is that by choice?

19:39 You know, why is that?

19:40 I think it's partially the community is a fairly cohesive community.

19:46 We're really used to sort of working together.

19:48 The experiments themselves are often, you know, might be a thousand or several thousand

19:54 physicists working on a single experiment.

19:55 And we have been fairly good about sort of meeting together and sort of deciding

20:01 the direction that we want to go in and it's sort of sticking to that.

20:04 So for C++, it was heavily root, which is a giant C++ framework.

20:11 And it's got everything in it.

20:13 And that was C++ and that's what everybody used.

20:15 So root is the library.

20:17 If I was going to write code that would run and interact with like the grid computing

20:23 or the data access and all that kind of stuff at LHC, I would use this root library

20:28 if I was doing that in C++, right?

20:30 Yes.

20:30 You might be using interpreted C++, which is something we invented.

20:33 Oh, okay.

20:35 This is interesting.

20:37 Is this something people can use?

20:38 Oh, yes.

20:39 We actually, so Cint was the original interpreter and then it got replaced by

20:44 Cling, which is built on the LLVM.

20:47 And I think recently it was merged to mainline LLVM as Clang Repl, I think it's

20:54 called, but it's sort of a lightweight version.

20:56 Yeah, it's a C++ interpreter.

20:59 You can actually get xeus-cling, which I think QuantStack has, but they package

21:06 it as well, I think, as xeus-cling.

21:08 Okay, yeah, very interesting.

21:09 It's not, C++ really wasn't designed for a notebook though.

21:13 It does work, but you can't rerun a cell often because you can't redefine things.

21:18 Python is just really natural in a notebook and C++ is not.

21:21 Yeah, especially if you change the type, you compile it as an int and then you're

21:25 like, ah, that should be a string.

21:26 Yeah, that's not going to be a string.

21:27 It's compiled.

21:28 Yeah, interesting.

21:29 So it seems to me like the community at CERN has decided, look, we need some

21:34 low-level stuff and there's some crazy low-level things that happen over there.

21:38 People can check out a video, maybe I'll mention a little bit later.

21:41 But for that use, they've sort of gravitated towards C and then for the other aspects,

21:47 it sounds like Python is what everyone agreed to.

21:50 It's like, hey, we want to visualize this, we want to do some notebook stuff, we want

21:53 to piece things together, something like that, right?

21:56 It's certainly moving that way.

21:58 They definitely have sort of agreed that Python should be a first-class language and

22:05 join C++.

22:05 That was decided a few years ago.

22:07 And I think that's been a great step in the right direction because what was

22:11 happening, people were coming in with Python knowledge.

22:14 They wanted to use Pandas.

22:15 I came in that way as well.

22:17 Pandas and Numba and all these tools were really, really nice.

22:21 And we were basically just having to write them all ourselves in C++.

22:25 It has a data frame, but why not just use Python, which is what people know

22:31 anyway?

22:32 Panda exists.

22:33 There's a ton of people already doing the work maintaining it for us.

22:36 Root literally has a string class.

22:39 Literally, they do everything.

22:42 So the idea, and this is sort of the idea behind Scikit-HEP was to build

22:47 this collection of packages that would just fill in the missing pieces, the things that

22:53 energy physicists were used to and needed.

22:55 And some of them are general and were just gaps in the data science ecosystem,

23:00 and some things are very specific, high energy physics.

23:03 Scikit-HEP actually sort of originated as a single package.

23:08 It sort of looked like root red at first, and it was invented by someone called

23:15 Eduardo Rodriguez, who was actually in my office at CERN, and we're office mates.

23:19 But he did something I think really brilliant when he did this, and that is

23:23 he created an organization called Scikit-HEP around it, and then he went out and

23:27 spoke with people and got some of the other Python packages that existed

23:30 at the time to join Scikit-HEP, moved them over and started building a collection of

23:35 some of the most popular Python packages at the time.

23:38 And I thought that was great, and I really wanted Scikit-HEP to become a

23:43 collection of tools, separate tools, and for the Scikit-HEP package to just

23:47 be sort of a meta package that just grabbed all the rest.

23:50 And that's actually kind of where it is now.

23:51 Right.

23:52 I can pip install Scikit-HEP.

23:53 Is that right?

23:54 You can, and mostly, other than a few little things that are still in there

23:57 that never got pulled out, that will mostly just install our most popular,

24:01 maybe 15 or so packages, 15 of our most popular packages.

24:06 Yeah, so it probably doesn't really do anything other than, say, it depends

24:10 on those packages or something like that, right?

24:12 And then by virtue of installing it, it'll grab all the pieces.

24:15 Yeah, yeah, that's a really cool idea and I like it.

24:18 So maybe one of the things I thought would be fun is to go through some of

24:22 the packages there to give people a sense of what's in here.

24:25 Some of these are pretty particular and I don't think would find broad use outside of

24:30 CERN.

24:31 For example, Conda Forge Root.

24:33 It sounds like that's about building root so I can install it as a dependency

24:37 or something like that, right?

24:39 building root is horrible and you actually now can get it as part of a Conda package

24:45 which is just way better than anything that was available for attaching it to a

24:49 specific version of Python because it has to compile against a very specific version of

24:54 Python but that's what it does.

24:56 So unless you want something in root then that's very HEP specific.

25:00 Yeah, absolutely.

25:01 Some more general ones. Probably our first, to mention briefly, our very first package that I

25:07 think was really popular among energy physicists that we actually produced was

25:13 Uproot, which was just a pure Python package that read root files, so you didn't have to

25:19 install root itself.

25:19 Again, very specific for somebody who was in high energy physics but you

25:25 could actually read a root file and get your data without installing root and

25:29 that was a game changer.

25:31 So now you can actually install root slightly easier, but normally it's a

25:35 multi-hour compile. It's

25:38 gotten better, but it's still a bit of a beast to compile, especially for Python.

25:40 Yeah, that does sound like a beast.

25:41 Oh my gosh.

25:42 And now you can just read in your files.

25:44 Basically, Jim Pivarski just taught Python to understand and decode the root

25:50 file structure, and it actually can write now too, but originally it was just reading.

25:54 But that actually was really...

25:56 So this is like if I want to do, if I want to create a notebook and maybe

25:59 visualize some of the data but I don't really need access to anything else, I shouldn't

26:03 depend on this beast of almost its own operating system type of thing.

26:08 Yeah, we were very close to being able to use all the data science tools in

26:12 Python, pandas, things like that.

26:13 For most data worked fine.

26:15 You just had to get the data.

26:17 And I mean, I've done this too where I had one special install of Python and

26:23 root together that I'd worked several hours on and it sat somewhere and I would convert

26:27 data with it.

26:27 I'd move it to HDF5 and then I would do all the rest of the analysis in Python that

26:32 didn't have it because then I could do virtual environments and all that read that HDF5

26:36 format, right?

26:37 Mm-hmm.

26:37 Yeah.

26:38 Right, okay.
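(As a rough sketch of what that workflow looks like, with a made-up file name and branch names; uproot.open and TTree.arrays are the real entry points.)

    import uproot

    # Open a ROOT file and read a TTree without having ROOT installed at all.
    # "events.root" and the branch names are placeholders for this sketch.
    with uproot.open("events.root") as f:
        tree = f["Events"]
        # Pull selected branches into Awkward Arrays (library="np" or "pd" also work).
        data = tree.arrays(["Muon_pt", "Muon_eta"], library="ak")
        print(data["Muon_pt"])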

26:39 The first package we had that was really popular on its own was Awkward Array.

26:44 Yeah.

26:44 Awkward Arrays.

26:45 I definitely heard about this one, yeah.

26:47 Yeah, that was originally part of Uproot, sort of grew out of Uproot.

26:51 When you're reading root files, you end up with these jagged arrays.

26:55 So that's an array that is not rectangular.

26:58 So at least one dimension is jagged.

27:01 it depends on the data and this shows up in all sorts of places and not just particle

27:07 collisions or obviously shows up lots of places in particle collisions like how many hits

27:10 got triggered in the detector.

27:12 That's a variable length list.

27:13 How many tracks are in an event?

27:14 You know, that's a variable length list and can be a variable length list of

27:18 structured data.

27:18 And to store that compactly the same way you'd use numpy was one thing, and you can

27:25 use Arrow and there's some other, there's some other things that do this, but Awkward Array

27:28 also gives you numpy-like indexing and data manipulation.

27:33 And that was the sort of the breakthrough thing here.

27:36 It's like numpy.

27:38 The original one was built on top of numpy.

27:40 The new one actually has some pybind11 compiled bits and pieces, but it makes working

27:47 with that work really well.

27:47 In fact, Jim Pivarski has now got a grant to expand this to, I don't remember the

27:53 number of different disciplines that he's working with, but lots of different areas,

27:57 genomics and things like that have all use cases and he's adding things like

28:02 complex numbers and things that weren't originally needed by energy physicists, but

28:05 make it widely useful.

28:07 Almost an evangelism, like dev evangelism type of role, right?

28:11 Go talk to the other groups and say, hey, we think you should be using this.

28:16 What is it missing for you to really love it?

28:18 Something like that, right?

28:19 How interesting.

28:20 Yeah.

28:20 So, yeah.

28:22 Yeah.

28:22 So looking at the Awkward Array page here says for a similar problem, 10 million times

28:28 larger than this example given above, which one above is not totally simple.

28:32 So that's pretty crazy.

28:33 It says Awkward Array, the one liner takes 4.6 seconds to run and uses 2 gigs of

28:40 memory.

28:40 The equivalent Python list in dictionaries takes over two minutes and uses

28:45 10 times as much memory, 22 gigs.

28:47 So, yeah, that's a pretty appealing value proposition there.
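(A tiny sketch of the jagged data being described, with made-up numbers, to show the NumPy-style operations Awkward Array provides.)

    import awkward as ak

    # A jagged array: each "event" has a different number of hits.
    hits = ak.Array([[12.1, 7.4, 3.3], [], [5.0, 9.9]])

    print(ak.num(hits))          # [3, 0, 2]: hits per event
    print(ak.sum(hits, axis=1))  # per-event sums, despite the ragged shape
    print(hits[hits > 5.0])      # NumPy-style boolean masking, element by element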

28:50 Yeah.

28:51 And it supports Numba.

28:53 Jim works very closely with the Numba team and really is one of the experts on the

28:58 Numba internals.

28:59 So, yeah, it has full Numba support now and he's working on adding Dask.

29:04 He's working with Anaconda on this grant and then working with adding GPU support.

29:09 Very cool.

29:10 Maybe not everyone out there knows what Numba is.

29:12 Maybe give us a quick elevator pitch on Numba.

29:15 Yeah.

29:15 I hear it makes Python code fast, right?

29:18 Yeah, it's a just-in-time compiler and it takes Python.

29:23 It takes Python.

29:24 It actually takes the bytecode and then it basically takes that back to something or it

29:31 parses the bytecode and turns it into LLVM.

29:34 So it works a lot like Julia except instead of a new language, it's actually reading Python

29:39 bytecode, which is challenging because the Python bytecode is not something that stays

29:44 static or is supposed to be a public detail.

29:47 Yeah, there's no public promises about consistency of bytecode across versions because

29:54 they play with that all the time to try to speed up things and they add bytecodes and

29:58 they try to do little optimizations.

30:00 Yeah, so every Python release breaks Numba.

30:02 So they have to, they just know the next Python release will not support Numba and it

30:06 usually takes a month or two.

30:07 But it's very impressive though.

30:11 It's the speedups, you do get full sort of C type speedups for something that looks

30:16 just like Python.

30:16 It compiles really fast for a small problem and it's as fast as anything

30:22 else you can do.

30:23 I've tried lots of these various programming problems and you just about can't

30:29 beat Numba.

30:29 It actually knows what your architecture is since it's just in time compiling.

30:33 So you have to do which is an advantage over say like C, right?

30:37 It can look exactly at what your platform is and your machine architecture

30:41 and say we're going to target, you know, I see your CPU supports this special vectorized thing

30:46 or whatever and it's going to build that in, right?

30:47 and then what sort of Jim does with Awkward and we've done with some other things with Vector

30:51 does this too.

30:52 You can control what Python turns into what LLVM constructs any Python turns into because

31:00 you can control that compile phase.

31:02 That's incredibly powerful because you can say and it doesn't have to be the same thing but

31:06 obviously you want it to behave the same way.

31:08 You can say if you see this structure, this is what it turns into.

31:12 in LLVM machine code which then gets compiled or machine language which then gets compiled

31:19 into your native machine language.

31:21 Interesting.

31:21 Assembling.

31:22 So if you have like a certain data structure that you know can be well represented or gets

31:27 packed up in a certain way to be super efficient you can control that?

31:30 Yeah, you can say that well this like this operation on this data structure, this is what

31:35 this is what it should do and then that turns into LLVM and maybe it can

31:38 get vectorized or things like that for you.

31:41 Yeah, yeah.

31:42 That's super neat.
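(For listeners who haven't seen Numba, a minimal sketch of the standard @njit usage; nothing here is specific to the projects discussed.)

    import numpy as np
    from numba import njit

    @njit  # just-in-time compile this function to native code via LLVM
    def sum_of_squares(values):
        total = 0.0
        for v in values:       # a plain Python loop, compiled to machine code
            total += v * v
        return total

    data = np.random.rand(1_000_000)
    sum_of_squares(data)  # the first call triggers compilation for this argument type
    sum_of_squares(data)  # later calls run at compiled-code speed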

31:43 Another package in the list that I got to talk about because just the name and the graphic is

31:47 fantastic is Aghast.

31:49 What is Aghast?

31:51 It's got like this The Scream.

31:53 I forgot who was the artist of that, but The Scream sort of look as part of the logo is good.

31:59 About half of the logos come from Jim and I, I did about half and he did about half, and then

32:04 the others are from the individual package authors.

32:07 Aghast was, so this is sort of part of the histogramming area, which is sort of the

32:12 area I work in in Scikit-HEP.

32:13 But Jim actually wrote Aghast and the idea was that it would convert between

32:17 histogram representations.

32:19 I think it came up because Jim got tired of writing histogram libraries.

32:22 I think he's written at least five.

32:24 Yeah, one of the things I got the sense of by looking through all the Scikit-HEP stuff,

32:29 there's a lot of histogram stuff happening over there.

32:32 histograms are sort of the area that I was in and it ended up coming in in

32:36 several pieces.

32:37 But I think one of the important things was actually, and I think Aghast may not really

32:41 matter, it may get archived at some point because instead of translating between

32:47 different representations of histograms in memory, what you can do is define

32:52 a static typing protocol, which can be checked by mypy, that describes what an object needs

33:01 to be called a histogram.

33:01 And so I've defined that as a package called UHI, Universal Histogram Interface.

33:06 And anything that implements UHI, it can be fully checked by mypy, will then be able

33:11 to take any object from any library that implements UHI.

33:17 And so all the libraries we have that produce histograms, so uproot, when it reads a root

33:22 histogram or hist and boost histogram, when they produce histograms, they

33:26 don't need to depend on each other.

33:28 They don't even depend on UHI, that's just a static dependency for mypy time.

33:32 And then they can be plotted in MPLHEP or they can be printed to the terminal with

33:39 histoprint and there's no dependencies there.

33:42 One doesn't need the other.

33:43 And that's sort of making Aghast somewhat unneeded because now it really doesn't matter.

33:48 You don't have to convert between two because they both just work.

33:51 They work on the same underlying structure basically, right?

33:54 They work through the same interface.

33:57 Right.

33:58 Yeah.

33:58 So Aghast is a way to work with different histogramming libraries that kind of is the

34:04 intermediary of that.

34:06 It's like an abstraction layer on that.

34:08 Okay.
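(To illustrate the pattern being described, not the actual UHI definition: a sketch of a static typing Protocol. The member names here are invented for the example, but the idea is the same: libraries can satisfy the protocol without importing each other, and mypy checks it.)

    from typing import Protocol, Sequence

    class SupportsHistogram(Protocol):
        """Illustrative only, not the real UHI protocol; just the same idea:
        any object exposing these members counts as a histogram for type checkers."""

        def values(self) -> Sequence[float]: ...

        @property
        def axes(self) -> Sequence[object]: ...

    def total_entries(h: SupportsHistogram) -> float:
        # Works with any library's histogram object that matches the protocol;
        # neither side needs to import, or even know about, the other.
        return sum(h.values())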

34:08 What are some other ones?

34:10 Yeah.

34:11 What are some other ones we should kind of give a shout out to?

34:14 we talked about GooFit, which is there.

34:16 It's an affiliated package.

34:17 It's not part of scikit-hep, but it has.

34:19 So we developed this idea of an affiliated package for things that didn't need to be moved

34:24 in, but had at least one scikit-hep developer working with them.

34:30 At least that's my definition.

34:31 I was never able to actually get the rest to agree to exactly that definition, but that's my

34:36 working definition.

34:37 And so that's why pybind11 gets listed there.

34:39 It's an affiliated package because we share a developer, me, with the pybind11 library.

34:45 And we sort of have a say in how that is developed.

34:50 And most importantly, if we have somebody come into scikit-hep, we want them to use pybind11

34:54 over the other tools because that one we have a lot of experience with.

34:58 Very cool.

34:59 Another one I thought was interesting is hep units.

35:02 So this idea of representing units like the standard units, they're not enough for

35:08 us.

35:08 We have our own kind of things like molarity and stuff, but also luminosity

35:14 and other stuff, right?

35:16 Yeah.

35:16 Different experiments can differ a bit.

35:20 So there's a sort of a standard that got built up for units.

35:23 And so this just sort of puts that together and has, and the unit that we've sort of decided on

35:30 this should be the standard unit, that's one, and the rest are different scalars.

35:33 It's a very tiny little library.

35:35 It was the first one to be fully statically typed because it was tiny.

35:38 That's easy to do.

35:39 It was like, because mypy infers constants, there was like two functions or something

35:43 and then it was done.

35:44 Yeah.

35:45 Probably a lot of floats.

35:46 Mm-hmm.

35:47 So, but, and that's, that's sort of what it is.

35:50 So you can use that if, and the idea is that the rest of the libraries will, will adhere to

35:55 that system of units.

35:57 So then if you use this and then use that, the values it gives you, then you can have a nice

36:02 human, human readable units and be sure of your units.

36:05 Yeah.

36:05 That's really neat.
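(A quick sketch of that "stays out of the way" style, assuming the top-level unit constants that the hepunits documentation describes.)

    from hepunits import GeV, MeV  # assumed top-level constants, per the hepunits docs

    # Values stay plain floats, scaled to the agreed base units (MeV is 1.0),
    # so they pass straight through NumPy or C extensions with no wrapper objects.
    # Nothing is enforced; you are trusted to keep your units straight.
    mass = 125 * GeV
    print(mass / MeV)  # 125000.0 (convert back out by dividing by the target unit)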

36:06 Have you heard of pint?

36:08 Are you familiar with this one?

36:09 Yes, I love pint.

36:10 Oh gosh, I think pint is interesting as well.

36:13 It takes the, the types through and I use pint some, but it actually gives you a quantity out

36:19 or a numpy quantity.

36:21 Whereas hepunits just stays out of the way and it's a way to be more

36:26 clear in your code, but it's not enforced.

36:27 Pint is enforced, which I like enforcing, but it also can slow things down.

36:31 You can't, these are not actual real numbers anymore.

36:34 So you pay for it.

36:35 Yeah.

36:35 So it's going to add a ton of overhead, right?

36:36 But pint's interesting because you can do things like three times meter plus four times centimeter

36:41 and you end up with 3.04 meters.

36:44 Yeah.

36:45 Those are actually real quantities.

36:46 They're actually a different object, which is the good thing about it, but it's

36:50 also the reason that then it's not going to talk to say a C library that

36:52 expects a regular number or something as well.

36:55 Sure.

36:55 Okay.
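(The example Michael mentions, roughly as it looks in code with Pint's UnitRegistry.)

    import pint

    ureg = pint.UnitRegistry()

    length = 3 * ureg.meter + 4 * ureg.centimeter
    print(length)              # 3.04 meter: a real Quantity object, so units are enforced
    print(length.to(ureg.cm))  # 304.0 centimeter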

36:56 Maybe one or two more and then we'll probably be out of time for these.

37:00 What else should people maybe pay attention to that they could generally find

37:03 useful over here?

37:04 You mentioned vector.

37:05 It's a little bit newer, but it's certainly for general physics.

37:08 I think it's useful because it's a library for 2D, 3D and relativistic vectors.

37:15 And there aren't really, it's a very common sort of learning example you see,

37:20 but there aren't really very many libraries that do this, that actually have,

37:23 if you want to take the magnitude of a vector in 3D space, there just isn't a

37:28 nice library for that.

37:29 So we wrote vector to do that.

37:31 And vectors is supported by awkward.

37:34 It has an awkward backend.

37:35 It has a Numba backend, a numpy backend, and then a plain object backend.

37:39 Eventually we might work on more.

37:41 And it even has a Numba plus awkward combination.

37:42 So you can, you can use a, a vector inside an awkward array inside a Numba JIT

37:48 compiled loop and still take magnitudes and do stuff like that.

37:51 That's really cool.

37:52 That integration there.

37:53 Yeah.

37:54 vectors because we have a lot of those in physics.

37:56 Sure.

37:56 And you can, you can do things like ask if one vector is close to another vector and

38:01 things like that, even in different, it looks like a one in polar coordinates and

38:05 one in, you know, a Cartesian or something like that.

38:08 It has different coordinate systems and it can actually, it actually stores the vector in

38:12 that.

38:13 So you don't waste memory or something.

38:15 If that's, that's the representation you have, that was a feature from root

38:18 that we wanted to make sure we, we got.

38:20 And it also has sort of the idea of, of momentums too and stuff for the, for the

38:23 relativistic stuff.

38:24 We end up with a lot of that.
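(A small sketch of the plain-object backend; the constructor and property names follow vector's documented vector.obj API, though treat the details as approximate.)

    import vector

    v = vector.obj(x=3.0, y=4.0, z=12.0)     # plain-object backend, Cartesian input
    print(v.mag)                             # 13.0, the 3D magnitude
    print(v.rho)                             # 5.0, the magnitude in the x-y plane

    w = vector.obj(rho=5.0, phi=0.3, z=1.0)  # stored internally in cylindrical coordinates
    print(v.deltaphi(w))                     # comparisons work across coordinate systems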

38:26 And then maybe just mention the, since we mentioned the histogramming stuff and that's

38:29 the area, that's the ones that I really work on.

38:31 The ones I specifically work on that are general purpose.

38:34 Boost histogram is a wrapper for the C++ boost histogram library.

38:38 Boost is the sort of the big C++ library, just one step below the standard library.

38:45 And right at the time I was starting at, at Princeton, the, I met the author of boost

38:50 histogramming who's from physics and he was in the process, I believe, of getting

38:55 this accepted into boost.

38:57 And it got accepted after that.

38:58 But one of the things that he decided to do is pull out his initial Python bindings that

39:03 were written in boost Python, which is actually very similar to pybind11, but requires boost

39:10 instead of not requiring anything.

39:11 But the design is intentionally very similar.

39:13 And so I proposed I would, I would work on boost histogram and write these, this, the

39:20 Python bindings for it inside scikit-hep.

39:22 And that would be sort of the main project I started on when I started at Princeton.

39:26 And that's, you know, that's what I did.

39:29 Boost histogram is an extremely powerful histogramming library.

39:32 So it's a histogram as an object rather than like a NumPy, you can, there's a

39:36 histogram function and you give it an array and then it spits a couple of arrays back out at

39:40 you.

39:40 You are now, you now have to manage these.

39:44 They don't have any special meaning.

39:45 Whereas boost histogram, histograms really are much more natural as an object, just like a

39:49 data frame is more natural as an object where you tie that information together.

39:52 A histogram's really natural that way, where you still have the information about what the data

39:56 actually was on the axes.

39:58 If you have labels, you want to keep those attached to those, to that, to that data.

40:03 And you may need to fill again, which is one of the main things that, energy physicists really wanted

40:07 because we tend to fill histograms and then keep filling them or rebinning them

40:11 or doing operations on them.

40:12 And you can do all those very naturally.

40:14 And boost histograms, the actual, the C++ wrapper in PyBind 11 and a lot of, and,

40:20 I actually got involved in cibuildwheel because of boost histogram, because I, one of the

40:24 things I wanted to just make sure it worked everywhere.

40:26 And it obviously requires C++.

40:27 It requires compilation.

40:30 and then hist is a nice wrapper on top of that that just makes it a lot more friendly to,

40:33 to use because the original boost histogram author wants to keep this, Hans Dembinski wants

40:38 to keep this quite, pure and clean.

40:40 So hist is a more, the more natural.

40:43 And even if you're not in hep, I think that's still the more natural one to use.

40:45 Yeah.

40:46 It's got .plot and plotting built in.

40:48 Right, right.

40:48 There's a lot of people who do, who use histograms across all sorts of disciplines.

40:52 So that would definitely be one of those that is generally useful.
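(A small sketch of the histogram-as-an-object idea using hist; the axis, fill, and rebin calls follow the hist documentation, as an illustration rather than a complete recipe.)

    import hist

    # One named, labeled axis; the histogram object carries this metadata with the counts.
    h = hist.Hist(hist.axis.Regular(20, 0, 100, name="pt", label="pT [GeV]"))

    h.fill(pt=[12.3, 45.0, 47.2, 88.8])  # fill now...
    h.fill(pt=[51.0, 52.5])              # ...and keep filling later, as HEP workflows need

    print(h.sum())              # total number of entries
    print(h[::hist.rebin(2)])   # merge pairs of bins; the result is still a Hist object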

40:56 All right.

40:56 So I think that brings us to cibuildwheel.

40:59 let's, let's talk a bit about that.

41:02 And I mean, maybe the place to start here is, you want wheels, right?

41:06 The, the first sentence describing it is: Python wheels are great. Building them across

41:10 Mac, Linux, Windows, on multiple versions of Python?

41:13 Not so much.

41:14 So no description.

41:15 Yeah, exactly.

41:17 Well, wheels are good.

41:19 There's times when there are no wheels and things install slower.

41:23 They might not install at all.

41:24 It's generally a bad thing if you don't have a wheel, but, they're, they're not easy

41:29 to make.

41:29 Right.

41:29 So tell us what is a wheel and then let's talk about why maybe building them across all these

41:33 platforms and this, cross product along with like versions of Python and whatnot.

41:38 It's a mess.

41:39 When you distribute Python, you have several options.

41:41 The most common one, and most packages have at least an sdist, which is just basically a

41:46 tarball of the source.

41:48 Right.

41:49 When you pip install it, you're missing some things or adding some

41:52 things, but right.

41:53 Otherwise it mostly just unzips.

41:54 Yeah.

41:54 It unzips your source and puts it somewhere.

41:56 Python will find it.

41:57 And then that's that.

41:58 Yeah.

41:58 So it runs your build system.

42:00 So setuptools, traditionally; that's become a lot more powerful recently. But it

42:04 has to run the build system to figure out what to do with it.

42:07 This is just a bunch of files.

42:08 And then it puts it together in a particular structure on your computer.

42:14 And so a wheel is a package where everything is already in place.

42:19 So it's already in a particular structure.

42:21 It knows the structure, and all Python has to do for a pure Python wheel, one that does not

42:26 have any binary pieces in it,

42:30 is grab the contents inside and dump them, following a specific set of rules, into your

42:38 site-packages.

42:39 Right.

42:39 So then you now have something installed.

42:40 There's no setup.py in your wheel.

42:42 There's no pyproject.toml.

42:45 Those sorts of things are not in the wheel.

42:47 The wheel's already there.

42:48 It can't run arbitrary code.

42:51 Yeah, exactly.

42:51 That was one of the points I was going to make is one of the things that can be scary about

42:55 installing packages is just by virtue of installing them, you're running arbitrary code because

43:01 often that is executing, you know, python setup.py install or something

43:08 like that.

43:08 And like, whatever that thing does, that's what happens when you pip install.

43:11 Right.

43:12 But not with wheels, as you said, it comes down in a binary blob and just like, boom, here it is.

43:16 Obviously the thinking is we have this package delivered to a million computers.

43:21 Why do we need to have every million computer run all the steps?

43:24 Why don't we just run it once and then go here?

43:26 And then also that saves you a ton of time.

43:28 Right.

43:29 like I just installed uWSGI and it took, I don't know, 30 seconds, 45 seconds to

43:34 install because it didn't have a wheel.

43:36 So it sat there and it just grinded away compiling it, you know?

43:39 Yeah.

43:40 So there's two possibilities.

43:41 For a pure Python package, a wheel is still superior because of not running arbitrary code.

43:47 pip will actually go ahead and compile all your .pyc files.

43:51 It goes ahead and makes the bytecode for all of those.

43:55 If it's an sdist, if it's a tarball, it doesn't do that,

43:58 if it doesn't pass through the wheel stage anyway.

44:00 And then when, every time you open the file, then it's going to have the first time it's going to have

44:05 to make that, that bytecode.

44:06 So it'll be a little slower the first time you open it.

44:08 There's, there's a variety of, of reasons.

44:10 I think it's pythonwheels.com, something like that.

44:13 That describes why you should use wheels.

44:16 That's maybe that's not it, but I think it is.

44:18 Yes.

44:19 Python wheels.

44:19 So they have like a list of advantages there, but.

44:23 Yeah.

44:23 I also have a little like checklist.

44:25 It says, how are we doing for the top 360 packages?

44:30 And apparently 342 of them have wheels.

44:32 And it shows you for your popular packages, which ones like click does, but future doesn't, for example, and so

44:38 on.

44:39 So.

44:39 Future's been there for a long time.

44:41 Yeah.

44:42 But, but yeah, so wheels are really good.

44:45 And they actually replaced an older mechanism that was trying to do something somewhat similar called

44:50 eggs, but I avoid talking about those.

44:53 I don't really understand.

44:54 Let it live in the past.

44:55 Let it live in the past.

44:56 Wheels are also great if you have compilation that happens.

45:01 So if you compile some code as part of your build, then

45:07 that of course is much slower,

45:08 if you just have the source.

45:11 Yeah.

45:11 It's like it was doing GCC or something forever.

45:13 And if you don't have a compiler, it won't even work.

45:15 Right.

45:15 Exactly.

45:15 You have to have some setup, at least a little setup.

45:18 You have to have a compiler set up, at the very minimum.

45:20 Right.

45:20 How many Windows users have seen "cannot find vcvars.bat"?

45:24 Right.

45:25 Like what is this?

45:26 I don't want this.

45:26 On Windows you have to be in the right environment, or you have to have the right script sourced.

45:30 Yes.

45:30 So wheels also can contain binary components, like .so's and things.

45:37 And they have a tag as part of their name.

45:39 They have a very special naming scheme for wheels and the tag is stored in

45:43 the wheel too.

45:44 And they can tell you what Python version they're good for, what platforms they

45:51 are supported on.

45:52 They can have a build number, and then the Python part is actually in two

45:57 pieces.

45:57 There's the ABI tag and the interpreter tag.

46:00 Yeah.

46:01 You can see there's some huge long name with a bunch of underscores

46:04 separating it.

46:05 And basically, when you try to install it, sorry, go ahead.

46:09 I was saying it's also one of the reasons that names are normalized.

46:11 There's no difference between a dash and underscore.

46:13 It's because that special wheel name has dashes in it.

46:16 So the package name at that point in the file name has to use underscores.

46:20 Yeah.

46:21 And so basically when you pip install, it builds up that

46:24 name and says, do you have this as a binary?

46:27 Give it to me, right?

46:27 Something like this.

46:28 Yeah.

46:29 It knows how to pick it out; it looks for the right one.

46:31 If it finds a binary, it'll just download it, depending slightly on the system and how new your pip is.

46:36 Right.
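
As a rough illustration of that naming scheme, the third-party packaging library can pull a wheel filename apart; the filename below is made up for the example:

    from packaging.utils import parse_wheel_filename

    name, version, build, tags = parse_wheel_filename(
        "boost_histogram-1.2.1-cp39-cp39-manylinux_2_17_x86_64.whl"
    )
    print(name)      # boost-histogram (underscores normalized back to a dash)
    print(version)   # 1.2.1
    print({str(t) for t in tags})  # {'cp39-cp39-manylinux_2_17_x86_64'}

The three trailing pieces are the interpreter tag, the ABI tag, and the platform tag pip uses to pick the right file.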

46:37 And this is one of the main innovations, ideas, or philosophies behind Conda and Anaconda,

46:44 right?

46:44 It's like, let's just take that and make sure that we build all of these

46:47 things in a really clear way.

46:48 And then sort of package up the testing and compilation and distribute all

46:54 that together.

46:54 Right.

46:55 Yes.

46:55 This is very similar to that.

46:57 Wheels came, I think, I'm pretty sure they came after Conda.

46:59 I think they were still on eggs when Conda was invented, and then sort of building up wheels was

47:05 challenging.

47:05 Building a wheel was challenging.

47:08 That's, yeah, cibuildwheel has really changed that.

47:10 If you want a pure Python wheel, it's really easy.

47:12 Today, you should be using the build tool, which I'm also a

47:15 maintainer of as well.

47:17 But build just builds an sdist for you, or it builds a wheel.

47:22 And so you would say something like python setup.py bdist_wheel or something like that.

47:28 And then boom, I shouldn't be doing that anymore.

47:29 Please don't.

47:30 But that is how you do it.

47:31 Yeah.

47:31 How would I do it?

47:32 Tell me the right way.

47:33 The best.

47:34 Well, you could do pip install build and then python -m build, and that will build both an

47:42 sdist and a wheel, and it'll build the wheel from the sdist.

47:45 If you use pipx, which I would recommend, then you can just say pipx run build and you don't have to do

47:50 anything; that'll download build into a virtual environment for you.

47:54 It'll do it.

47:54 And then eventually it'll throw away the virtual environment, after a

47:57 week.
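
In concrete terms, the commands being described look roughly like this, run from the project root:

    pip install build
    python -m build          # builds an sdist, then builds a wheel from that sdist

    # or, without installing anything into your environment:
    pipx run build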

47:57 Interesting.

47:57 Okay.

47:58 So we could just use build.

48:00 We should be using build.

48:01 You should be using the build tool.

48:03 Especially for an sdist,

48:04 there's a big benefit to this.

48:06 And that is, it will use your pyproject.toml.

48:10 And if you say you require NumPy, like you're using the

48:15 NumPy headers, the C headers, then when

48:19 it's building the sdist, it will make the PEP 517 virtual environment.

48:25 It'll install NumPy, anything that's in your requires in your pyproject.toml.

48:30 And then it will run the setup.py inside that environment.

48:35 So you can now import numpy directly in there.

48:37 And it'll work even when you're building an sdist.
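
A rough sketch of the pyproject.toml section being described; the exact pins and backend here are illustrative, not from the episode:

    [build-system]
    requires = ["setuptools>=42", "wheel", "numpy"]   # numpy only if you need its C headers
    build-backend = "setuptools.build_meta"

With this in place, build (and pip) create an isolated build environment containing those requirements before running setup.py, which is why importing numpy works during the build.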

48:40 If you do python setup.py sdist or python setup.py whatever, you can't do that, because

48:45 you're literally running Python, giving it setup.py, it imports numpy,

48:49 and now it's broken.

48:50 Right.

48:51 Nothing triggers that call to the pyproject.toml to see what you need.

49:00 For a wheel, the best way to do it is with pip,

49:01 or the original way to do it was with pip wheel,

49:04 because pip has to be able to build wheels in order to install things.

49:08 That got added to pip before build existed.

49:13 But now the best way to do it would be with build --wheel.

49:16 And that's actually, it's doing the right thing.

49:17 It's actually trying to build the wheel you want.

49:19 Whereas pip wheel is actually just building a wheelhouse.

49:23 So if you depend on numpy and numpy doesn't have wheels, which they did better with Python 3.10.

49:28 So I'm not going to complain about, about numpy for Python 3.10, but for 3.9, they didn't

49:33 have wheels for a while.

49:33 So it'll build the wheels there and it'll build your wheels and it'll dump them all in the

49:37 wheelhouse or whatever the output is.

49:39 So you'll get, you'll be building numpy wheels, which you definitely don't want to try to upload.

49:43 Yeah.

49:43 Yeah, definitely not.

49:44 All right.

49:45 Well, that's, that's really cool.

49:46 And I definitely learned something.

49:47 I will start using build instead of, doing it the other way.

49:51 You can now delete your setup.py too.

49:53 Yeah.

49:53 That's the big thing, right?

49:54 So you don't have to run that kind of stuff, right?

49:56 Yeah.

49:57 They're trying to move away from any commands to setup.py, because you don't

50:01 even need one anymore.

50:02 And, you can't control that, that environment.

50:04 It's, it's very much an internal detail.

50:06 Like wrapping up this segment of the conversation, we want a wheel because that's best.

50:12 It installs without requiring the compiler tools on our system.

50:15 It installs faster.

50:16 It's built just for our platform.

50:19 The challenge is when you become a maintainer, you got to solve this, this matrix of different

50:26 Python versions that are supported and different platforms.

50:28 Like, for example, there's macOS Intel, there's macOS M1, Apple Silicon.

50:33 There's multiple versions of Windows.

50:35 There's different versions of Linux, right?

50:38 Like ARM Linux versus AMD64 Linux.

50:41 Yeah.

50:42 And now musllinux versus the other Linux varieties.

50:45 Yeah.

50:46 So one of the challenges with a wheel, is making it distributable.

50:51 So if you just go out and you build a wheel and then you try to give it to someone else,

50:54 it may not work.

50:54 Certainly on Linux, if you do that, it pretty much just won't

51:00 work,

51:00 because the systems are going to be different.

51:02 On macOS, it'll only work on the version you compiled it on, and not anything older.

51:07 And you'll even see people trying to compile on macOS 10.14 because

51:13 they want their wheels to work in as many places as possible.

51:16 Well, you can use the latest one.

51:17 There's ways to fix that.

51:19 Well, exactly.

51:20 It's fine.

51:20 The jankiest, like, I've got a Mac mini from 2009.

51:24 We're building on that thing.

51:25 Cause it will work for most people.

51:27 Right.

51:27 I think that's how they actually build the official Python binaries.

51:30 Interesting.

51:31 I'm not sure.

51:32 But then Apple went in like last year around this time, they threw a big spanner in the

51:37 works and said, you know what?

51:38 We're going to completely switch to arm and our own silicon.

51:41 And, you got to compile for something different now.

51:43 Yeah.

51:44 And cross compiling has always been a challenge.

51:46 yeah.

51:47 And then Windows is actually the easiest of all of them.

51:49 You're most likely on Windows to be able to compile something that you can give to someone

51:53 else.

51:53 But that is one of the things that Microsoft's been really pretty good

51:57 at: backwards compatibility.

51:58 It holds them back in other ways, but yeah, typically you can run an app from 20 years ago

52:03 and it'll still run.

52:04 Yeah.

52:04 And there are a few caveats, but not, not many compared, at least compared to the other

52:09 systems.

52:09 Apple's really good, but you do have to understand how to do it: you do have

52:14 to set your minimum version, and you have to get a Python that had that minimum version set when it was compiled.

52:19 But if you do that, it works really well.

52:21 So what I actually started with in Scikit-HEP: I was

52:26 building boost histogram, which needed to be able to run anywhere.

52:29 That was something I absolutely wanted.

52:30 It had to be pip install boost histogram and it just worked no matter what.

52:33 And also we had several other compiled packages at the time, several we had inherited.

52:37 and, I mean, you is, was compiled and that was quite popular.

52:41 We had a couple of specific ones, and we had a couple more that ended up

52:45 becoming interested in that.

52:47 In fact, during this sort of period is when Awkward started compiling pieces.

52:51 And so what I started with was building my own system to do this.

52:55 It was called azure-wheel-helpers, which, as you can guess by the name,

53:00 was basically a set of Azure DevOps scripts.

53:04 It was right after Azure had come out.

53:05 And I wrote a series of blog posts on this and described the exact process.

53:09 and sort of the things I'd found out about how you build a compatible wheel.

53:13 On macOS, you have to make sure you get the most compatible CPython, from

53:20 python.org itself.

53:21 You can't use brew or something like that, because those are

53:25 going to be compiled for whatever system they were targeting.

53:27 And on Linux, you have to run the manylinux system and you should run audit-

53:33 wheel, and actually on Mac, you should run delocate.

53:35 delocate, delocate-wheel, although I might be getting it wrong, I think it's delocate-wheel.

53:39 so there's this, this series of things that you have to do.

53:41 And I started maintaining this, this multi hundred line set of scripts to do this.

53:47 And, and I was also being limited by Azure at the time.

53:51 They didn't have these, all the templates and stuff they have now.

53:53 So everything had to be managed through git subtree, because it couldn't be a separate

53:57 repository.

53:58 And then when Jim started working on Awkward, he went and just rewrote the

54:03 whole thing, because

54:05 he wanted it to look simpler for him, and he took a couple of things out that were needed and

54:09 suddenly made it two separate things.

54:11 Now I had to, had to help maintain that.

54:13 So when Python 3.8 or whatever it was came out, now I had, I had a completely different

54:17 set of changes I had to make for that one.

54:18 And it was really just not, it was not working out.

54:21 it was not very easy to maintain.

54:22 And I was watching cibuildwheel.

54:25 And it was this package.

54:27 It was a Python package that would do this.

54:30 And it didn't matter what CI system you were on, because it was written in Python.

54:34 And it followed nice Python principles for good package design, and had unit tests and all that sort of stuff.

54:40 So it looked really good.

54:41 There were a couple of things that were missing.

54:43 I came in and made PRs for the things that I'd come up with that it didn't have.

54:47 And they got accepted.

54:48 And there was a shared maintainer between pybind11 and cibuildwheel as well.

54:52 I think that's one of the reasons that I sort of had heard about it.

54:55 I was really watching it.

54:55 And I finally decided just to make the switch.

54:57 And at some point a little later, I actually became a maintainer of cibuildwheel.

55:02 But I think I started doing the switch before that. It made it really easy once I was a maintainer to say, oh, this is a package that, you know, we have some control over.

55:09 It's okay.

55:09 Let's just.

55:10 Right.

55:10 It's a safe choice for your package to depend upon this,

55:13 because we have a say.

55:14 It just took out all of that maintenance.

55:16 And now Dependabot does all the maintenance for us.

55:20 It does the version bumps for the pinned cibuildwheel.

55:23 And that's it.

55:23 Nice.

55:23 So if I'm a package developer or owner, and I want to share that package with everybody,

55:32 we've already determined we would ideally want to have a wheel, but getting that wheel is hard.

55:37 So cibuildwheel will let you integrate it, as the name indicates, into your continuous integration.

55:43 And one of those steps of CI could be build the wheel, right?

55:46 It pretty much reduces it down to that: there's a step in your CI that says, you know, run cibuildwheel.

55:54 And then cibuildwheel is designed to really integrate nicely with the build matrix.

55:59 So for a fairly simple package, or for many packages, you can really just have Mac, Windows, and Linux run the same job.

56:06 In GitHub Actions,

56:07 it's easy to do the same job.

56:09 And then I call cibuildwheel.

56:11 And that's about it.

56:13 It just goes through all the different versions of Python that are supported.

56:16 It just goes through and makes a wheel for each.
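
A minimal sketch of what such a GitHub Actions job can look like; the action versions, job name, and artifact step here are illustrative, and it assumes pipx is available on the runner:

    jobs:
      build_wheels:
        strategy:
          matrix:
            os: [ubuntu-latest, windows-latest, macos-latest]
        runs-on: ${{ matrix.os }}
        steps:
          - uses: actions/checkout@v2
          - name: Build wheels
            run: pipx run cibuildwheel --output-dir wheelhouse
          - uses: actions/upload-artifact@v2
            with:
              path: wheelhouse/*.whl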

56:21 And in fact, it even has one feature that was really nice, that I'd always struggled with a bit: testing.

56:27 So if you give it a test command, it will even take your package and install it in a new environment,

56:32 you know, in a different directory,

56:33 That's not related to your build at all.

56:35 And make sure it works and passes whatever test you give it.

56:38 It'll do that across the platforms.

56:40 It'll do like a macOS test and a Windows test.

56:42 Yeah.

56:43 For each.

56:43 So cibuildwheel really just sees the platform it's sitting on, because it's inside the build matrix.

56:47 And so it's run for each one.

56:49 And yeah, for each one

56:52 it will run that test.

56:55 And the simplest test is just echo.

56:57 And that will just make sure it installs.

56:58 Cause it won't try to install your wheel unless, there's something in that test command.

57:03 Even that's useful sometimes.

57:04 Even that's broken sometimes because of NumPy not supporting one of those things in that matrix.

57:09 Yeah.

57:09 It can't install the dependencies.

57:11 So that step fails or something.

57:12 So, it says it currently supports GitHub Actions, Azure Pipelines, which I don't know how long those are going to be two separate things.

57:20 Maybe they'll always be separate, but with Microsoft owning GitHub, they're saying do stuff in Azure Pipelines,

57:26 and then they're kind of moving.

57:27 Like, yeah, I think they're similar.

57:28 The runners are the same.

57:29 They actually have the same environments.

57:31 so I think they'll exist just as two different interfaces probably.

57:35 And Azure is not so tied to GitHub and it has more of an enterprise type.

57:39 Yeah, for sure.

57:40 It definitely has a different focus.

57:41 It was just a rewrite and a better rewrite in most cases of it.

57:44 I got to learn.

57:45 Yeah.

57:46 They, I think GitHub actions came second.

57:47 All right.

57:48 So then Travis CI, AppVeyor, CircleCI, and GitLab CI, at least all of those, right?

57:53 At least those are the, those are the ones we test on.

57:56 And then, it runs locally.

57:58 there are some limitations to running it locally.

58:00 If you target Linux, any system that has Docker can target

58:06 Linux; you can just ask it to build Linux wheels.

58:08 You can actually run it from, like, my Mac or from Windows.

58:11 I assume from a Windows machine.

58:12 I tried it on Windows with Docker, and on Windows,

58:15 it does install to a standard location, C:\cibuildwheel.

58:20 But other than that, it's safe to run it there.
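
For the Docker-based Linux case mentioned a moment ago, a local run can be as simple as this; a sketch, assuming Docker is running and cibuildwheel is available via pipx:

    # build Linux wheels locally, inside the manylinux Docker images
    pipx run cibuildwheel --platform linux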

58:22 And on macOS, it will install to your macOS system.

58:25 It installs system versions of Python.

58:27 So that's something we haven't solved yet.

58:29 Might be able to someday.

58:30 so it's not a good idea unless you really are okay with installing every version of Python

58:34 that ever existed into your, into your system.

58:37 maybe get a VM of your Mac.

58:40 The Python.org Python.

58:41 Yeah.

58:42 Yeah.

58:42 I mean, it's somewhat safe.

58:43 if you're on Windows, you could use, a Windows subsystem for Linux, WSL as well.

58:50 In addition to Docker, I suspect.

58:51 Although I haven't tried that.

58:53 manylinux has to run; I'm sure as long as you can launch Docker, the thing

58:57 that you have to be able to do is launch Docker, because you have to use the manylinux

59:02 Docker images, or a derivative of them.

59:05 There's lots of rules to exactly what can be in the environment and things like that.

59:09 And PyPA maintains that.

59:12 One thing that also helps is that the main manylinux maintainer is also a cibuild-

59:17 wheel maintainer.

59:18 So that's one reason that those things tend to fit well together.

59:22 Features tend to match and come out at the same time.

59:25 Like musllinux, which is a big, big thing recently.

59:27 It's not actually in a released version of cibuildwheel yet.

59:30 What is musllinux?

59:31 So a normal Linux is based on glibc, and that's actually what controls it.

59:37 It's one of two things that controls manylinux.

59:39 So: can you download the binary wheel, or do you have to build it?

59:43 With an old version of pip, they had to teach pip about each

59:48 version of manylinux.

59:48 That was a mess.

59:50 So they eventually switched to a standard numbering system.

59:53 That is your glibc number.

59:55 And now pip doesn't need updating.

59:56 The current pip will be able to install a future manylinux, as long as your system's glibc is new enough.

01:00:00 But, that was a big problem.

01:00:02 So pip 9 can only install manylinux1.

01:00:05 It can't install a newer manylinux,

01:00:06 even if your glibc is fine for it.

01:00:08 So the real, the other thing is the glibc version, and manylinux1 was based on

01:00:14 CentOS 5, Red Hat 5.

01:00:16 manylinux2010 was CentOS 6; manylinux2014 was CentOS 7.

01:00:22 And then they switched to Debian, because of CentOS sort of switching to the

01:00:26 Stream model.

01:00:27 So manylinux_2_24 is glibc 2.24.

01:00:32 And that's Debian 8 or something like that.

01:00:36 And so, but that's glibc-based.

01:00:38 There are distributions that are not glibc-based, most notably Alpine, the widely used

01:00:43 Alpine.

01:00:44 it's this tiny, tiny little Docker image.

01:00:47 It's a really fun distribution to use if you're on Docker. It actually sounds fun to

01:00:51 install, but I've never tried it without Docker. It's these five-megabyte Docker

01:00:56 wheels, or, Docker doesn't do wheels.

01:00:58 Docker images, Docker images.

01:01:00 Yeah.

01:01:00 But that doesn't use glibc.

01:01:02 That uses musl.

01:01:03 And so musllinux will run on Alpine.

01:01:07 Okay.

01:01:07 Got it.

01:01:08 So if you're building for the platform Alpine and similar ones, right.

01:01:12 So anything.

01:01:14 Yeah.

01:01:15 And you said, I can run this locally as well.

01:01:17 I know I would use it in CI because I've got that matrix of all the versions

01:01:22 of CPython and PyPy, and then all the platforms, and I want to check as many

01:01:28 of those boxes as possible to put wheels in.

01:01:30 Right.

01:01:31 yeah.

01:01:31 Suppose I'm on my Mac and I want to make use of this to fill in, maybe do some testing,

01:01:37 at least on some of these columns.

01:01:38 Like, how do I do that?

01:01:39 What's the benefit there?

01:01:41 Well, I can tell you the case where it happened.

01:01:42 So we were shipping CMake, and the scikit-build organization ran out of Travis credits,

01:01:50 and that's where they were being built.

01:01:52 We hadn't switched them over to being emulated builds on GitHub Actions yet.

01:01:57 And it just ran out.

01:01:58 We couldn't, we couldn't build them.

01:01:59 And one of them had been missed and we also weren't waiting to upload.

01:02:02 So we had uploaded everything, but we had one, one set or maybe, maybe it was all of the emulated

01:02:07 builds.

01:02:07 I think it was one set that didn't, didn't work.

01:02:09 And so we wanted to go ahead and upload those, those missing wheels.

01:02:13 And I tried, but, I couldn't actually get emulation.

01:02:16 Docker QEMU emulation.

01:02:20 What?

01:02:20 I couldn't get that working on my Mac.

01:02:22 So the manylinux maintainer used his Linux machine, and he had QEMU emulation

01:02:29 on it, and he built the emulated images; it took a few hours, but he just built them locally

01:02:34 and then uploaded them, filled in the missing wheels.

01:02:36 So if I'm maintaining a package, I've got some package I'm putting on PyPI, and I want

01:02:43 to test it.

01:02:44 Does it make sense to do it locally, or does it just make sense to put it on some,

01:02:47 some CI system?

01:02:49 For cibuildwheel, usually I do some local testing, but I'm also developing cibuild-

01:02:54 wheel. But, you know, usually it's probably fine to do this just in your CI, and

01:02:59 you usually don't want to run the full thing

01:03:00 every time. Usually you have your regular unit tests, but cibuildwheel is going to be

01:03:04 a lot slower because it's going through and it's making each set of wheels and launching

01:03:08 Docker images and things like that.

01:03:09 And it's installing Python each time, for macOS and Windows.

01:03:13 So usually, unless you have a fairly quick build; I've seen some

01:03:17 people just run cibuildwheel as part of their test suite,

01:03:19 but usually you just run it, say right before release.

01:03:22 I usually do it once before the release and then on the release.

01:03:25 Right.

01:03:26 Exactly.

01:03:26 Okay.

01:03:26 That makes sense.

01:03:27 Cause it's a pretty heavyweight type of operation.

01:03:30 So when I look at all these different platforms, I see macOS Intel, macOS Apple Silicon, different

01:03:35 versions of Windows.

01:03:37 And then I think about CI systems, you know, what CI systems can I use that support all these

01:03:43 things?

01:03:43 Like, does GitHub Actions support both versions of macOS, for example, plus Windows?

01:03:48 GitHub Actions is by far our most popular platform.

01:03:52 It switched very quickly.

01:03:53 It used to be Travis.

01:03:54 Travis was a challenge because they didn't do Windows very well.

01:03:56 They still don't do Windows very well.

01:03:57 And it's a challenge for us because we actually can't run our macOS tests on them anymore,

01:04:02 because once we joined the PyPA, the billing became an issue and we basically just

01:04:07 lost macOS runners for it.

01:04:09 But CircleCI, I think, Azure, and GitHub Actions, I think they do all three.

01:04:16 And you can always split things up.

01:04:18 You could always do Travis for the Linux and then AppVeyor for Windows.

01:04:22 You can do it that way.

01:04:24 One of the big things that I had developed for cibuildwheel was the pyproject.toml,

01:04:29 or any TOML, configuration: usually that's pyproject.toml configuration for cibuildwheel.

01:04:34 That way you can get your cibuildwheel configuration out of your YAML files.

01:04:40 That way it works locally,

01:04:41 which is one of the main things I was after, but also you can just do it once and

01:04:45 then run on several different systems.

01:04:47 Like, you might like the fact that Travis is, I think, the only one that does the

01:04:51 unusual architectures natively.

01:04:53 You have to emulate it other places, which is a lot slower, five times slower or something.

01:04:58 Yeah.

01:04:58 So kind of split that up: get the definition in one place, and then create maybe multiple

01:05:02 CI jobs.

01:05:04 Your CI scripts are really simple.
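
A rough sketch of what that pyproject.toml configuration can look like; the skip pattern and test command here are illustrative and assume a pytest suite in a tests/ folder:

    [tool.cibuildwheel]
    test-requires = "pytest"
    test-command = "pytest {project}/tests"
    skip = "pp*"          # e.g. skip PyPy builds

Because the configuration lives in pyproject.toml, the same settings apply whether cibuildwheel runs in CI or locally.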

01:05:05 Yeah.

01:05:05 Yeah.

01:05:06 Yeah.

01:05:07 Very cool.

01:05:07 The example script is just a few lines.

01:05:09 It does not take much to do this, comparatively.

01:05:12 Oh yeah.

01:05:12 Hundreds of lines it used to take.

01:05:14 Yeah, sure.

01:05:15 And I didn't even scroll down here.

01:05:16 You've got a nice grid on github.com/pypa/cibuildwheel that shows, on GitHub Actions,

01:05:22 what's supported; on Azure Pipelines,

01:05:24 what's supported.

01:05:25 It's not right.

01:05:26 Circle CI doesn't do this.

01:05:27 No, but yeah.

01:05:29 AppVeyor, Travis, Azure, and GitHub do.

01:05:34 It does do the macOS, but we can't test it.

01:05:36 Theoretically, it does it.

01:05:38 Gotcha.

01:05:39 And then, yeah, I wonder about the M1, the Apple Silicon ARM versions, versus the Intel versions.

01:05:46 I don't know how, how well that's permeated into the world yet.

01:05:49 but the fact that they have Mac at all is kind of impressive.

01:05:52 Nobody has an M1 runner yet.

01:05:53 there are a few places I think now that you can purchase time on one, but no runners.

01:05:59 I mean, last I checked GitHub Actions, you couldn't even run it yourself on an M1.

01:06:03 that may be, that may have changed.

01:06:05 I don't know.

01:06:06 That was a while back.

01:06:07 Yeah.

01:06:07 I mean, there are some crazy, places out there.

01:06:10 I think there's one called Mac mini Colo.

01:06:13 I think that's what it's called.

01:06:14 Let me see if that's, yeah, I think that's it.

01:06:17 Yeah.

01:06:17 So you can get, you can go to these places like Mac mini Colo and get, get a whole bunch

01:06:24 of Mac minis and put them into this crazy data center.

01:06:28 But you know, that's not the same as I upload a text file into GitHub that says run on Azure

01:06:35 on GitHub actions.

01:06:36 And then that's the end of it.

01:06:37 Right.

01:06:37 You probably got to set up your whole, like some whole build system into a set of minis.

01:06:42 And like, that doesn't sound very practical for most people.

01:06:45 Ideally, what you could do is, I mean, you just need one mini, and then you set up

01:06:49 a GitHub Actions self-hosted runner, a locally hosted runner.

01:06:53 And other systems do that too; GitLab CI was big on that.

01:06:57 You can do anything on GitLab CI.

01:06:59 We just haven't tested that because they don't have those publicly.

01:07:01 But, if you, if you have your own, you can do that.

01:07:04 I know somebody who does this, basically, with ROOT: he has a Mac mini

01:07:09 and runs the M1 builds on that.

01:07:11 But, you could do that.

01:07:13 And I have a Mac mini, and the lead developer of cibuildwheel

01:07:15 also has a Mac mini, or the M1.

01:07:18 He has an M1 of some sort.

01:07:20 I don't know.

01:07:20 I have a Mac mini.

01:07:21 Mine is Mac.

01:07:22 That's what I'm, talking to you right now on.

01:07:24 It's a fantastic little machine.

01:07:25 Yeah.

01:07:25 It's, it's very impressive.

01:07:27 I loved the way it handled boost-histogram.

01:07:28 It was fast.

01:07:29 I have a 16 inch, almost maxed out, MacBook and the Mac mini M1.

01:07:34 It was faster on boost histogram than this thing.

01:07:36 Wow.

01:07:36 Yeah.

01:07:37 I have a maxed out 15 inch, a little bit older, a couple of years, but I just don't

01:07:42 touch that thing unless I literally need it as a laptop because I want to be somewhere else.

01:07:45 But yeah, I'm definitely not drawn to it.

01:07:47 These, so you could probably set up one of these minis for 700 bucks and then tie it up.

01:07:52 But that's again, not as easy as, you know, just clicking the public free option that works,

01:07:56 but still it's, it's within the realm of possibility.

01:07:59 Yeah.

01:08:00 And Apple has actually helped out several, like, I know, homebrew and a few others they've

01:08:04 helped out with, by giving them either Mac minis or some, some, something that they

01:08:10 could build with.

01:08:10 So they, I believe, Homebrew actually builds on real

01:08:17 M1s.

01:08:17 I know it does because they're, the builds are super fast.

01:08:20 I remember that.

01:08:20 Like it builds ROOT in like 20 minutes, the ROOT recipe, because I maintain that.

01:08:24 And the normal one takes about an hour, running on multiple cores, but it's like

01:08:29 three times faster.

01:08:30 It's done in 20 minutes.

01:08:31 I just thought something was wrong when I first saw that.

01:08:33 That's it.

01:08:34 How could it be done?

01:08:35 Something broke.

01:08:36 What broke?

01:08:36 Interesting.

01:08:37 All right, Henry, we're getting really short on time, a little bit over, but it's been a

01:08:40 fun conversation.

01:08:41 How about you give us a look at the future?

01:08:42 Where are things going?

01:08:43 with all the stuff.

01:08:45 The next thing I'm interested in being involved with is scikit-build, which

01:08:50 is a package that currently sort of augments setuptools, but hopefully will eventually

01:08:57 sort of replace setuptools as the thing that you build with.

01:09:01 And it will call out to CMake.

01:09:02 So you basically just write a CMake file.

01:09:06 And this could wrap an existing package, or maybe you need some of the other things that

01:09:11 CMake has.

01:09:11 And this will then let you build that as a regular Python package.
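
A rough sketch of what a scikit-build setup.py can look like; the project name and layout here are invented, and it assumes scikit-build and CMake are installed:

    # setup.py
    from skbuild import setup  # drop-in replacement for setuptools.setup

    setup(
        name="example_pkg",
        version="0.1.0",
        packages=["example_pkg"],
        cmake_install_dir="src/example_pkg",  # where CMake-installed files land in the wheel
    )

The actual compilation steps live in a CMakeLists.txt alongside it.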

01:09:16 In fact, recently somebody sort of put it together with cibuildwheel.

01:09:19 Yeah.

01:09:20 It's like the built-in CMake example: it built LLVM and pulled out just the

01:09:25 clang-format tool and made wheels out of that.

01:09:28 And now you can just do pip install clang-format.

01:09:30 It's one to two megabytes.

01:09:32 It works on all systems, including Apple Silicon and things.

01:09:34 I just tried it on Apple Silicon yesterday and it's a pip install.

01:09:37 Now you can clang-format C++ code.

01:09:39 And that's just, you know, mind blowing.

01:09:41 You can add it to pre-commit.

01:09:42 And pre-commit.ci, it runs in, too.

01:09:44 I mean, I'd been fighting for about a week to reduce the size of the clang-format

01:09:48 recipe from 600 megabytes to just under the 250

01:09:52 that was the maximum for pre-commit.ci.

01:09:54 And now you can pip install it, under about a megabyte for Linux, that sort of thing.
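
As a quick illustration of that; the package name is as given in the conversation, and the file paths are invented:

    pip install clang-format
    clang-format --version
    clang-format -i src/*.cpp     # format C++ files in place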

01:09:59 And I think that would be a really great thing to work on.

01:10:04 It's been around since 2014, but it needs some serious work.

01:10:08 And so I'm currently actually working on writing a grant to try to get funded, to just work

01:10:13 on basically the scikit-build system, and looking for interesting science

01:10:17 use cases that would be interested in adapting, or switching an existing build

01:10:23 system over to it,

01:10:24 or taking something that has never been available from Python and making it available.

01:10:30 And yes, ROOT might be one.

01:10:32 Scikit build package.

01:10:33 I'm looking for a wide variety.

01:10:35 Yeah.

01:10:35 How neat.

01:10:36 The scikit-build package is fundamentally just the glue between setuptools, Python modules, and

01:10:40 CMake.

01:10:40 Yeah.

01:10:41 So it's a real way to take some of these things that were based on CMake and sort of expose

01:10:45 them to Python.

01:10:46 Yeah.

01:10:46 So you can just, have a CMake package that does all the CMake things well, you know,

01:10:51 like finding different libraries and that. I'm a big CMake person.

01:10:55 But high-energy physics uses it very heavily.

01:10:57 Most C++ does.

01:10:58 It's about 60%,

01:10:59 I think, of all build systems that are CMake-based now,

01:11:03 going from Kitware's numbers, but they make CMake.

01:11:06 But, it's, I think it's a, it's very powerful.

01:11:10 It can be used for things like that.

01:11:11 And it will really open up a much easier, more natural C++ and C and

01:11:18 Fortran and things like that,

01:11:19 and CUDA, than is currently available.

01:11:20 Setuptools is, well, distutils is going away in Python 3.12.

01:11:23 Setuptools is not really designed to build C++ packages.

01:11:29 It was really just a hack on top of distutils, which happened to be built to build just Python itself.

01:11:34 So, well, scikit build sounds like the perfect tool to apply to the science space because

01:11:40 there's so many of these weird compiled things that are challenging to, you know, install and

01:11:45 deploy and share and so on.

01:11:47 So making that easier.

01:11:48 Sounds good.

01:11:48 All right.

01:11:49 Well, I think we're probably going to need, need to leave it there just for the sake of time,

01:11:53 but it's been awesome to talk about all the internals of supporting Scikit-

01:11:59 HEP, and people should check out cibuildwheel.

01:12:02 It looks like, you know, if you're maintaining a package either publicly or just internally

01:12:06 for your organization, it looks like it'd be a big help.

01:12:08 Yeah.

01:12:09 If it's got binary, any sort of binary build in it.

01:12:11 Yes.

01:12:11 Yeah, absolutely.

01:12:12 If not, build is fine.

01:12:13 Yeah.

01:12:14 Right.

01:12:14 And I learned about build, which is good to know.

01:12:16 All right.

01:12:17 So before you get out of your Henry, let me ask you the two final questions.

01:12:20 you're going to write some code, I mean, like Python code, what editor would you use?

01:12:25 Depends on how much. It'll either be vi

01:12:27 if it's a very small amount;

01:12:29 if it's a really large project, let's say it takes several days, then I'll use

01:12:34 PyCharm.

01:12:35 And then I've really started using VS Code quite a bit.

01:12:38 And that's sort of expanding to fill in all the middle ground and kind of eating in on both

01:12:42 of the other, both of the edges.

01:12:43 Yeah.

01:12:44 Yeah.

01:12:45 There's some interesting stuff going there.

01:12:46 Good choice.

01:12:46 But all with the vi mode or plugins added, of course.

01:12:51 And then, notable PyPI package.

01:12:53 I mean, we probably talked about 20 already.

01:12:55 If you want to just give a shout out to one of those, that's fine.

01:12:57 Or if you got a new idea.

01:12:58 I'm going to go with one that's unlikely to get mentioned, but I'm really

01:13:03 excited by it.

01:13:04 The development of it is, I think the developer is quite new, but what he's actually

01:13:09 done as far as the actual package has been nice.

01:13:12 It needs, it needs some, some nice touches.

01:13:14 And that is plotext, P L O T E X T.

01:13:19 And I'm really excited about that because it makes these, the actual plots it makes are

01:13:23 really, really nice.

01:13:24 And they're plotted to the terminal and, it can integrate with rich.

01:13:28 And of course, I'm interested in it because I want to integrate it with Textual.

01:13:32 I want to see it integrated with Textual, I think a Textual app that combines this with

01:13:38 file browsers and things like that.

01:13:41 It'd be incredible.

01:13:42 Yeah.

01:13:42 So you can do things like the terminal, for example.

01:13:45 Yeah.

01:13:45 So you could like cruise around your files, use your, your root IO integration, pull

01:13:51 these things up here and, you know, put the plot right on the screen.

01:13:53 Right.

01:13:53 But in the terminal.

01:13:54 Okay.

01:13:55 Yeah.

01:13:55 This is really cool.

01:13:56 I had no idea.

01:13:56 And this is based on rich,

01:13:57 you say?

01:13:58 it can integrate with rich.

01:13:59 It integrates with rich.

01:14:00 Okay.

01:14:00 Got it.

01:14:01 Yeah.

01:14:01 So as soon as I saw it, I started trying to make sure the two people were talking to each

01:14:04 other.

01:14:05 Will and the person who is developing this.

01:14:08 Yeah, exactly.

01:14:08 All right.

01:14:09 These things work together.

01:14:10 That's very cool.

01:14:11 They seem like they should, right?

01:14:12 They're in the same general zone.

01:14:14 Yeah.

01:14:14 And they do now.

01:14:15 There had to be some communication back and forth as far as what size the plots

01:14:19 were.

01:14:19 Right.

01:14:20 This should, this should work in it.

01:14:21 A good recommendation.

01:14:22 definitely one I had not learned about.

01:14:24 So I'm sure people will enjoy that.
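
For anyone curious, a minimal plotext sketch (pip install plotext); the data here is invented:

    import plotext as plt

    y = [x ** 2 for x in range(20)]
    plt.plot(y)
    plt.title("terminal plot of y = x^2")
    plt.show()   # draws the plot directly in the terminal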

01:14:25 All right, Henry, final call to action.

01:14:27 People want to do more with wheels, CI build wheel, or maybe some of the other stuff we talked

01:14:31 about.

01:14:31 What do you tell them?

01:14:32 Look through, I think, one of the best places to go is the Scikit-HEP developer pages,

01:14:36 even if you have no interest in Scikit-HEP tools or HEP at all.

01:14:40 and that sort of shows you how all these things integrate together really well.

01:14:43 And, has nice, has nice documentation.

01:14:46 Of course, cibuildwheel itself is nice.

01:14:48 And the PyPA, a lot of the PyPA projects have gotten good documentation, as well as

01:14:54 packaging.python.org.

01:14:55 We've updated that quite a bit.

01:14:56 to reflect some of these things. But I really like the Scikit-HEP

01:15:01 developer pages.

01:15:03 I mean, I'm biased because I wrote most of them.

01:15:04 Nice.

01:15:06 Yeah.

01:15:06 I'll link to those.

01:15:07 And I'll, I'll try to link to pretty much everything else that we spoke to as well.

01:15:11 So people can check out the podcast player show notes to find all that stuff.

01:15:14 I guess one final thing that we didn't call out that I think is worth pointing out is cibuild-

01:15:18 wheel is under the PyPA, the Python Packaging Authority.

01:15:21 So it gives it some officialness, I guess you should say.

01:15:24 Yes.

01:15:24 That happened after I joined; one of the first things I wanted to do, I thought this

01:15:28 should really be in the PyPA.

01:15:30 And, I was sort of pushing for that and the other developers were fine with that.

01:15:34 And so we brought it up, and I actually joined the PyPA just before that by becoming

01:15:39 a member of build.

01:15:40 So I got to vote on cibuildwheel coming in, but it was a very enthusiastic vote, even

01:15:44 without my vote.

01:15:45 And pipx joined right at the same time too.

01:15:48 So those were exciting times.

01:15:50 pipx is a great library.

01:15:51 I really like the way pipx works.

01:15:53 It's a great tool.

01:15:54 All right, Henry, thank you for being here.

01:15:56 It's been great.

01:15:57 Thanks for all the insight on all these internals around building and installing Python packages.

01:16:01 There's also a lot more in my blog.

01:16:03 It's iscinumpy.gitlab.io.

01:16:05 So that's also a link that links to all those other things, obviously.

01:16:09 Thanks again for being here.

01:16:10 Yeah.

01:16:11 See ya.

01:16:11 Thanks for having me.

01:16:11 Yeah.

01:16:12 You bet.

01:16:12 This has been another episode of talk Python to me.

01:16:16 Our guest on this episode was Henry Schreiner and it's brought to you by us over at talk

01:16:20 Python training and the transcripts were brought to you by assembly AI.

01:16:24 Do you need a great automatic speech to text API?

01:16:27 Get human level accuracy in just a few lines of code.

01:16:30 Visit talkpython.fm/assemblyai.

01:16:32 Want to level up your Python?

01:16:34 We have one of the largest catalogs of Python video courses over at talk Python.

01:16:38 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:16:43 And best of all, there's not a subscription in sight.

01:16:46 Check it out for yourself at training.talkpython.fm.

01:16:49 Be sure to subscribe to the show.

01:16:51 Open your favorite podcast app and search for Python.

01:16:54 We should be right at the top.

01:16:55 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

01:17:01 direct RSS feed at /rss on talkpython.fm.

01:17:05 We're live streaming most of our recordings these days.

01:17:08 If you want to be part of the show and have your comments featured on the air, be sure to

01:17:12 subscribe to our YouTube channel at talkpython.fm/youtube.

01:17:16 This is your host, Michael Kennedy.

01:17:17 Thanks so much for listening.

01:17:19 I really appreciate it.

01:17:20 Now get out there and write some Python code.

01:17:22 I'll see you next time.
