Using cibuildwheel to manage the scikit-HEP packages

Episode #338, published Sun, Oct 17, 2021, recorded Thu, Oct 14, 2021

How do you build and maintain a complex suite of Python packages? Of course, you want to put them on PyPI, and the best format to publish there is a wheel. With a wheel, when developers install your code it comes straight down ready to use, with no local compilers or build tooling required.

But if your package contains compiled code, such as C or Fortran extensions, you have a big challenge. How do you automatically compile and test wheels for Linux, macOS (Intel and Apple Silicon), Windows, and so on? That's the problem cibuildwheel solves.

On this episode, you'll meet Henry Schreiner. He is developing tools for the next era of the Large Hadron Collider (LHC) and is an admin of Scikit-HEP. Of course, cibuildwheel is central to this process.

Episode Deep Dive

Guest Introduction and Background

Henry Schreiner is a seasoned Python and C++ developer working at Princeton University, focusing on high-energy physics (HEP) and research software engineering. He is a core maintainer of the Scikit-HEP library suite, contributes to open-source projects including pybind11, and works on next-generation tools for the Large Hadron Collider data-analysis stack. Henry’s work bridges C++ and Python, emphasizing tools that make scientific computing more accessible and efficient. He is also a maintainer of cibuildwheel, a project that simplifies building and distributing Python wheels across platforms and Python versions.

What to Know If You're New to Python

Here are a few essentials to help you navigate this episode more effectively:

  • Python Installation: Make sure you have a standard CPython installation, ideally version 3.7+.
  • Package Management: Familiarize yourself with pip install and virtual environments so you can isolate your projects.
  • Understanding Binary Extensions: Many scientific or performance-focused Python packages have C or C++ components. Knowing that they need specialized building steps (compilers, etc.) will help clarify references in this episode.
  • Documentation: Check out the Python Packaging User Guide (packaging.python.org) for an in-depth guide to packaging and distribution basics.
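
To make "isolate your projects" concrete, here is a minimal sketch using only the standard-library venv module. It creates a throwaway environment and checks that code run with that environment's interpreter sees its own sys.prefix rather than the base installation. The helper names make_isolated_env and run_in_env are hypothetical, and with_pip=False only keeps the example fast; pass with_pip=True (or run python -m venv from a shell) when you want to pip install into the environment. cibuildwheel's test step relies on the same idea of a fresh, separate environment.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def make_isolated_env(env_dir: Path) -> Path:
    """Create a bare virtual environment and return its Python executable.

    Hypothetical helper for illustration only.
    """
    # symlinks=True mirrors what `python -m venv` does on POSIX; Windows copies.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return env_dir / bin_dir / exe


def run_in_env(python: Path, code: str) -> str:
    """Run a snippet with the environment's interpreter and return its stdout."""
    result = subprocess.run(
        [str(python), "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_python = make_isolated_env(Path(tmp) / "env")
        # Inside the venv, sys.prefix points at the env, not the base install.
        print(run_in_env(env_python, "import sys; print(sys.prefix != sys.base_prefix)"))
```

Anything you install with that environment's interpreter stays inside the temporary directory and disappears with it, leaving your base Python untouched.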

Key Points and Takeaways

  1. cibuildwheel: A Solution for Cross-Platform Wheels
     cibuildwheel greatly simplifies building, testing, and distributing Python wheels across multiple operating systems and Python versions. It handles tedious tasks like setting up each target Python environment, and it produces consistent, fully tested wheels for Linux, macOS (Intel and Apple Silicon), and Windows.
  2. Scikit-HEP: A Suite of Particle Physics Tools in Python
     Scikit-HEP is a collection of libraries designed to support high-energy physics workflows in Python. It includes specialized tools for histogramming, file reading, vector manipulation, and more, bringing modern Python practices to the HEP community.
  3. Awkward Array for Irregular Data Structures
     Awkward Array provides a NumPy-like interface for nested or variable-length data, such as lists of lists with differing lengths. It integrates with Numba for just-in-time compilation and handles the large-scale data common in HEP and beyond (e.g., genomics).
  4. boost-histogram and hist: Advanced Histogramming in Python
     boost-histogram provides Python bindings to the Boost.Histogram C++ library, and hist builds a friendlier, analyst-oriented API on top of it. Together they enable efficient histogram creation, manipulation, and rebinning, treating a histogram as an object whose axes can be labeled, stored, and operated on across multiple dimensions.
  5. pybind11: Seamless C++ and Python Interoperability
     pybind11 is a header-only C++ library for exposing C++ code to Python. Because bindings are written in ordinary C++, you can build extension modules without learning a separate “binding language,” which makes it easier to pair performance-intensive routines with Python’s high-level features.
  6. cibuildwheel Workflow and Testing
     cibuildwheel not only compiles wheels but can also test them: it installs each freshly built wheel into a separate, clean environment and runs your test command there. Keeping build and test apart catches wheels that only work because of something present in the build environment, such as files from the source tree or dependencies that were never declared.
  7. Minimizing Complexity with Docker for Linux Builds
     Linux wheels are typically built inside Docker images that implement the manylinux policies. Standardizing the build environment this way ensures the wheels work across many Linux distributions and avoids system-specific configuration issues.
  8. Apple Silicon (M1) and Cross-Architecture Builds
     Apple Silicon adds another dimension of complexity. cibuildwheel aims to unify building Intel (x86_64) and Apple Silicon (arm64) macOS wheels, including universal2 wheels that contain both, though native M1 CI runners were still scarce at the time of this conversation, so arm64 wheels built on Intel machines could not be tested there.
  9. scikit-build: A Next Step for C++/Python Projects
     scikit-build uses CMake to build complex, multi-language Python packages. Henry mentioned how modernizing scikit-build could reduce friction when combining C++ libraries, CUDA code, or advanced build flows inside Python packages.
  10. The Power of Wheel Distribution for Scientific Computing
      Throughout the conversation, Henry underscored the efficiency gained by distributing scientific tools as wheels. Whether for HPC, data analysis, or other advanced computing, wheels make installation streamlined and reproducible, with no compiler required on the user's machine.
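
cibuildwheel names each build with an identifier of the form python_tag-platform_tag, such as cp39-manylinux_x86_64 or cp39-macosx_arm64, and its CIBW_BUILD and CIBW_SKIP options take space-separated glob patterns matched against those identifiers. The sketch below is a simplified imitation of that selection logic, not cibuildwheel's actual code. The PYTHONS and PLATFORMS lists are a hypothetical subset of what the tool supports, and the real tool also honors other settings, such as its per-OS architecture options.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from itertools import product

# Hypothetical build matrix: a small subset of the identifiers cibuildwheel knows.
PYTHONS = ["cp38", "cp39", "cp310"]
PLATFORMS = ["manylinux_x86_64", "macosx_x86_64", "macosx_arm64", "win_amd64"]


def all_identifiers() -> list[str]:
    """Combine every Python tag with every platform, e.g. 'cp39-macosx_arm64'."""
    return [f"{py}-{plat}" for py, plat in product(PYTHONS, PLATFORMS)]


def select(identifiers: list[str], build: str = "*", skip: str = "") -> list[str]:
    """Keep identifiers matching any 'build' glob and no 'skip' glob.

    Both arguments are space-separated glob lists, in the spirit of
    CIBW_BUILD / CIBW_SKIP (simplified sketch, not the real implementation).
    """
    build_pats = build.split()
    skip_pats = skip.split()
    return [
        ident
        for ident in identifiers
        if any(fnmatchcase(ident, p) for p in build_pats)
        and not any(fnmatchcase(ident, p) for p in skip_pats)
    ]


if __name__ == "__main__":
    # Build only CPython 3.9 and 3.10, and skip Apple Silicon for now.
    for ident in select(all_identifiers(), build="cp39-* cp310-*", skip="*-macosx_arm64"):
        print(ident)
```

Run as a script, this prints the six CPython 3.9 and 3.10 builds that remain after skipping macosx_arm64. In a real project you would put the same patterns in the CIBW_BUILD and CIBW_SKIP environment variables, or under [tool.cibuildwheel] in pyproject.toml, rather than in Python.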
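
The histogram ideas in item 4 (fixed bins, underflow and overflow, rebinning by merging adjacent bins) can be illustrated with a small pure-Python class. This is a hypothetical toy written for this summary, not boost-histogram's API. In boost-histogram you would instead write something like bh.Histogram(bh.axis.Regular(4, 0, 1)), call .fill(...), and rebin with slicing such as h[::bh.rebin(2)], with the heavy lifting done in C++.

```python
from __future__ import annotations


class RegularHist1D:
    """Hypothetical toy 1D histogram: equal-width bins plus underflow/overflow."""

    def __init__(self, bins: int, start: float, stop: float) -> None:
        if bins <= 0 or stop <= start:
            raise ValueError("need bins > 0 and stop > start")
        self.bins, self.start, self.stop = bins, start, stop
        # counts[0] is underflow, counts[-1] is overflow, the rest are real bins.
        self.counts = [0.0] * (bins + 2)

    def fill(self, *values: float, weight: float = 1.0) -> None:
        """Add each value to its bin (NaN inputs are not handled here)."""
        width = (self.stop - self.start) / self.bins
        for x in values:
            if x < self.start:
                idx = 0
            elif x >= self.stop:
                idx = self.bins + 1
            else:
                # min() guards against floating-point rounding just below stop.
                idx = 1 + min(int((x - self.start) / width), self.bins - 1)
            self.counts[idx] += weight

    def rebin(self, factor: int) -> RegularHist1D:
        """Merge every `factor` adjacent bins; flow bins are carried over."""
        if factor <= 0 or self.bins % factor != 0:
            raise ValueError("factor must be positive and divide the bin count")
        out = RegularHist1D(self.bins // factor, self.start, self.stop)
        out.counts[0] = self.counts[0]
        out.counts[-1] = self.counts[-1]
        inner = self.counts[1:-1]
        for i in range(out.bins):
            out.counts[i + 1] = sum(inner[i * factor:(i + 1) * factor])
        return out

    def values(self) -> list[float]:
        """Bin contents without the flow bins."""
        return self.counts[1:-1]
```

Keeping the flow bins means no entry is silently dropped: after rebinning, the total count, including underflow and overflow, is unchanged.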

Interesting Quotes and Stories

  • On distributing HPC packages: Henry highlighted how previously, “you’d have a special Python install that took hours to set up,” but now with wheels and tools like cibuildwheel, “you can just pip install and everything works.”
  • On bridging Python and C++: “I like to be in that space between the two,” Henry said, emphasizing how modern Pythonic approaches can make C++ functionality more accessible to data scientists.

Key Definitions and Terms

  • Wheel: A built, ready-to-install distribution format for Python packages, stored as a .whl file (a zip archive whose filename encodes the Python version, ABI, and platform it targets).
  • HEP: High-Energy Physics, the study of subatomic particles, often through large-scale experiments like the Large Hadron Collider.
  • Numba: A JIT compiler that optimizes numerical Python code for faster execution.
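  • manylinux: A family of policies (PEP 513, 571, 599, and 600) defining a baseline of system libraries, chiefly a minimum glibc version, that a Linux wheel may rely on. The PyPA publishes Docker images implementing these policies so that wheels built in them work across different Linux distributions.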
  • Awkward Array: A library that supports nested, variable-length arrays in a NumPy-like interface.
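
Awkward Array keeps a list of variable-length lists in columnar form: one flat content buffer plus an offsets array marking where each inner list starts and stops (its ListOffsetArray layout). That is what lets it hand contiguous data to NumPy or Numba. The pure-Python sketch below only illustrates that idea; the helper names are hypothetical, and with Awkward itself you would build an ak.Array and call something like ak.sum(array, axis=1).

```python
from __future__ import annotations

from itertools import accumulate


def to_offsets(lists: list[list[float]]) -> tuple[list[int], list[float]]:
    """Flatten nested lists into (offsets, content), the columnar layout.

    Hypothetical helper; offsets[i]:offsets[i + 1] spans the i-th inner list.
    """
    offsets = [0, *accumulate(len(sub) for sub in lists)]
    content = [x for sub in lists for x in sub]
    return offsets, content


def get_list(offsets: list[int], content: list[float], i: int) -> list[float]:
    """Recover the i-th inner list by slicing content between two offsets."""
    return content[offsets[i]:offsets[i + 1]]


def sum_per_list(offsets: list[int], content: list[float]) -> list[float]:
    """A per-list reduction computed from the flat buffers alone."""
    return [sum(content[a:b]) for a, b in zip(offsets, offsets[1:])]
```

For [[1.5, 2.0, 3.5], [], [4.0, 0.5]] the offsets are [0, 3, 3, 5]; the empty middle list simply shows up as two equal offsets, and no per-list Python objects are needed once the data is flattened.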
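
A wheel's filename carries its compatibility tags: {distribution}-{version}[-{build tag}]-{python tag}-{abi tag}-{platform tag}.whl, where a tag field may hold several dot-separated tags. For manylinux, PEP 600 tags such as manylinux_2_17_x86_64 state the minimum glibc version directly, and the older names manylinux1, manylinux2010, and manylinux2014 correspond to glibc 2.5, 2.12, and 2.17. The sketch below, with hypothetical helper names, uses those rules to decide whether a wheel's platform tag fits a given glibc version and architecture. It ignores the Python and ABI tags and other details, so for real work prefer the packaging library (which pip vendors), for example packaging.utils.parse_wheel_filename.

```python
from __future__ import annotations

import re

# Legacy manylinux names and the glibc versions they correspond to (PEP 600).
LEGACY = {"manylinux1": (2, 5), "manylinux2010": (2, 12), "manylinux2014": (2, 17)}
PEP600 = re.compile(r"manylinux_(\d+)_(\d+)_(\w+)")


def platform_tags(filename: str) -> list[str]:
    """Return a wheel filename's platform tags, expanding '.'-joined sets.

    Hypothetical helper; it does minimal validation only.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 fields when an optional build tag is present
        raise ValueError("unexpected number of dash-separated fields")
    return parts[-1].split(".")


def glibc_requirement(tag: str) -> tuple[tuple[int, int], str] | None:
    """Map a manylinux tag to ((major, minor), arch); None for other platforms."""
    m = PEP600.fullmatch(tag)
    if m:
        return (int(m.group(1)), int(m.group(2))), m.group(3)
    name, _, arch = tag.partition("_")
    if name in LEGACY and arch:
        return LEGACY[name], arch
    return None


def installable_on(filename: str, glibc: tuple[int, int], arch: str) -> bool:
    """True if any manylinux tag in the wheel fits the given glibc and arch."""
    for tag in platform_tags(filename):
        req = glibc_requirement(tag)
        if req is not None and req[1] == arch and glibc >= req[0]:
            return True
    return False
```

On a Linux machine, platform.libc_ver() returns the running C library's name and version as strings, which you could convert into the (major, minor) tuple this sketch expects.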

Overall Takeaway

The conversation underscores the power of modern Python packaging and the scientific stack. Tools like cibuildwheel, pybind11, and Scikit-HEP let developers and researchers create cross-platform, high-performance packages without sacrificing the convenience of Python’s ecosystem. By adopting these tools and practices, teams can streamline data analysis, give users consistent installs, and accelerate work in both academic and industry settings.

Links from the show

Henry on Twitter: @HenrySchreiner3
Henry's website: iscinumpy.gitlab.io

Large Hadron Collider (LHC): home.cern
cibuildwheel: github.com
plumbum package: plumbum.readthedocs.io
boost-histogram: github.com
vector: github.com
hepunits: github.com
awkward arrays: github.com
Numba: numba.pydata.org
uproot4: github.com
scikit-hep developer: scikit-hep.org
pypa: pypa.io
CLI11: github.com
pybind11: github.com
cling: root.cern
Pint: pint.readthedocs.io
Python Wheels site: pythonwheels.com
Build package: pypa-build.readthedocs.io
Mac Mini Colo: macminicolo.net
scikit-build: github.com
plotext: pypi.org
Code Combat: codecombat.com
clang format wheel: github.com
cibuildwheel examples: cibuildwheel.readthedocs.io
Cling in LLVM: root.cern

Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy