Using cibuildwheel to manage the Scikit-HEP packages
But if you have compiled dependencies, such as C or FORTRAN, then you have a big challenge. How do you automatically compile and test against Linux, macOS (Intel and Apple Silicon), Windows, and so on? That's the problem cibuildwheel is solving.
On this episode, you'll meet Henry Schreiner. He is developing tools for the next era of the Large Hadron Collider (LHC) and is an admin of Scikit-HEP. Of course, cibuildwheel is central to this process.
Episode Deep Dive
Guest Introduction and Background
Henry Schreiner is a seasoned Python and C++ developer working at Princeton University, focusing on High-Energy Physics (HEP) and research software engineering. He is a core maintainer of the Scikit-HEP library suite, contributes to various open-source projects including pybind11, and works on next-generation tools for the Large Hadron Collider data analysis stack. Henry’s work bridges C++ and Python, emphasizing tools that make scientific computing more accessible and efficient. He is also a maintainer of cibuildwheel, a project that simplifies building and distributing Python wheels across platforms and Python versions.
What to Know If You're New to Python
Here are a few essentials to help you navigate this episode more effectively:
- Python Installation: Make sure you have a standard CPython installation, ideally version 3.7+.
- Package Management: Familiarize yourself with `pip install` and virtual environments so you can isolate your projects.
- Understanding Binary Extensions: Many scientific or performance-focused Python packages have C or C++ components. Knowing that they need specialized building steps (compilers, etc.) will help clarify references in this episode.
- Documentation: Check out the official Python Docs for an in-depth guide on packaging and distribution basics.
Key Points and Takeaways
- cibuildwheel: A Solution for Cross-Platform Wheels
cibuildwheel greatly simplifies the process of building, testing, and distributing Python wheels across multiple operating systems and Python versions. It handles tasks such as setting up each target Python environment on the CI machine, then building and testing a wheel for Linux, macOS (Intel and Apple Silicon), and Windows.
- Links / Tools:
- GitHub: github.com/pypa/cibuildwheel
- Azure Pipelines, GitHub Actions, Travis CI, CircleCI, AppVeyor: Common CI platforms
- Scikit-HEP: A Suite of Particle Physics Tools in Python
Scikit-HEP is a collection of libraries designed to support high-energy physics workflows with Python. It includes specialized utilities for histogramming, file reading, vector manipulations, and more, bringing modern Python practices to the HEP community.
- Links / Tools:
- Project: Scikit-HEP
- Uproot: Pure Python ROOT file reader
- Awkward Array: Handling irregular data arrays
- Awkward Array for Irregular Data Structures
Awkward Array provides a NumPy-like interface for handling nested or variable-length data, such as lists of lists with differing lengths. It integrates with Numba for just-in-time compilation and supports large-scale data common in HEP and beyond (e.g., genomics).
- Links / Tools:
- GitHub: github.com/scikit-hep/awkward
- Numba: JIT compilation for Python
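To make the "lists of lists with differing lengths" idea concrete, here is a minimal pure-Python sketch of one common way to store such data: a single flat buffer plus an offsets list marking where each sublist starts and stops. The function names (`to_offsets`, `get_sublist`, `counts`) and the sample data are hypothetical and chosen only for illustration; this is not Awkward Array's API, just the general layout idea behind columnar jagged arrays.

```python
from typing import List, Sequence, Tuple


def to_offsets(nested: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Flatten variable-length sublists into (content, offsets).

    Sublist i lives in content[offsets[i]:offsets[i + 1]].
    """
    content: List[float] = []
    offsets: List[int] = [0]
    for sublist in nested:
        content.extend(sublist)
        offsets.append(len(content))
    return content, offsets


def get_sublist(content: List[float], offsets: List[int], i: int) -> List[float]:
    """Recover sublist i from the flat representation."""
    return content[offsets[i]:offsets[i + 1]]


def counts(offsets: List[int]) -> List[int]:
    """Length of every sublist, computed from the offsets alone."""
    return [stop - start for start, stop in zip(offsets, offsets[1:])]


# Hypothetical data: three "events" with 3, 0, and 2 measurements each.
events = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
content, offsets = to_offsets(events)
```

Because all values sit in one contiguous buffer, operations such as "how many items per event" only need the offsets, which is part of why this kind of layout suits compiled or JIT-compiled loops.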
- Boost Histogram and Hist: Advanced Histogramming in Python
boost-histogram (Python bindings for the C++ Boost.Histogram library) and Hist (a higher-level, Pythonic API built on top of it) enable efficient histogram creation, manipulation, and rebinning. They treat a histogram as an object with its own axes and storage, making it easier to store, label, and operate on multi-dimensional bins.
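As a rough illustration of what "a histogram as an object" means, the following pure-Python sketch defines a hypothetical `RegularHistogram` class with a labeled axis, a `fill` method, and a `rebin` method. It is not the boost-histogram or Hist API; the class name, fields, and sample values are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class RegularHistogram:
    """Hypothetical 1D histogram with evenly spaced bins and a labeled axis."""

    bins: int
    start: float
    stop: float
    label: str = ""
    counts: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with empty bins unless counts were supplied (e.g. by rebin).
        if not self.counts:
            self.counts = [0] * self.bins

    def fill(self, values: Iterable[float]) -> None:
        """Increment the bin containing each value; out-of-range values are dropped."""
        width = (self.stop - self.start) / self.bins
        for value in values:
            if self.start <= value < self.stop:
                index = min(int((value - self.start) / width), self.bins - 1)
                self.counts[index] += 1

    def rebin(self, factor: int) -> "RegularHistogram":
        """Return a new histogram in which every `factor` adjacent bins are merged."""
        if self.bins % factor != 0:
            raise ValueError("factor must divide the number of bins evenly")
        merged = [
            sum(self.counts[i:i + factor]) for i in range(0, self.bins, factor)
        ]
        return RegularHistogram(
            self.bins // factor, self.start, self.stop, self.label, merged
        )


h = RegularHistogram(bins=4, start=0.0, stop=4.0, label="x")
h.fill([0.5, 1.5, 1.7, 3.2, 9.9])  # 9.9 falls outside the axis and is ignored
coarse = h.rebin(2)
```

Keeping the axis definition, label, and counts together is what lets operations like rebinning return a new, self-describing histogram rather than a bare array of numbers.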
- pybind11: Seamless C++ and Python Interoperability
pybind11 is a powerful header-only library for exposing C++ code to Python, making it straightforward to build extension modules without learning a new “binding language.” This approach simplifies bridging performance-intensive routines with Python’s high-level features.
- Links / Tools:
- GitHub: github.com/pybind/pybind11
- cibuildwheel Workflow and Testing
cibuildwheel not only compiles wheels but can also test them in isolated environments. Installing each built wheel into a fresh environment before running the tests keeps build and test cleanly separated, so a dependency that only exists on the build machine shows up as a test failure instead of leaking into your final distributions.
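A build-then-test run can be sketched as a small Python helper that assembles a cibuildwheel command line and the environment variables that configure it. The option names used here (`CIBW_BUILD`, `CIBW_TEST_REQUIRES`, `CIBW_TEST_COMMAND`, `CIBW_ARCHS_MACOS`, `--platform`, `--output-dir`) come from cibuildwheel's documented configuration; the specific values, the `tests` directory, and the helper names are assumptions for illustration, not settings discussed in the episode.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def cibuildwheel_invocation(platform: str = "linux") -> Tuple[List[str], Dict[str, str]]:
    """Return (command, options) for a hypothetical build-and-test run."""
    options = {
        # Assumption: build only CPython 3.11 wheels to keep the run short.
        "CIBW_BUILD": "cp311-*",
        # Packages installed into the isolated test environment.
        "CIBW_TEST_REQUIRES": "pytest",
        # Run against the installed wheel; {project} expands to the
        # project root. The tests/ directory is an assumption.
        "CIBW_TEST_COMMAND": "pytest {project}/tests",
        # On macOS, request both Intel and Apple Silicon wheels.
        "CIBW_ARCHS_MACOS": "x86_64 arm64",
    }
    command = [
        sys.executable, "-m", "cibuildwheel",
        "--platform", platform,
        "--output-dir", "wheelhouse",
    ]
    return command, options


def run_build(platform: str = "linux") -> None:
    """Actually run cibuildwheel (requires it to be installed; Linux builds need Docker)."""
    command, options = cibuildwheel_invocation(platform)
    subprocess.run(command, env={**os.environ, **options}, check=True)


command, options = cibuildwheel_invocation("linux")
```

Calling `run_build("linux")` from a project root would start the real build. In practice these settings usually live in the CI configuration or in `pyproject.toml` rather than in a script.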
- Minimizing Complexity with Docker on Linux Builds
Building Linux wheels is typically done using Docker images based on the manylinux policy. By standardizing the build environment, developers ensure compatibility across various Linux distributions while avoiding system-specific configuration issues.
- Apple Silicon (M1) and Cross-Architecture Builds
With Apple Silicon, teams face an additional dimension of complexity. Tools like cibuildwheel aim to unify building Intel-based macOS wheels and arm64-based Apple Silicon wheels, though native CI runners for M1 remain limited as of this conversation.
- scikit-build: A Next Step for C++/Python Projects
scikit-build leverages CMake to build complex, multi-language Python packages. Henry mentioned how modernizing scikit-build could reduce friction when combining C++ libraries, CUDA code, or advanced build flows within Python packages.
- The Power of Wheel Distribution for Scientific Computing
Throughout the conversation, Henry underscored the efficiency gained by distributing wheels for scientific tools. Whether for HPC, data analysis, or advanced computing, wheels ensure that installation is streamlined, reproducible, and free of compiler hassles.
Interesting Quotes and Stories
- On distributing HPC packages: Henry highlighted how previously, “you’d have a special Python install that took hours to set up,” but now with wheels and tools like cibuildwheel, “you can just pip install and everything works.”
- On bridging Python and C++: “I like to be in that space between the two,” Henry said, emphasizing how modern Pythonic approaches can make C++ functionality more accessible to data scientists.
Key Definitions and Terms
- Wheel: A built, ready-to-install distribution format for Python packages (usually `.whl` files).
- HEP: High-Energy Physics, focusing on subatomic particles and large-scale experiments like the Large Hadron Collider.
- Numba: A JIT compiler that optimizes numerical Python code for faster execution.
- manylinux: A set of platform tags and compatibility policies, with matching Docker build images, that allow a single wheel to work across different Linux distributions.
- Awkward Array: A library that supports nested, variable-length arrays in a NumPy-like interface.
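The wheel and manylinux definitions above meet in the wheel's filename, which encodes the Python, ABI, and platform tags an installer uses to decide whether a file fits the current machine. The sketch below splits a filename according to the standard `{distribution}-{version}(-{build})-{python tag}-{abi tag}-{platform tag}.whl` layout. The `WheelName` type, the helper functions, and the example filenames (`example_pkg`) are hypothetical; real tooling should rely on an established library rather than this simplified parser.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


def is_pure_python(wheel: WheelName) -> bool:
    """A wheel with no ABI and no platform restriction contains no compiled code."""
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"


# Hypothetical filenames: one compiled manylinux wheel, one pure-Python wheel.
compiled = parse_wheel_filename("example_pkg-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl")
pure = parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl")
```

A project with compiled extensions needs one such file per Python version and platform combination, which is the matrix that a tool like cibuildwheel automates.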
Learning Resources
Below are a few resources to explore or refine your Python packaging and scientific computing expertise.
- Python for Absolute Beginners: An excellent place to start if you’re new to Python.
- Modern APIs with FastAPI and Python: Learn how to structure and distribute your Python applications when building modern APIs.
- Getting started with pytest: Discover how to fully test your code (including wheel testing) with Python’s top testing framework.
Overall Takeaway
The conversation underscores the power of modern Python packaging and the scientific stack. Tools like cibuildwheel, pybind11, and Scikit-HEP empower developers and researchers to create cross-platform, high-performance solutions without sacrificing the convenience of Python’s ecosystem. By embracing these frameworks and best practices, teams can streamline data analysis, deliver consistent user experiences, and accelerate innovation in both academic and enterprise settings.
Links from the show
Henry's website: iscinumpy.gitlab.io
Large Hadron Collider (LHC): home.cern
cibuildwheel: github.com
plumbum package: plumbum.readthedocs.io
boost-histogram: github.com
vector: github.com
hepunits: github.com
awkward arrays: github.com
Numba: numba.pydata.org
uproot4: github.com
scikit-hep developer: scikit-hep.org
pypa: pypa.io
CLI11: github.com
pybind11: github.com
cling: root.cern
Pint: pint.readthedocs.io
Python Wheels site: pythonwheels.com
Build package: pypa-build.readthedocs.io
Mac Mini Colo: macminicolo.net
scikit-build: github.com
plotext: pypi.org
Code Combat: codecombat.com
clang format wheel: github.com
cibuildwheel examples: cibuildwheel.readthedocs.io
Cling in LLVM: root.cern
New htmx course: talkpython.fm/htmx
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm