
Wheel Next + Packaging PEPs

Episode #544, published Fri, Apr 10, 2026, recorded Mon, Mar 2, 2026
When you pip install a package with compiled code, the wheel you get is built for CPU features from 2009. Want newer optimizations like AVX2? Your installer has no way to ask for them. GPU support? You're on your own configuring special index URLs. The result is fat binaries, nearly gigabyte-sized wheels, and install pages that read like puzzle books. A coalition from NVIDIA, Astral, and Quansight has been working on Wheel Next: a set of PEPs that let packages declare what hardware they need and let installers like uv pick the right build automatically. Just uv pip install torch and it works. I sit down with Jonathan Dekhtiar from NVIDIA, Ralf Gommers from Quansight and the NumPy and SciPy teams, and Charlie Marsh, founder of Astral and creator of uv, to dig into all of it.

Watch this episode on YouTube
Watch the live stream version

Episode Deep Dive

Guests

This episode features a powerhouse panel of three guests who represent the key organizations behind the Wheel Next initiative:

Jonathan Dekhtiar is an engineer at NVIDIA, where he has worked for about eight years. He was fascinated by CUDA technology as a high school student and pursued a PhD to join the company. Over the last two-plus years, he has focused on improving NVIDIA's CUDA and Python offering, finding better ways to expose GPU programming at the Python layer. He has been one of the driving forces behind the Wheel Next initiative and the associated PEPs for over a year.

Ralf Gommers is a physicist by training with a PhD in atomic and quantum physics. He started using Python in 2004, became the release manager for NumPy and SciPy in 2010, and has been maintaining those foundational libraries ever since. He is now co-CEO of Quansight, a public benefit corporation focused on data science, applied AI, and scientific computing consulting. Ralf also created the PyPackaging Native guide, a reference site that documents the core problems in native Python packaging.

Charlie Marsh is the founder and CEO of Astral, the company behind uv (the Python package and project manager), Ruff (the Python linter and formatter), and ty (the Python type checker). Astral has been building open source tooling focused on speed and user experience since October 2022. Charlie and his team have been deeply involved in the Wheel Next collaboration, including building a working prototype of wheel variant support in uv.


What to Know If You're New to Python

  • Wheels are the standard format for distributing pre-built Python packages. When you run pip install or uv pip install, you are typically downloading a wheel file. Pure Python packages work everywhere, but packages with compiled C, C++, or Fortran code must be built for specific operating systems and CPU architectures.
  • PyPI (pypi.org) is the main public repository where Python packages are hosted. Tools like pip and uv download wheels from PyPI by default.
  • CUDA is NVIDIA's programming platform for running code on GPUs. Many popular data science and machine learning libraries (like PyTorch) depend on CUDA, and installing the right version for your hardware has historically been a major pain point.
  • This episode focuses on packaging infrastructure rather than writing Python code, so familiarity with installing packages via pip install or uv pip install is the main prerequisite to following along.
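To make the wheel format concrete, here is a minimal sketch of how the tags in a wheel filename break down. It is deliberately simplified (real parsers also handle optional build numbers and compressed tag sets), and the filename is just an example:

```python
# Minimal sketch: pull the standard tags out of a wheel filename.
# Simplified -- ignores optional build numbers and compressed tag sets.
def parse_wheel_name(filename: str) -> dict:
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,      # e.g. cp312 = CPython 3.12
        "abi": abi_tag,
        "platform": platform_tag,  # OS + CPU architecture, nothing finer
    }

tags = parse_wheel_name("numpy-2.1.0-cp312-cp312-manylinux_2_17_x86_64.whl")
print(tags["platform"])  # manylinux_2_17_x86_64
```

Note that the platform tag stops at "OS plus CPU architecture": there is nowhere to say AVX2, CUDA version, or SIMD level. That gap is exactly what the episode is about.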

Key Points and Takeaways

1. Wheel Next: A cross-industry initiative to fix Python's hardware-aware packaging gap

The central topic of this episode is Wheel Next, an open-source initiative aimed at modernizing how Python packages communicate what hardware they need. Today, when you pip install a package with compiled code, the wheel format can only express the OS, CPU architecture, and Python version it was built for. It has no way to declare things like which CPU instruction sets it supports (e.g., AVX2) or which GPU/CUDA version it requires. Wheel Next proposes a generic, extensible system -- via a set of PEPs -- that lets package authors declare these hardware requirements and lets installers like uv automatically detect and select the right build for the user's machine. The initiative is backed by a coalition of at least 14 companies and major open source projects including NVIDIA, Astral, Quansight, Meta, AMD, Intel, Google, Red Hat, and others.

  • wheelnext.dev -- The Wheel Next project website with notes, drafts, and contributor information

2. The lowest common denominator problem: leaving 10-20x performance on the table

When building wheels for x86-64 CPUs, package authors can only use CPU features dating back to about 2009. Newer hardware capabilities like SSE4, AVX2, and later SIMD instruction sets simply cannot be used because installers have no way to know whether the user's machine supports them. Ralf Gommers explained that the performance difference between 2009-era features and modern ones can be 10x to 20x for scientific workloads. This means the Python ecosystem is systematically leaving massive performance gains on the table for the 40-50% of Python developers doing data science and scientific computing. The same problem exists on ARM, where the default build target is often a Raspberry Pi level chip, far below what desktop and server ARM processors can do.
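The frustrating part is that the information an installer would need is readily available on the user's machine; the wheel format simply has no field to match it against. As an illustrative, Linux-only sketch (the function name and feature list are mine, and on other platforms it just returns an empty set):

```python
# Illustrative sketch (Linux-only): the CPU reports its SIMD features via
# /proc/cpuinfo, but today's wheel format has no field an installer could
# match against this information.
def supported_simd_features() -> set[str]:
    interesting = {"sse4_1", "sse4_2", "avx", "avx2", "avx512f"}
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return interesting & set(line.split(":", 1)[1].split())
    except OSError:
        pass  # non-Linux or restricted environment
    return set()

print(supported_simd_features())  # e.g. {'avx2', 'sse4_2', ...} on modern x86-64
```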

3. How NumPy solves this today -- and why it does not scale

NumPy uses an elaborate system of runtime CPU detection and dynamic dispatch to ship a single wheel that performs well across different CPU generations. The library compiles the same source code multiple times targeting different CPU families (Haswell, Skylake, etc.), merges those compiled variants into a single Python extension module, and then at runtime detects the CPU and dispatches to the optimal code path. This approach works -- NumPy has dedicated engineers from Intel and ARM contributing architecture-specific optimizations -- but it creates "fat binaries" that are much larger than necessary and requires enormous engineering effort that most packages simply cannot replicate. SciPy, scikit-learn, Pandas, and Pillow do not use SIMD optimizations in their shipped wheels even though the code exists, because there is no scalable way to ship it.

  • numpy.org -- NumPy: the fundamental package for scientific computing with Python
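NumPy's real dispatch machinery lives in C and spans compiled extension modules, but the pattern itself can be sketched in a few lines of hypothetical pure Python: build the same routine several times, then select the best supported build at runtime. Everything below (function names, candidate table) is illustrative, not NumPy's actual API:

```python
# Hypothetical pure-Python sketch of NumPy's dispatch pattern: several builds
# of the same routine, with the best supported one selected at runtime.
# (NumPy's real machinery does this in C across compiled variants.)

def dot_baseline(a, b):  # stand-in for the 2009-era baseline build
    return sum(x * y for x, y in zip(a, b))

def dot_avx2(a, b):      # stand-in for an AVX2-optimized build
    return sum(x * y for x, y in zip(a, b))

# Candidates ordered best-first, each guarded by a required CPU feature.
_CANDIDATES = [("avx2", dot_avx2), ("baseline", dot_baseline)]

def select_impl(detected_features: set[str]):
    for feature, impl in _CANDIDATES:
        if feature == "baseline" or feature in detected_features:
            return impl
    return dot_baseline

dot = select_impl({"sse4_2", "avx2"})  # this machine has AVX2: picks dot_avx2
print(dot([1, 2, 3], [4, 5, 6]))       # 32
```

The cost of this approach is visible in the sketch: every optimized path ships inside the same artifact, which is why the resulting wheels are "fat."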

4. The PyTorch installation nightmare and the promise of "just pip install torch"

PyTorch is the poster child for how painful hardware-aware installation is today. The PyTorch wheel on PyPI is about 900 megabytes because it bundles CUDA libraries for five or six GPU architectures into a single fat binary. The PyTorch team works incredibly hard just to stay under the one gigabyte mark. Users who need a specific CUDA version must configure special index URLs, and install pages for packages like vLLM read like "puzzle books." The vision of Wheel Next is simple: you type uv pip install torch and the installer automatically detects your GPU, figures out the right CUDA version, and downloads a slim ~200-250 MB wheel built specifically for your hardware. Astral has already built a working prototype of this in a variant-enabled branch of uv.

  • pytorch.org -- PyTorch: an open source machine learning framework
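For comparison, the workaround available today looks roughly like this in uv: manually pinning torch to a CUDA-specific index in pyproject.toml. The index name and CUDA version below are examples (check the uv and PyTorch docs for your setup); the point of wheel variants is that none of this configuration should be necessary:

```toml
# pyproject.toml -- today's workaround: pin torch to a CUDA-specific index.
# Index name and cu121 version are examples for one particular setup.
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu121" }
```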

5. Wheel variants: a generic, extensible solution rather than a blessed list of tags

Rather than adding hundreds of new hard-coded platform tags (one for each CUDA version, each CPU instruction set, etc.) and having to update that list every few years, the Wheel Next PEPs propose a generic system. Packages can declare arbitrary variant metadata, and a plugin interface allows installers to dynamically detect platform attributes and select the best wheel. Jonathan emphasized that this philosophy was intentional: the system is designed to scale without requiring constant maintenance of a "blessed list." The term "variants" itself was chosen to align with terminology used across the broader packaging ecosystem, not just Python.
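Conceptually, the plugin idea can be sketched like this. To be clear, this is not the PEP 825 API; the metadata shape, function names, and filenames below are all made up purely to show the flow: a provider detects attributes of the current machine, and the installer keeps only wheels whose declared requirements are satisfied:

```python
# Conceptual sketch of the variant idea -- NOT the actual PEP 825 API.
# A "provider plugin" detects attributes of this machine; the installer keeps
# the first candidate wheel (best-preferred first) whose declared
# requirements are all satisfied.

def detect_platform() -> dict:
    # A real provider would probe drivers and the CPU; values here are made up.
    return {"cuda": "12.4", "cpu_level": "x86_64_v3"}

# Candidate wheels with hypothetical variant metadata, best-preferred first.
CANDIDATES = [
    {"file": "torch-cu124.whl", "requires": {"cuda": "12.4"}},
    {"file": "torch-cu118.whl", "requires": {"cuda": "11.8"}},
    {"file": "torch-cpu.whl",   "requires": {}},  # universal fallback
]

def pick_wheel(candidates, platform):
    for wheel in candidates:
        if all(platform.get(k) == v for k, v in wheel["requires"].items()):
            return wheel["file"]
    raise LookupError("no compatible wheel")

print(pick_wheel(CANDIDATES, detect_platform()))  # torch-cu124.whl
```

Because the detection logic lives in a plugin rather than in a hard-coded tag list, new hardware dimensions can be added without amending the standard each time.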

6. The PEP process: splitting a super-PEP into manageable pieces

PEP 817 earned the distinction of being the longest PEP ever written. It was so comprehensive that the PEP editors' review alone took over a month. The community feedback process on the Python Packaging Discourse (a non-threaded forum) proved challenging for such a complex, multi-faceted proposal. As a result, the team split the work into smaller PEPs, starting with PEP 825 (Wheel Variants: Package Format). This allows separate community discussions about different aspects -- installers, build backends, index servers -- each involving the right stakeholders. The PEPs will likely be provisionally accepted first, with full acceptance coming after all four parts have working implementations and buy-in from tool authors.

  • discuss.python.org -- Python community Discourse forum where PEP review discussions happen

7. Python Build Standalone: Astral's quiet lever on Python performance

Astral maintains Python Build Standalone, a project that produces relocatable, downloadable CPython distributions. When you install Python through uv, you are actually downloading one of these pre-built distributions rather than building from source. Charlie revealed that their goal is to ship the fastest Python distribution possible, purely through build optimizations, without changing CPython source code. He believes they already have the fastest Python but has not yet published rigorous benchmarks. This is a significant lever because so many developers now bootstrap Python through uv, meaning Astral's build choices (like targeting newer CPU features or glibc versions) could improve Python performance for a huge number of users automatically.

8. Inspiration from Spack, Conda, and Nix -- not reinventing the wheel (too much)

The Wheel Next design draws heavily from existing packaging systems that already handle hardware variants well. Jonathan highlighted Spack, a package manager designed for supercomputers, as a particularly strong influence -- especially its archspec component for CPU variant definitions, which he called "pure brilliance" in its static JSON-based design. Ideas also came from Conda and conda-forge (which build only binaries and skip source distributions entirely), and from Nix (which starts from source and uses binaries as an optimization). The team includes contributors who have worked on all these systems, so the final design synthesizes the best aspects of each to enhance Python packaging specifically.

  • spack.io -- Spack: a package manager for supercomputers
  • conda-forge.org -- Conda-Forge: community-driven packaging for conda

9. Pyx: Astral's package registry as a bridge to the variant-enabled future

Charlie provided an update on pyx, Astral's hosted package registry currently in beta. Pyx takes a two-track approach to the GPU packaging problem: pushing standards forward through the Wheel Next effort while simultaneously solving immediate user pain within the current standards. Today, pyx pre-builds packages that depend on CUDA (like PyTorch extensions) across a wide range of CUDA versions, PyTorch versions, Python versions, and CPU architectures. Once wheel variants are standardized, pyx will be in a position to adopt them faster than PyPI because it controls both the registry and the build infrastructure. Combined with uv's faster user adoption cycle (uv users tend to stay on recent versions, unlike pip where a significant portion still uses five-year-old versions), this could accelerate rollout considerably.

10. The unprecedented scale of cross-industry collaboration

The Wheel Next initiative involved an in-person summit in March 2025 with representatives from roughly 20 companies, including presentations from the PyTorch team at Meta and the JAX team at Google. Jonathan described the process as feeling like a startup -- rapid prototyping, mockups, collecting feedback, and iterating. The team forked essentially every piece of the Python packaging ecosystem (pip, warehouse/PyPI, setuptools, scikit-build-core, the packaging library) to build working prototypes. The initiative was philosophically modeled after Faster CPython, the Microsoft-funded effort to speed up CPython's interpreter, but is even more organizationally diverse, with funding and contributions from NVIDIA, Meta, AMD, Intel, Google, Red Hat, Quansight, Astral, and many others.

11. The PyPackaging Native guide: stating the problem to enable solutions

Ralf created the PyPackaging Native guide after watching 12-13 years of circular discussions about native packaging problems on mailing lists and forums. The guide intentionally focuses only on explaining the problems (CPU extensions, GPU support, source vs. binary distribution challenges) without proposing solutions, providing a shared baseline for future conversations. Jonathan called it "by far the best explanation anywhere on the internet to all these packaging issues." The Wheel Next team then took the flip side: don't restate the problem, just propose solutions. As Jonathan quoted: "A problem well stated is a problem half solved."

12. Timeline and adoption expectations

Ralf estimated that PEP review and prototype updates will take the better part of 2026, after which PyPI, Twine, and all metadata-consuming tools need to be updated. Full ecosystem readiness will extend beyond 2026. However, Jonathan predicted that once the system is available, adoption in the ML/scientific computing space will be rapid -- five packages within two weeks, doubling within a month, and at least 50 packages within a few months as PyTorch and JAX and their downstream dependencies activate variant mode. The team joked about the "Vasa's Force Law" of estimation: make an estimate, multiply by two, and change the unit (six months becomes one decade). The key insight is that the packages most in need are already part of the Wheel Next working group and are eager to adopt.


Interesting Quotes and Stories

"If you build a wheel for x86-64, you can only use CPU features that go back to about 2009. Any new hardware features introduced after 2009, things like SSE4, AVX2 -- you just cannot use because the installers don't know that you put that in the wheel." -- Ralf Gommers

"The difference between the 2009 hardware features and the 2019 or 2023 ones could be a factor of 10x, 20x in performance, depending on what you're doing." -- Ralf Gommers

"The PyTorch team has to try incredibly hard to stay under one gigabyte." -- Ralf Gommers

"You just say, hey, install Torch. And then in this variant-enabled build, uv would go look at Torch, it would see it has different variants for different CUDA versions, and here's how I inspect what CUDA version I should use on your machine. And then it would pick out the right version. Users shouldn't have to think about configuring it." -- Charlie Marsh

"If you look at their install pages, it's like a puzzle book. You just don't know how to install this stuff." -- Ralf Gommers, on packages like vLLM

"A problem well stated is a problem half solved." -- Jonathan Dekhtiar, on the PyPackaging Native guide

"I do think we have the fastest Python now, but we haven't actually published our rigorous benchmark methodology. So I won't stake my reputation on that claim yet." -- Charlie Marsh, on Python Build Standalone

"As we were prototyping this for a year, we ended up pretty much forking the entire ecosystem." -- Jonathan Dekhtiar

"The goal, of course, is to un-fork those things." -- Charlie Marsh

"Ideally the average user won't even have to think about this, right? Hopefully they just get it through uv or through pip." -- Charlie Marsh


Key Definitions and Terms

  • Wheel: The standard binary distribution format for Python packages (.whl files). The filename encodes the Python version, OS, and CPU architecture it was built for.
  • ABI (Application Binary Interface): Like an API, but for compiled binaries. ABI stability means compiled code remains compatible across versions without recompilation.
  • Platform tags: Metadata in wheel filenames that tell installers what OS, architecture, and Python version a wheel targets. Current tags cannot express CPU instruction sets or GPU requirements.
  • Wheel variants: The proposed extension to the wheel format (PEP 817/825) that allows multiple builds of the same package version, distinguished by hardware or software attributes like CUDA version or CPU instruction set.
  • SIMD (Single Instruction, Multiple Data): CPU instructions that perform the same operation on multiple data points simultaneously. Examples include SSE4, AVX2, and ARM Neon. Critical for scientific computing performance.
  • Fat binary (Fat Bin): A single binary that contains compiled code for multiple hardware targets, resulting in larger file sizes. NumPy and PyTorch use this approach today.
  • CUDA: NVIDIA's parallel computing platform and programming model for GPU programming. It enables massive parallelism across thousands of GPU threads.
  • Spack: A package manager designed for supercomputers that handles hardware-specific builds. Its archspec component inspired portions of the Wheel Next CPU variant design.
  • Python Build Standalone: A project (maintained by Astral) that produces relocatable CPython builds that can be downloaded, unzipped, and run without building from source.
  • Variant provider plugin: The proposed interface that lets installers dynamically detect hardware attributes (like GPU driver versions) and choose the correct wheel variant.
  • CFFI (C Foreign Function Interface): A mechanism that allows Python to call code written in any language that exposes a C-compatible interface, enabling Python's rich ecosystem of native extensions.

Learning Resources

Here are resources from Talk Python Training to go deeper on topics covered in this episode:

  • Python for Absolute Beginners: If you are new to Python entirely, this course will teach you the language fundamentals from the ground up, giving you the foundation to follow along with packaging and ecosystem discussions.
  • Modern Python Projects: Covers the full lifecycle of a Python project from dependency management to deployment, including virtual environments, packaging, and CI -- directly relevant to understanding why packaging standards matter.
  • Managing Python Dependencies: A focused course on pip, virtual environments, and choosing quality libraries. Essential background for understanding the packaging challenges discussed in this episode.

Overall Takeaway

Python's packaging system was designed for a simpler era -- one where "which OS and CPU architecture?" was enough to pick the right binary. But with 40-50% of Python developers now doing data science and scientific computing, and with GPU-accelerated workloads becoming the norm, that design has become a bottleneck that costs users real performance, real bandwidth, and real frustration. The Wheel Next initiative represents something rare in open source: a genuine cross-industry effort where competitors (NVIDIA, AMD, Intel, Google, Meta) are sitting at the same table, writing PEPs together, and funding shared infrastructure. The technical vision is elegant -- a generic, extensible variant system rather than an ever-growing list of tags -- and the working prototypes in uv prove it is feasible without sacrificing installer speed. It will take time for the PEPs to be accepted and for the ecosystem to update, but the most important packages are already at the table and eager to adopt. The future they are building is one where pip install torch just works, your CPU-bound NumPy code runs 10x faster because it uses your actual hardware, and you never again have to navigate a "puzzle book" install page. That is a future worth waiting for.

Guests
Charlie Marsh: github.com
Ralf Gommers: github.com
Jonathan Dekhtiar: github.com

CPU dispatcher: numpy.org
build options: numpy.org
Red Hat RHEL: www.redhat.com
Red Hat RHEL AI: www.redhat.com
Red Hat's presentation: wheelnext.dev
CUDA release: developer.nvidia.com
requires a PEP: discuss.python.org
WheelNext: wheelnext.dev
Github repo: github.com
PEP 817: peps.python.org
PEP 825: discuss.python.org
uv: docs.astral.sh
A variant-enabled build of uv: astral.sh
pyx: astral.sh
pypackaging-native: pypackaging-native.github.io
PEP 784: peps.python.org

Watch this episode on YouTube: youtube.com
Episode #544 deep-dive: talkpython.fm/544
Episode transcripts: talkpython.fm

Theme Song: Developer Rap
🥁 Served in a Flask 🎸: talkpython.fm/flasksong

---== Don't be a stranger ==---
YouTube: youtube.com/@talkpython

Bluesky: @talkpython.fm
Mastodon: @talkpython@fosstodon.org
X.com: @talkpython

Michael on Bluesky: @mkennedy.codes
Michael on Mastodon: @mkennedy@fosstodon.org
Michael on X.com: @mkennedy

Episode Transcript


00:00 When you pip install a package with compiled code, the wheel you get is built for CPU features from 2009.

00:06 Want newer optimizations like AVX2? Your installer has no way to ask for them.

00:11 Want GPU support? You're on your own configuring special index URLs.

00:16 The result is fat binaries, nearly gigabyte-sized wheels, and install pages that read like puzzle books.

00:22 A coalition from NVIDIA, Astral, and Quansight has been working on WheelNext, a set of PEPs that let packages declare what hardware they need and let installers like uv pick the right build automatically.

00:34 Just uv pip install torch and it'll work.

00:37 I sit down with Jonathan Dekhtiar from NVIDIA, Ralf Gommers from Quansight and the NumPy and SciPy teams, and Charlie Marsh, founder of Astral and creator of uv, to dig into it all.

00:47 This is Talk Python To Me, episode 544, recorded March 2nd, 2026.

00:53 Talk Python To Me, yeah, we ready to roll.

00:56 Upgrading the code, no fear of getting old.

00:59 They sink in the air, new frameworks in sight, geeky rap on deck.

01:03 Quark crew, it's time to unite.

01:05 We started in Pyramid, cruising old school lanes.

01:08 Had that stable base, yeah, sir.

01:09 Welcome to Talk Python To Me, the number one Python podcast for developers and data scientists.

01:14 This is your host, Michael Kennedy.

01:16 I'm a PSF fellow who's been coding for over 25 years.

01:20 Let's connect on social media.

01:21 You'll find me and Talk Python on Mastodon, Bluesky, and X.

01:25 The social links are all in your show notes.

01:28 You can find over 10 years of past episodes at talkpython.fm.

01:31 And if you want to be part of the show, you can join our recording live streams.

01:35 That's right.

01:36 We live stream the raw, uncut version of each episode on YouTube.

01:40 Just visit talkpython.fm/youtube to see the schedule of upcoming events.

01:44 Be sure to subscribe there and press the bell so you'll get notified anytime we're recording.

01:48 This episode is brought to you by Sentry.

01:51 You know Sentry for the error monitoring, but they now have logs too.

01:55 And with Sentry, your logs become way more usable, interleaving into your error reports to enhance debugging and understanding.

02:02 Get started today at talkpython.fm/sentry.

02:06 And it's brought to you by Temporal, durable workflows for Python.

02:10 Write your workflows as normal Python code and Temporal ensures they run reliably, even across crashes and restarts.

02:17 Get started at talkpython.fm/Temporal.

02:21 Hey, a quick announcement for everyone taking courses over at Talk Python Training.

02:25 We just rolled out course completion certificates.

02:28 I'm really excited about these.

02:29 When you finish a course, you can now generate a certificate automatically.

02:33 The best part is there's a one-click button to add it straight to LinkedIn on your profile as an official certificate.

02:40 Potential employers, current colleagues, they'll all see it right there on your profile.

02:43 Just head over to your account page at Talk Python Training, find a course you finished, click certificate, and the share to LinkedIn option is right there.

02:52 Zero friction.

02:53 And if your employer gives you credit for professional development or reimburses you for training costs, but require some sort of proof, you can also download a full certificate as a PDF.

03:04 Handy for that kind of thing.

03:05 I'd love to see a wave of Talk Python certificates showing up on LinkedIn.

03:09 Head over to Talk Python, click courses, go to your account page, and grab your certificates.

03:15 Jonathan, Ralf, and Charlie, welcome.

03:17 Welcome back, depending on which one of you are hearing this.

03:21 Welcome to the show, you all.

03:22 It's awesome to have you on Talk Python and me.

03:24 Thanks for having us.

03:24 Thanks for having us.

03:25 Thanks for having us, Michael.

03:27 We're going to dive in deep to Python packaging and really look at how the needs of Python packaging have evolved.

03:36 And what you all, as well as a group of a bunch of other people, I see very long contributor lists on these PEPs.

03:42 So a lot of people involved in this project.

03:44 Really great.

03:45 So let's get into it.

03:47 Before we do, let's just do a quick round of intros for you all.

03:51 I guess go around clockwise.

03:53 Jonathan, you can go first.

03:54 I work at NVIDIA for like, I think, the better of eight years right now.

04:00 I did all kinds of different roles.

04:02 But very recently, I mean, over the last two something years, I moved into improving our CUDA and Python offering, trying to find better ways to expose GPU programming, essentially, at the Python layer.

04:17 And I think for a little bit over a year, I've been working with Ralf and Charlie over multiple proposals to improve Python packaging.

04:26 Initiative call we are next.

04:28 And I think we'll talk a little bit more about this.

04:30 So excited to be on the show today.

04:33 Yeah.

04:33 Excited to have you.

04:34 You have really seen the roller coaster at NVIDIA, I'm sure.

04:38 Right?

04:38 Well, it's really exciting.

04:39 Yeah, it was like gaming and probably some data science and then all the changes and now just center of the universe.

04:47 So I'm sure it is exciting.

04:48 You know, the funny thing is I wanted to join NVIDIA for 15 years.

04:52 And I did a PhD to actually be able to join NVIDIA.

04:56 That is so awesome.

04:58 I love it.

04:58 I was amazed by the CUDA technology when I was in high school.

05:02 And I was like, ah, this is so incredible, the concept.

05:06 And I wanted to join.

05:07 So I'm happy I was able to make this happen.

05:10 You know, CUDA is going to be an important part of this discussion.

05:13 Not the only part, but it certainly is one of the forcing functions for the things happening here.

05:18 Give people the background on CUDA.

05:21 What is it?

05:22 How does it work?

05:22 Why is it so amazing?

05:24 Well, CUDA is essentially a programming language that allows you to program on GPUs, specifically NVIDIA GPUs, and has a different programming model than what you would usually do in C.

05:37 So because GPUs are fundamentally very different than CPUs, you have to program them with a different mindset.

05:44 Like, for example, the biggest important thing when you start with GPUs is to not think about a single thread executing the instruction.

05:52 But like, how can you massively parallel a task on like thousands of threads at a single?

05:58 And it takes a different perspective and mode of thinking to how can you imagine doing a task on so many threads at the same time?

06:08 We're not used to it as classic computer scientists.

06:13 If anything, multi-threading is something that we tend to shy away from because there's a lot of caveats.

06:20 But well, GPU programming is all about how can you have as many threads as possible.

06:25 Yeah.

06:26 It comes from graphics and videos where like this pixel is computed independently of that pixel.

06:33 And we've got, you know, 5K resolution.

06:36 So let's just break that up, right?

06:38 Yeah.

06:38 It's exactly the idea.

06:40 And now we have this reasonably new model that's called tile programming that abstracts it even more, which essentially instead of thinking about threads and blocks and grids, you think in terms of tiles.

06:52 So kind of a mini representation that you could have in mind.

06:56 And that thing can scale and adapt differently on different hardware.

07:00 So pretty cool.

07:01 But yeah, that is amazing.

07:03 People think that their CPU has a lot of cores.

07:06 It's got nothing on the graphics cards.

07:10 Well, yeah.

07:11 It's a different type of hardware.

07:14 Which is different.

07:15 Absolutely.

07:16 Yeah.

07:16 Well, very cool.

07:16 Very cool.

07:17 And what a journey if you did all that work to get there.

07:20 I absolutely love it.

07:22 Ralf.

07:22 Welcome.

07:23 Hello.

07:23 Yeah.

07:24 Thanks, Michael.

07:24 Great to be here.

07:25 So about me.

07:26 I am a physicist by training.

07:29 I did a PhD in atomic and quantum physics.

07:32 Worked in the semiconductor industry for a while.

07:35 And I rolled into scientific computing.

07:38 Due to that, I started using Python in 2004.

07:41 And used the mailing list at that point.

07:43 Because there was, I mean, NumPy didn't exist yet.

07:46 There was no documentation for anything.

07:48 So you had to join a mailing list.

07:49 That's how I rolled into open source early on.

07:51 I became the release manager of NumPy and SciPy in 2010.

07:56 And yeah, I've been kind of doing that ever since.

07:59 As a volunteer for 10 years.

08:00 And then I got really too much.

08:02 So I made it my job.

08:03 I joined Quansight, which is a small consulting company.

08:06 Primarily around like data science, applied AI, scientific computing.

08:10 And yeah, I'm now one of the two co-CEOs of Quansight.

08:14 Awesome.

08:15 Trying to basically, we just converted last year to a public benefit corporation.

08:19 Which is very much aligned with what, you know, most of our team wants to do.

08:23 Most of them are open source maintainers.

08:25 And yeah, we basically do consulting to allow ourselves to make impactful open source contributions.

08:32 Quansight is doing a ton in the data science space.

08:35 Scientific computing space, for sure.

08:37 I've had multiple rounds of Quansight folks on the show and things like that.

08:42 And very neat.

08:43 Yeah, it's a lot of fun and rewarding.

08:45 So yeah, glad to be here.

08:47 It's an interesting transition going from a science or something along those lines into programming, right?

08:53 And I got into it through working in my math research and so on.

08:58 And actually, this is just more fun.

09:00 I'm just going to do programming.

09:01 It's not exactly physics, but it's pretty similar, you know?

09:05 Yeah.

09:05 I mean, I've always liked both.

09:07 But I did experimental physics.

09:10 And there, you have much less control over what you end up producing.

09:15 You know, building and using lasers in the lab.

09:17 If one broke, maybe I had to send it off for repairs and wait a month, right?

09:21 And what do you do in the meantime?

09:22 You program.

09:23 So, you know, that's one of the nicer things about it.

09:26 And yeah, I gradually started with like, even before Python, there was some MATLAB.

09:31 And then, you know, you roll into open source.

09:33 And then, you know, mostly just Python a bit of C and kind of like go down from there.

09:37 And then you encounter packaging.

09:39 And it's one of those things that like only 5% of people like and the rest see it as a chore.

09:43 But yeah, when you like it, you just have to do more and more of it.

09:46 Yeah.

09:47 You're with your people now, I think, on this call.

09:49 That's for sure.

09:50 Hey, Charlie.

09:51 I mean, do we even need to give you an introduction?

09:53 We just say uv and then go on or?

09:56 No, I'll generally.

09:57 No, please do.

09:59 I'm just kidding.

10:00 But the reason I say that is uv has taken the world by storm, really.

10:06 And congratulations.

10:07 And yeah, tell people about yourself.

10:08 Thank you.

10:08 Yeah, yeah, of course.

10:09 So my name is Charlie.

10:11 I'm the founder and CEO of Astral.

10:13 I've been working on the company for, let's see, started the company in October 2022.

10:19 That's the easier way to do it.

10:20 So I've been working on this for a few years.

10:23 We mostly build open source.

10:25 So we've worked on a couple of different tools that have become quite popular in Python.

10:30 So we build Ruff, which is our linter and formatter.

10:32 ty, which is our type checker.

10:34 And then most relevant for this episode would be uv, which is our Python package and project manager.

10:40 So yeah, we spend all our time thinking about how to build tools that make it easier to work with Python and how to make Python programming more productive.

10:49 A lot of that's about speed.

10:52 We try to build things that are really fast, but it's also about user experience and trying to sort of like take complexity out of the critical path for users.

11:00 So, you know, for example, we've definitely spent a lot of time thinking about how we can make it easier for people to install PyTorch, which is, you know, one of the examples that will come up, I'm sure, you know, over the course of the show.

11:12 And one of the motivating examples for the PEPs we've been working on.

11:15 So, yeah, that's why I'm here.

11:17 We've been collaborating with Jonathan, Ralf, and honestly, like a bunch of other people, too.

11:21 It's been a really big effort, and I'm sure we'll get into that.

11:23 But it's been cool to have this long running and very like wide ranging collaboration around trying to push Python packaging forward.

11:30 Well, like I said, congrats on all the stuff with Astral.

11:33 And we're going to talk a little bit about pyx, I think.

11:36 Maybe see if there's any.

11:37 Just to check in at the end of the show after we talk about some of these things, I think, if you're up for it.

11:42 Yeah, sounds good.

11:43 I mean, let's just start with what is the challenge.

11:45 You all have described this as the lowest common denominator packaging problem that we've got to deal with.

11:52 And the idea or the problem is different CPUs have specialized instructions, different graphics cards, all these different compute and platforms and so on might have specific instructions.

12:05 And they're optimized, right?

12:06 Like do this as vector operations instead of on registers or whatever.

12:10 But maybe some other thing that it might run on doesn't support that, right?

12:15 I don't know, WebAssembly, whatever.

12:16 Yeah.

12:17 And so then how do you actually end up shipping something to Python people that takes advantages of the specializations that are there when they're there, but without breaking the other ones, right?

12:28 That's kind of the core problem.

12:29 Is that right?

12:30 Yeah.

12:30 I can take one little step back before.

12:32 Go ahead, Jonathan.

12:33 If you think about it like a wheel, when you take the Python package that we have everywhere, there are a few parts of the file name that essentially allow you to know what it's been built for.

12:46 So inside this, you have, if it's a pure Python package, it's simple.

12:50 You might have a minimum Python version, but in most cases, it's pretty generic.

12:55 So that's not an issue.

12:56 When you start having compiled code inside the package, that's a different story because now we're talking about what kind of OS it was built for.

13:03 So Windows, macOS, Linux, different flavors.

13:08 We're talking about the type of CPU that it was built for.

13:11 So x86, ARM, PowerPC, potentially RISC-V, all these things.

13:17 Mobile.

13:18 And then finally, the Python ABI.

13:20 And in most cases, it means the minimum Python ABI that you need.

13:24 So, and for people, an ABI is essentially the same as an API, but for a binary.

13:31 So it's important when things are stable at the ABI level because it allows you to be future compatible.

13:38 What does ABI mean?

13:39 Jonathan, help us out.

13:40 What does ABI mean for those of us who don't know?

13:42 Application binary interface.

13:43 So it's the same as an API, but specifically for binaries.
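To make the filename tags concrete, here's a minimal sketch (not using the real `packaging` library) that splits a wheel filename into the fields Jonathan describes, assuming the optional build tag is absent:

```python
def split_wheel_filename(filename: str) -> dict:
    # Wheel spec: {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
    # This sketch assumes the optional build tag is absent, so there are
    # exactly five hyphen-separated fields (names normalize "-" to "_").
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,              # distribution name
        "version": version,
        "python": python_tag,      # e.g. cp312 = CPython 3.12
        "abi": abi_tag,            # binary interface the extension targets
        "platform": platform_tag,  # OS + CPU family, e.g. manylinux + x86_64
    }

fields = split_wheel_filename("numpy-2.1.0-cp312-cp312-manylinux_2_17_x86_64.whl")
print(fields["platform"])  # manylinux_2_17_x86_64
```

Note that none of these five fields has anywhere to say "built with AVX2" or "needs CUDA 12", which is exactly the gap the rest of the conversation is about.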

13:48 And the problem that we collectively kind of try to get to is that, well, today, the compute space and the scientific computing space,

13:58 which if we take the latest JetBrains Python developer survey, is at least 40 to 50% of the Python developers are essentially doing data science or similar.

14:10 So it's a massive percentage of the community is doing, in some form, scientific computing to whatever extent you may want to think about it.

14:21 And, well, the problem is when we do these things, we try to do them fast because who likes to wait on the return of some Pandas operation or NumPy or PyTorch operation.

14:32 But to go fast, you need to use all the tricks in the books that you can get to essentially, you have to optimize the binary for a specific CPU,

14:42 for a specific GPU, or for a specific library that you want to use, like BLAS.

14:48 BLAS is a general concept, so which BLAS implementation it is, or MPI.

14:53 And the problem is, well, we don't have the tags or markers to allow us to essentially flag this specific binary to be compatible with X, Y, and Z.

15:03 Right, so the wheel might say, this is for 3.14, it is for ARM CPUs, and so on, but it's not going to say,

15:14 and it supports this vectorization optimization on Intel chips, right?

15:18 I just said ARM, didn't I?

15:20 A very good example with ARM is that the default most people build with is actually a Raspberry Pi, ARM level.

15:30 Yeah, yeah, yeah.

15:31 And you can imagine that when you build for any type of desktop CPU, ARM, you have a little bit more complex CPUs and a little bit more advanced chips.

15:42 And it's a lot of performance that you leave on the table by not optimizing for a specific platform.

15:48 So obviously, in some cases, it doesn't really matter, but in other cases, it does really matter.

15:53 This portion of Talk Python To Me is brought to you by Sentry.

15:56 You know Sentry for their great error monitoring.

15:58 But let's talk about logs.

16:00 Logs are messy.

16:01 Trying to grep through them and line them up with traces and dashboards just to understand one issue isn't easy.

16:08 Did you know that Sentry has logs too?

16:10 And your logs just became way more usable.

16:13 Sentry's logs are trace-connected and structured, so you can follow the request flow and filter by what matters.

16:19 And because Sentry surfaces the context right where you're debugging, the trace, relevant logs, the error, and even the session replay all land in one timeline.

16:28 No timestamp matching, no tool hopping.

16:31 From front end to mobile to back end, whatever you're debugging, Sentry gives you the context you need so you can fix the problem and move on.

16:37 More than 4.5 million developers use Sentry, including teams at Anthropic and Disney+.

16:42 Get started with Sentry logs and error monitoring today at talkpython.fm/sentry.

16:48 Be sure to use our code, talkpython26.

16:50 The link is in your podcast player's show notes.

16:53 Thank you to Sentry for supporting the show.

16:55 I'll give a very concrete example.

16:57 Intel x86-64 is kind of the most common CPU that most people will have at home.

17:04 If you build a wheel for that, you can only use CPU features, performance CPU features that go back to about 2009.

17:13 Any new hardware features that were introduced after 2009, things like SSE4, AVX2, later versions of that, you just cannot use because the installers don't know that you put that in the wheel.

17:27 And hence, they will also install it on computers that don't have those instructions, right?

17:32 And then you just get like very ugly crashes.

17:34 Hence, what we all do is we ship wheels, binaries that are only compatible with 2009.

17:40 And the difference between the 2009 hardware features and, you know, the 2019 or 2023 one could be a factor of 10x, 20x in performance, depending on what you're doing.

17:51 10x to 20x?

17:53 Oh, yeah.

17:53 For, you know, especially when you work with scientific data and SIMD instructions.

17:58 Yeah, you can get massive performance increases.

18:01 If you heard of vectorization, this is a huge deal.

18:05 Yeah.
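On Linux, you can check which of these newer instruction sets your own CPU reports by reading the flags line of /proc/cpuinfo; a rough sketch, with the parsing factored out so it also works on a captured string:

```python
def supported_features(cpuinfo_text: str,
                       wanted=("sse4_2", "avx2", "avx512f")) -> dict:
    """Check a /proc/cpuinfo-style 'flags' line for selected SIMD features."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the colon is a space-separated feature list.
            flags.update(line.split(":", 1)[1].split())
            break
    return {f: f in flags for f in wanted}

sample = "processor : 0\nflags : fpu sse sse2 sse4_1 sse4_2 avx avx2\n"
print(supported_features(sample))  # {'sse4_2': True, 'avx2': True, 'avx512f': False}
```

On a real Linux box you would pass `open("/proc/cpuinfo").read()`; macOS and Windows need different mechanisms (sysctl, CPUID), which is part of why installers have historically avoided this detection entirely.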

18:05 I mean, I guess the way I think about it from our perspective of building, like, because these problems, like one of the things that's very hard about solving them, and it has required,

18:12 like us to be so collaborative across the industry, is that it touches, like, basically every piece of the Python packaging stack.

18:21 Like, it impacts how you build things, how the registries work, like what they support, how installers, like, choose what to install.

18:30 And so, like, for us, it's like, you know, there's the superpower of Python, I think, in some ways.

18:36 Sorry, I think the superpower of Python in some ways is, like, you can build and distribute all this software that's built for, you know, that uses native code.

18:45 Like, you can take native code, and you can distribute it out to users, and they can run it just like it's any other piece of Python code.

18:51 And in the spec, we have these things like, okay, you can build a wheel that targets Windows or Linux or macOS, and it can target, like, x86 or ARM or whatever else.

19:03 And those are all captured in the spec.

19:04 And so, for us, like, building uv, we know how to detect those things, how to figure out, like, which wheel to install based on what the user's machine is running.

19:13 But there's all this other stuff that's not captured by any of those standards, like the instruction set or even, like, the supported CUDA version.

19:20 Like, all these things are not captured in that wheel file, and installers don't know how to detect them.

19:25 They don't know how to figure out, like, okay, which PyTorch build should I use based on the CUDA version on the user's machine?

19:30 Like, all that stuff is lost.

19:31 And that's kind of the gap that we're trying to bridge.

19:34 And part of the philosophy is also, so right now, Python packaging exposed what is called platform tags.

19:41 So, essentially, a sort of, like, mini tag that comes with a specific definition that installers know how to resolve.

19:48 And what we're trying to avoid is to end up creating 200 more today and 200 more in two years and 200 more, again, in four years.

19:57 So, we try to come up with a generic system that will allow you to essentially include arbitrary definition that then resolvers and package managers can then understand by some sort of mechanism.

20:10 And resolve, but not create a sort of blessed list of things that you constantly have to update because it's a lot of maintenance.

20:19 Yeah, that's how we got into the situation now, right?

20:21 Because there's one for the version, there's one for the architecture of the CPU, but then there's not a spot for the other stuff.

20:27 So, the overall idea is to say almost just a metadata section in there and things can read it or ignore it as they see fit.

20:35 Yeah, that's exactly the concept.

20:37 Conceptually, yeah.

20:38 A little bit.

20:39 Yeah, yeah.

20:40 I mean, it's like, I guess the question is, like, okay, if we have this, like, huge space of things that we might possibly want to detect and condition installs on, like, okay, anytime someone publishes a wheel for Python, they should now tell us, like, what CUDA version is it built for?

20:55 Or, like, if any, what, like, CPU instruction sets does it support?

20:59 Like, blah, blah, blah.

21:00 Like, where would we put all that stuff, right?

21:01 Becomes the question.

21:02 It's like, what, are we just going to keep expanding, like, the platform tag and everything else?

21:06 And that's, like, the problem that we're trying to solve in kind of a generic way.
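A toy sketch of the generic idea being described: attach namespaced key-value properties to each build instead of growing the filename, and let the installer match them against what it detected. The property names and filenames here are made up for illustration; the actual PEPs define their own syntax and provider plugin mechanism.

```python
# Hypothetical variant properties attached to three builds of one release.
# Namespaces, keys, and filenames are invented for illustration only.
builds = [
    {"file": "pkg-1.0-cp312-cp312-linux_x86_64.whl", "variant": {}},  # fallback
    {"file": "pkg-1.0-cp312-cp312-linux_x86_64-avx2.whl",
     "variant": {"x86_64": {"level": "v3"}}},                # needs AVX2-class CPU
    {"file": "pkg-1.0-cp312-cp312-linux_x86_64-cu12.whl",
     "variant": {"nvidia": {"cuda": "12"}}},                 # needs CUDA 12 driver
]

def compatible(build, detected):
    # A build is installable if every property it declares is satisfied by
    # what the installer detected; a build with no properties always matches.
    return all(
        detected.get(ns, {}).get(key) == value
        for ns, props in build["variant"].items()
        for key, value in props.items()
    )

detected = {"x86_64": {"level": "v3"}}  # this machine: AVX2-class CPU, no GPU
ok = [b["file"] for b in builds if compatible(b, detected)]
print(ok)
```

The point of the design is that an installer that doesn't understand a namespace can simply skip those builds and fall back to the baseline, so nobody has to maintain a central blessed list of tags.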

21:09 Yeah, you can end up with a file name that's 4,000 characters wide or something.

21:13 They can already get pretty long, by the way.

21:14 But, yeah, we have to work around that in uv sometimes, file name length limits.

21:21 But, yeah.

21:22 There's actually a very famous package that used, I think, the first 200 digits of pi as the version number.

21:28 Oh, my gosh.

21:31 It's a pretty good joke.

21:32 I didn't know about it.

21:33 There's somebody on discuss.python.org that posted the link.

21:36 And I was like, but that's hilarious.

21:38 That is wild.

21:40 So, before we dive into what you all are proposing, let's maybe talk about how just a couple of packages or libraries solve this problem now in maybe different directions.

21:50 So, Ralf, what about NumPy, right?

21:52 I mean, you guys talked about vectorization and stuff.

21:56 Yeah.

21:57 That's so in line with NumPy, right?

21:59 Is NumPy, like, and pandas, that's the way, you know?

22:02 Yes.

22:02 NumPy, yes.

22:03 Pandas, no.

22:04 So, NumPy does contain SIMD instructions and, you know, because it's incredibly useful for performance.

22:14 You know, NumPy has all large arrays and basic instructions on them that, like, have direct hardware implementations typically.

22:21 But the way it's done is incredibly complex because you need to end up with a wheel that works on every type of CPU, right?

22:29 We didn't, you know, I'll stay with x86, but the same happens on the other platforms, right?

22:33 You know, it needs to run on a 2010 CPU and it needs to run better on a 2024 CPU.

22:39 So, what we do in NumPy is we have a system that basically allows you to parameterize a source file and then rebuild it multiple times, you know, for different particular CPU architectures.

22:54 So, like, you know, like a Haswell family and then a Skylake family and so on.

22:59 And then we basically merge that together in a single Python extension module.

23:03 And then at runtime, we have our own code to detect the CPU and basically then some, like, dispatch shim layer that kind of fishes out the right, you know, family from the extension module.

23:16 So, yeah, you put up the diagram there.

23:19 It's pretty complicated.

23:21 And I'd say there I've been collaborating with some of the, you know, world experts on this.

23:27 In the end, this was only successful because we built a generic architecture that other experts, per CPU architecture, could come and contribute to.

23:38 So, we now have a specific team of like four people that help maintain the architecture.

23:44 But then like, you know, Intel for years paid one of their engineers to optimize specifically the x86 code path.

23:52 And then ARM has a NumPy maintainer who, you know, got commit rights a few years ago.

23:57 And he's the final authority on all the ARM instructions that are in there.

24:01 So, that whole complicated thing is now shipped and it's extremely good for performance.

24:06 But you can see how this is not a scalable process to do in many packages, right?

24:10 Plus, you know, if you compile everything five times, you get a binary that's, you know, it's not five times bigger, but it's a lot bigger.

24:17 So, it's not great for users as well.

24:19 Yeah, actually the nickname for these things is fatbin.

24:22 So, you have the idea for why they are called that way because they tend to be very heavy to download.

24:28 Yeah, yeah.

24:29 Instead of wheels, you got big wheels.

24:31 Yep.
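The dispatch scheme Ralf describes, compile several candidates and pick one when the library loads, looks roughly like this toy Python model (the real NumPy machinery lives in C and is far more involved; the feature names and kernels here are stand-ins):

```python
# Toy model of NumPy-style runtime dispatch: several compiled candidates
# for the same kernel, selected once based on detected CPU features.

def add_baseline(a, b):   # portable 2009-era code path
    return [x + y for x, y in zip(a, b)]

def add_avx2(a, b):       # stand-in for an AVX2-optimized build of the kernel
    return [x + y for x, y in zip(a, b)]

KERNELS = [("avx2", add_avx2), ("baseline", add_baseline)]  # best first

def select_kernel(cpu_features):
    # Walk the candidates in priority order and take the first one whose
    # required feature the CPU actually has; "baseline" requires nothing.
    for feature, fn in KERNELS:
        if feature == "baseline" or feature in cpu_features:
            return fn
    raise RuntimeError("no usable kernel")

add = select_kernel({"sse2", "avx", "avx2"})  # resolved once, at import time
print(add([1, 2], [3, 4]))  # [4, 6]
```

With wheel variants, that selection effectively moves from import time into the installer, so each shipped wheel only needs the one code path it was built for.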

24:31 So, what happens if all these changes get adopted and it doesn't need to be compiled into one giant binary?

24:38 Okay.

24:38 Are all these maintainers still working?

24:40 They just don't have to deal with trying to boot it all into one thing?

24:44 They might still have to do, yes.

24:46 I think essentially you're correct.

24:48 You still need to write the actual code that uses the SIMD instructions.

24:52 But then you can just produce a wheel that says like, okay, it works on this specific CPU architecture and just ignore this code if I'm building for another architecture.

25:01 And all the, you know, detecting the CPU at runtime and the dynamic dispatch features you all don't need.

25:06 Will it make the code faster?

25:08 It will, well.

25:09 Like will you have better cache hits?

25:11 Will there be smaller stuff in memory?

25:12 You know, that kind of stuff.

25:13 I don't think it will make the NumPy code much faster.

25:17 It will, you know, it will make a huge difference for all the other packages that don't have this amount of complexity today.

25:24 So like SciPy, scikit-learn, Pandas, Pillow, like none of these packages actually use SIMD code.

25:31 And for SciPy, it's the easiest for me to talk about because I'm also a SciPy maintainer.

25:35 We actually have a lot of code that, you know, got vendored in from somewhere, like Fourier transforms, for example.

25:41 They benefit a lot as well.

25:42 We have AVX2 and ARM Neon implementations, but we just don't build them and don't ship that as wheels because we have no way of doing that.

25:51 As soon as we have, you know, wheel variants, we can say, okay, let's ship two sets of wheels.

25:56 I mean, that's more CI jobs to build more wheels.

25:59 But, you know, when it's worth it, you know, you can make that trade-off, right?

26:02 Like we already have the code.

26:03 We just have to change a build option, produce a different wheel, and ship it.

26:07 So do you just set up something like a hash-if-def sort of thing, like #ifdef this capability?

26:15 Else you put in the generic code?

26:18 Exactly.

26:19 The, yeah, the C code is basically just a bunch of ifdefs.

26:22 And, you know, for maintainability reasons, you only add more ifdefs if, you know, it's really much faster.

26:29 Like you're not going to do it for 10 or 20% faster, but if it's 2x faster, well, why not have an extra else branch?

26:36 Yeah, absolutely.

26:37 Charlie, does Rust have an #ifdef equivalent?

26:40 It must, right?

26:40 Yeah, you can do.

26:42 It has directives like that.

26:44 Yeah, but you guys don't really need to worry about using this for yourself.

26:47 This is more for the things that you service providing to everyone, right?

26:52 Yeah.

26:52 Yeah, this is mostly, this wouldn't have a huge impact on uv or, I mean, it could have some small impact.

27:00 But I think largely this is about, yeah, how can we make it easier for users to consume this stuff?

27:04 And I mean, the NumPy, like this is a good example of how it affects like build and distribution.

27:10 Because, yes, they still have to write like architecture specific code if they want to get these optimizations.

27:15 But what we'll be doing with these proposals is making it much easier for them to ship separate builds that are like dedicated for each of those different variants.

27:24 So like the end user, you know, will get access to it.

27:28 But in this case, it's like the bottleneck is, or part of the bottleneck is like all the complexity it puts on the maintainers and the people publishing.

27:36 How much do you think it would impact the performance to ship Python standalone with different CPU extensions?

27:42 That is a good question, Jonathan.

27:44 So we'd actually like to do, I don't know that I have a great answer to that.

27:49 I mean, like a good quantitative answer to it.

27:52 I think we are very interested in doing stuff like that.

27:55 We've also considered, for example, shipping a build.

27:57 Like we ship with a relatively old like glibc minimum.

28:01 We've considered shipping a build, a variant, not in the sense of the, sorry, a different build.

28:07 Let me just put it that way.

28:08 That uses a more modern glibc version, for example.

28:11 We do run into other problems with that.

28:13 Like our build matrix is really big.

28:15 We have to split it across multiple GitHub actions now.

28:18 And so like we need to, we just have like a lot of builds.

28:20 So we'd probably, we're worried about like doubling the size of the build matrix, for example.

28:24 But that's a separate problem.

28:26 But yes, it could actually, it could actually be helpful there.

28:28 Although we don't ship those as wheels today.

28:29 Yeah, that's awesome.

28:31 It's a very interesting angle to think about how much leverage, I mean, this is probably something you've thought about.

28:37 But how much leverage you and your team actually have on Python performance by how you control Python build standalone.

28:44 This portion of Talk Python To Me is sponsored by Temporal.

28:48 Ever since I had Mason Egger on the podcast for episode 515, I've been fascinated with durable workflows in Python.

28:56 That's why I'm thrilled that Temporal has decided to become a podcast sponsor since that episode.

29:00 If you've built background jobs or multi-step workflows, you know how messy things get with retries, timeouts, partial failures, and keeping state consistent.

29:10 I'm sure many of you have written brutal code to keep the workflow moving and to track when you run into problems.

29:15 But it's trickier than that.

29:16 What if you have a long-running workflow and you need to redeploy the app or restart the server while it's running?

29:22 This is where Temporal's open source framework is a game changer.

29:25 You write workflows as normal Python code and Temporal ensures that they execute reliably, even across crashes, restarts, or long-running processes, while handling retries, states, and orchestrations for you so you don't have to build and maintain that logic yourself.

29:41 You may be familiar with writing asynchronous code using the async and await keywords in Python.

29:46 Temporal's brilliant programming model leverages the exact same programming model that you are familiar with, but uses it for durability, not just concurrency.

29:55 Imagine writing await workflow.sleep(timedelta(days=30)).

30:00 Yes, seriously, sleep for 30 days.

30:02 Restart the server, deploy new versions of the app.

30:04 That's it.

30:05 Temporal takes care of the rest.

30:07 Temporal is used by teams at Netflix, Snap, and NVIDIA for critical production systems.

30:12 Get started with the open source Python SDK today.

30:15 Learn more at talkpython.fm/Temporal.

30:17 The link is in your podcast player's show notes.

30:19 Thank you to Temporal for supporting the show.

30:22 Maybe just tell people, what is the relevance there?

30:26 Like, why?

30:27 What is Python Build Standalone and how does this even apply to what we're talking about?

30:30 Oh, yeah, sure.

30:31 I use it every day.

30:32 I love it.

30:33 A lot of people use it and don't even know.

30:34 I mean, it's probably the least, it's the least like public or like user, direct user facing thing that we do.

30:41 But we took over maintenance of a project called Python Build Standalone probably like a year ago, maybe a little more.

30:49 And that project, the basic idea is like typically when you build CPython, you know, at least like on Linux, for example, a bunch of absolute paths get embedded into the binary, which makes it hard to build like reproducible and relocatable CPythons.

31:05 Like it's hard for someone to build a CPython that you can then download and run on your machine.

31:08 You typically need to build it on your own machine.

31:12 So what this project does is it's sort of like a fork of the CPython build system.

31:17 It's like the CPython build system with a bunch of patches and other changes applied on top.

31:21 And it makes it so that we can build Pythons that you can just download, unzip and run.

31:27 So when you install Python with uv, and these are also used in like Bazel and in a bunch of other tools, we don't actually like build Python from source.

31:35 We actually download, unzip and run Python, which just makes it much easier.

31:39 It means it's faster.

31:41 You don't have to have like the build tool chain on your machine.

31:45 You don't run into problems around like failing to build it or anything like that.

31:48 But the other thing that's been cool about that project, at least recently, is we've been very focused on performance.

31:53 So on actually just trying to make sure that we're distributing, like our goal is to be like the fastest Python distribution.

32:00 Like even without changing CPython source code, just changing how we build it and various things that we can tweak there.

32:07 And so we've been working on a bunch of benchmarks.

32:08 I do think we have the fastest Python now, but we haven't actually published our rigorous benchmark methodology.

32:14 So I won't stake my reputation on that claim yet, but we've been very focused on it.

32:19 And it's been a cool point of leverage because like we can just, yeah, if we can make Python, you know, if we can put out a Python distribution that's like 10 or 15% faster, you know, just by changing how we build it.

32:28 Yeah, it's a big lever for impact.

32:30 Yeah, it's a huge lever.

32:31 And I hadn't really thought about it being a lever until Jonathan brought it up.

32:35 But for example, it's not directly impacted by this because we don't ship it, I guess, for the reason that we don't ship it as a wheel.

32:40 Although someday we potentially could.

32:42 Right now it's just, they're just the files that uv knows how to install.

32:45 But it's the same logic at the core.

32:47 Once you start tweaking the packaging of Python packages, the next part you want to tweak is your Python install.

32:55 Well, for example, all of my stuff that runs on the servers, it's all in Docker and it has a base Docker image.

33:03 And one of the very first lines is, you know, install the, use curl plus the shell to install uv.

33:09 The next line is uv venv.

33:12 And that, that installs Python from Python build standalone.

33:16 And then whatever, you need to make an actual app out of that afterwards.

33:19 Right.

33:20 And so how many people are doing that?

33:22 I, it seems like a huge portion of the world has adopted uv for sort of bootstrapping Python instead of the other way.

33:29 So that's, that's why it's such a big lever, right?

33:31 Yep.

33:32 Yeah, exactly.

33:33 All right.

33:33 As a way to sort of get into the PEPs, Charlie, you mentioned variants.

33:39 You're like, wait, wait, wait, not that variant.

33:42 What variant are we talking about?

33:43 That's not that variant.

33:45 What is that variant?

33:46 I guess that we're not talking about in uv or Python build standalone.

33:49 Who wants to take that?

33:49 Ralf, do you want to take that?

33:50 I'm not actually sure what the question is here.

33:53 I think you were targeted for the question.

33:55 Yeah, yeah, yeah.

33:58 That's fine.

33:59 I mean, so the PEPs revolve around this concept of wheel variants.

34:03 And the idea is you can have, I'll keep using the word variants.

34:09 You can have different variants, different builds, you know, of a wheel that are intended to be installed based on properties that are known or detected on the machine.

34:20 So, for example, that could be like, okay, what NVIDIA drivers do you have on your machine?

34:29 Like, what are the versions of those drivers?

34:30 Because that then implies things about what versions of the CUDA runtime you can use.

34:34 And so when someone publishes a wheel, maybe that wheel, you know, leverages CUDA and needs to be built against CUDA and needs to be built, you know, in a way that leverages CUDA.

34:43 And so they might publish different variants, effectively just different, you know, slightly different versions of, versions is wrong, different variants, slightly different flavors of that package that are all built against different, you know, different CUDA versions.

34:58 And so we would call those different, you know, different variants.

35:01 It's a, it's a, you need to correct me.

35:03 The terminology across what I understand the packaging space, even outside of Python, if you type variants in general, this is, we try to reuse the terminology that ends up being

35:14 pretty widely adopted in the packaging ecosystem, not Python packaging, the packaging at large.

35:21 This is the, variants is the name that you'll find around for this kind of concept.
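For the CUDA case Charlie describes, the installer-side choice might conceptually look like this: detect the highest runtime version the installed driver supports, then prefer the newest compatible build. The build tags, version numbers, and selection policy here are illustrative, not the actual PEP algorithm.

```python
# Illustrative only: pick among PyTorch-style builds published for
# different CUDA versions, given the maximum runtime version the
# installed NVIDIA driver supports. Tags and policy are made up.

published = {"cpu": None, "cu118": (11, 8), "cu121": (12, 1), "cu124": (12, 4)}

def pick_build(driver_max):
    """driver_max: highest CUDA runtime the driver supports, or None (no GPU)."""
    best, best_ver = "cpu", None
    for tag, ver in published.items():
        if ver is None:
            continue  # the CPU build is the universal fallback
        # Usable only if the driver supports at least this runtime version;
        # among usable builds, keep the newest one.
        if driver_max is not None and ver <= driver_max:
            if best_ver is None or ver > best_ver:
                best, best_ver = tag, ver
    return best

print(pick_build((12, 2)))  # a 12.2-capable driver gets the cu121 build
print(pick_build(None))     # no GPU detected: fall back to the CPU build
```

Today that decision is pushed onto users via special index URLs; the variant proposals aim to let the installer make it automatically.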

35:26 You know, related to that, like, especially in the astral flavor these days, but also in many other areas, I feel like crates and rust, what they've done with their packaging system has kind of influenced some of the things we're adopting in the Python world.

35:41 Has anything from the Rust world influenced these PEPs that we're about to talk about?

35:46 Well, crates are source distribution now, mostly.

35:49 Yeah.

35:50 Well, in this case, we're talking about actually binary distribution.

35:54 Yeah.

35:54 Yeah.

35:54 Yeah.

35:54 So not really.

35:55 Okay.

35:56 But in a sense.

35:56 That's actually interesting, right?

35:58 Yes.

35:58 Yeah.

35:59 Because a lot of the best packaging systems, you know, whether it's, it's rust or, you know, Nix, they start from source, right?

36:07 And they know exactly what's, you know, in the box.

36:09 And then binaries are kind of like an optimization, right?

36:12 It's like, you have a thing that you know exactly what is the binary and you can check like, oh, I don't have to build this thing from source.

36:19 I can grab a binary somewhere.

36:20 Right.

36:20 Python packaging is absolutely not like that.

36:23 Like if you build a wheel and you have an sdist, I mean, you have no idea if they're the same thing.

36:28 If you, you know, you cannot rebuild the wheel from the sdist unless, you know, you use very, very well predefined constraints.

36:36 Yeah.

36:36 Yeah.

36:36 I hadn't really thought about that either, but that is an interesting juxtaposition.

36:40 Like the binary stuff that is all binary is shipping as source, but the interpreted stuff is shipping as binary.

36:46 And I think part of the reason, or maybe the main reason is if we're talking about binary stuff for Rust, well, it's all Rust that's compiled, but for Python, it's this mix, this

36:57 crazy mix of all these different libraries that are not, none of them are Python, but they're all binary in the end.

37:03 And so you've got to get around the fact like, well, I don't have a Fortran and a Haskell compiler, so I can't run this project, you know?

37:10 There's something quite amazing to Python in general, which is called the CFFI.

37:15 So the C foreign function interface, which essentially allows you to build any sort of application you want in whatever language.

37:23 As long as you're compatible with CFFI standard, you can call it from Python and it's incredible and amazingly useful.

37:33 But to come back on what Ralf was saying, a lot of the design for wheel variants has actually been inspired by a system called Spack that was designed for supercomputers.

37:47 And we used this especially around the design of CPU variants, to get a lot of inspiration from a package called archspec that is just, from my perspective, pure brilliance in some of its design.

38:02 Just my words, but in my opinion, but I really think they got the thing right.

38:08 It's just beautifully designed.

38:10 Everything is static and JSON-ified, and it's extremely easy to scale and maintain.
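To make the CPU-variant idea concrete: tools like archspec (and the Spack machinery it grew out of) detect what the host CPU supports so a matching build can be chosen. A rough, Linux-only sketch of that detection, reading the kernel's flag list directly instead of depending on archspec:

```python
def cpu_flags(path="/proc/cpuinfo"):
    """Collect CPU feature flags as reported by the Linux kernel."""
    flags = set()
    with open(path) as f:
        for line in f:
            # x86 kernels report "flags"; ARM kernels use "Features".
            if line.startswith(("flags", "Features")):
                flags.update(line.split(":", 1)[1].split())
    return flags

# A variant-aware installer could use information like this to decide
# between a baseline wheel and an AVX2-optimized build.
print("avx2" in cpu_flags())
```

Real implementations are considerably more involved (microarchitecture families, compiler feature mapping), which is exactly what archspec encodes.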

38:15 But yes, if you take all the systems designed to support the most specific deployment scenarios, like Spack, like Nix, or even in some cases Cargo, well, they mostly ship sources

38:29 to get around this variant problem, because that allows you to control the entire build chain, essentially.

38:35 And in some cases, maybe Ralf can talk about it, but conda-forge also takes an approach that is similar to Nix to kind of get around these issues a little bit.

38:44 Maybe Ralf, if you want to talk a little bit about that.

38:47 Not quite because Conda and Conda Forge don't do source distributions at all.

38:52 They just take a release and they build binaries.

38:54 And if there are no binaries, you can't install it.

38:58 But yeah, I would say that's a good point, right?

39:00 We have people that worked on all these systems.

39:02 Like one of Jonathan's colleagues at NVIDIA, Michael Sarahan, used to work on conda.

39:07 I contribute to Conda Forge as well.

39:09 And so we have some ideas that originally came from conda, some that came from Spack.

39:14 And the end result is not exactly like any of those systems, but it takes some of the best aspects of them to enhance Python packaging.

39:22 Not reinventing the wheel.

39:24 I mean, maybe, but not too much.

39:27 Yeah, not too much.

39:29 But it's kind of, it's cool because I think, like, I feel like a lot of this work really got kicked off.

39:35 We did an in-person summit.

39:37 And I honestly can't remember when that was because my mind is such a blur.

39:41 March 2025.

39:42 Thank you.

39:43 Okay, so it was about a year ago.

39:44 And there's a bunch of notes about this.

39:46 And we had people from probably like, I don't know, I'd have to guess 20 different companies, maybe more, all in person for a day, just talking about these problems.

39:56 And a bunch of people presented on their own open source projects and how they intersect with, like, we had people from PyTorch, people from the JAX team, just talking about like, how, what their concerns are, like, what's working well for them, what's not.

40:07 And so, you know, similarly to how we've, I think a lot of the design has really been influenced by like, what are other designs?

40:14 What's the prior art and like, what's working well?

40:17 You know, a lot of it was also informed by like, just talking to a bunch of people across the industry and understanding like, what their concerns are.

40:24 And so, at least from my perspective, and honestly, by calendar time, I have not been involved in Python that long.

40:31 But it's been like, definitely the most like cross company, cross project, cross organization effort I've been involved in by a lot.

40:38 We tried to replicate a model that I really like in the Python community, which was Faster CPython.

40:46 We tried to philosophically create the packaging child of Faster CPython.

40:51 And that's how we created Wheel Next.

40:54 It was all the amazing work that the Faster CPython community did on the CPython side, and kind of creating the same synergy, but around Python packaging.

41:05 And that's why it was.

41:06 I would almost say it's even, you know, quite a bit more diverse.

41:11 At least my understanding is Faster CPython was primarily funded and created by Microsoft, and it kind of turned into a community thing.

41:18 But like, all the money came from Microsoft, I think.

41:21 I think the majority of the people were working in a team inside Microsoft, at least.

41:25 And here, we've got NVIDIA, Meta, the PyTorch folks at Meta.

41:30 We got some contributions from AMD and Intel, and then Astral, Quansight.

41:35 A large amount of the time that we've been able to spend at Quansight came from funding from Red Hat, who came with their own problem sets.

41:43 And, you know, so, and that's just the most prominent contributors.

41:47 So there's like at least 10 companies that started investing in this, because it solves so many problems.

41:53 Yeah, that's really encouraging as well.

41:53 On the left side, you'll see a section called Who We Are.

41:57 Yeah, so I pulled up this project, Wheel Next.

42:00 And, you know, Ralf, this is yours?

42:02 Yeah, who are we?

42:03 And the names also of the open source projects that contributed time and expertise.

42:09 Yeah, AMD, Anaconda, Aprio, Astral, Google, Huawei, Intel, Lap, Lab, Meta, NVIDIA, Preferred Networks, Quansight, and Red Hat.

42:18 That's a bit of a group working on this.

42:21 And you can see just above all the different open source projects whose leads and maintainers have contributed time and energy to try to make this move forward.

42:32 So it is quite a few people.

42:35 Yeah, yeah.

42:35 Most notably, maybe SciPy and PyTorch, possibly.

42:38 I mean, they're all...

42:39 Maybe one company that is not well known, undeservingly, because they should be, which is Probabl at the bottom that you mentioned, which is essentially the support company behind scikit-learn.

42:52 So if people don't know it, Probabl is essentially representing scikit-learn.

43:00 Yeah, so this is wheelnext.dev.

43:03 This is basically the website for the group, the working group, something like that.

43:08 Yep.

43:09 We try to leave our notes, our thinking, our drafts.

43:13 One aspect that I really like on the work that we did is that it kind of felt like a startup.

43:18 We were making a mock-up and iterating very fast and getting feedback.

43:23 And this, I don't like this.

43:24 I don't like this.

43:25 I don't like this, change it.

43:27 I worked really closely with two people, one from Quansight, one from Astral, Konstantin and Michał.

43:34 And we did so many hours of work.

43:37 So many different prototypes, iterating, exposing the work to people, collecting feedback, adjusting, and repeating the cycle so many times until we finally got to something that we thought was reasonable.

43:52 And that's where we started to write the PEPs.

43:54 But that process took us a year.

43:57 All right.

43:58 Well, we should probably jump into the PEPs.

44:00 And I'll tell you what, you all have quite the authorship attribution here.

44:04 But also, I believe, correct me if I'm wrong, that this PEP is notable in that it's the longest PEP ever.

44:10 Something like that, right?

44:12 Yeah.

44:12 I don't know if it's an achievement to be proud of.

44:17 It's the most powerful PEP ever.

44:19 Yes.

44:20 No, no.

44:20 It's a super PEP.

44:21 So much so that we're talking about PEP 817, wheel variants, which is the variant thing that we're actually talking about, not the other variants, beyond platform tags.

44:31 But then so much so that it actually got kicked to the curb for, like, well, what is the minimal viable PEP of this PEP?

44:38 So we can take it in steps.

44:40 And Jonathan, you just told me really good news about that PEP. So you spun off this other PEP, PEP 825, wheel variants package format, which is smaller, but which still has a significant authorship.

44:52 But this one was just merged. It says draft, but is that true?

44:56 Yeah.

44:56 Yeah.

44:57 So, PEPs. Maybe Ralf, you want to discuss a little bit about the process for a PEP? I think that's important.

45:04 Yeah.

45:04 So when you submit a PEP, you first, you know, submit it on GitHub.

45:08 And then there's a group of folks called the PEP editors who basically just edit, you know, they review it for clarity, you know, language, consistency with other PEPs and so on.

45:17 So they don't really look at the content of what you're proposing.

45:21 So it's just, as long as it's clear, they're happy, you merge it in.

45:24 But because the first PEP was already so long, that process took like over a month already.

45:29 But at that point, it's merged as draft.

45:31 And then you go to the Python packaging Discourse where you say, okay, here's our PEP.

45:38 You know, now please let's start the actual community review.

45:41 And then basically anybody with an opinion can weigh in.

45:44 And it's just, it's a forum.

45:47 They're not, it's not even a threaded forum.

45:49 So it's just one long thread of comments, which tends to make it like a little challenging.

45:54 You know, the more complex the topic gets, the harder it is to make sense of this conversation.

45:59 It's really hard to have a threaded multi-component conversation.

46:04 It is.

46:04 Exactly.

46:05 So that's one of the reasons it's now split into smaller parts.

46:09 So you can at least have separate threads about different topics, right?

46:12 So, and because especially not all of the parts of the design apply to everybody.

46:17 When we're talking about installers, we want to hear primarily from the authors of uv and pip, Poetry, Hatch, PDM.

46:26 But if we're talking about how do you build a wheel, well, we have to talk primarily to setuptools, scikit-build-core, meson-python, the build backends.

46:36 And, you know, the index server the same, right?

46:39 You want to know that the PyPI maintainers are happy.

46:42 So that's why, you know, organizing this review and chopping a complex PEP up into parts, it's still going to be really hard to get the right amount of feedback.

46:50 But we now have, like, the first PR, you know, the first PEP merged in draft status.

46:56 So it's going to only be accepted once the whole community review process is done.

47:02 And probably what will happen is it's going to be provisionally accepted only, because we know there's like three more PEPs coming for the other parts.

47:10 And eventually you want all four to be, you know, working and accepted.

47:15 Like, you know, we now have prototypes, but we want prototypes for the final design, and have, like, the tool authors say, yeah, this works for us, before you really go from provisional to actually accepted.

47:26 Amazing.

47:27 So this is part of what I was getting at when I said at the beginning that this touches, like, every part of the packaging stack.

47:33 There's just like, it's very hard to break it up into, I mean, that's what we're trying to do in some sense.

47:39 But like, it's from the start, it's been hard.

47:41 It's hard to, there aren't necessarily super great cut points because it does affect how you build packages, how you publish them, like how they get hosted and served from the registry, how installers like look at them and understand them.

47:53 All of those things, like marker syntax, all of that stuff gets impacted in different ways.

47:59 It's very funny.

47:59 We're prototyping this for a year.

48:03 We ended up pretty much forking the entire ecosystem.

48:06 pip got forked, uv got forked, warehouse got forked, packaging got forked. Like, absolutely every package in the ecosystem ended up being forked, because we needed to test our implementation.

48:21 And we needed to verify.

48:21 The goal, of course, is to unfork those things.

48:24 Yes.

48:24 Like over time.

48:25 It's a re-merge path, but we needed to have a playground to be able to experiment and see how the concepts that we were developing were functioning in pip.

48:36 And then in packaging, but then also in setup tools.

48:39 And then in scikit-build-core.

48:42 And then in meson-python.

48:43 And it just keeps spreading essentially to every single corner of the packaging, installation, and distribution aspect of Python.

48:51 So that was pretty funny.

48:53 Yeah.

48:53 What about your piece of the ecosystem?

48:56 I think we have a fork of uv, or I guess technically it's just a branch, from Konstantin on our team, who's here on the PEP.

49:06 Oh, thanks.

49:07 Who's been super involved, you know, throughout and done a ton of work on basically implementing the standard in uv.

49:12 So we have, like, a working implementation that we've used to, yeah, you can actually install it; we basically distribute it at a slightly different URL.

49:22 So you can install it and test it.

49:24 But yeah, that's been, that fork has evolved a lot, or that branch has evolved a lot.

49:29 And it's been a lot of work to, I mean, it's been incredibly helpful for the design process for us to understand like what's hard, what's easy.

49:35 And then I also think it's important for PEPs just to have like working implementations too.

49:40 And, I mean, a lot of people agree, that's not a novel point, but that's been one of the goals too: to show what it's like in practice and that it actually works.

49:47 So people want to play around with this.

49:48 An easy way might be to try to use this fork.

49:52 We put a lot of work into actually making it work, so go ahead and try it.

49:56 Because I think it's, I personally have a lot of like admiration for the work done in free threading Python, especially to the PEP.

50:05 And I think Sam Gross, who is the main author, managed to make significant amount of progress as he was coming up with prototypes that are, it's not just my word.

50:15 Let me show it to you.

50:16 It works.

50:18 And there was so much skepticism around that idea of free threading Python.

50:22 He had to show, not tell.

50:24 But I think if we didn't do the work similarly on variant enabled wheels, people would have told us, oh, well, resolution is too slow.

50:34 It's going to slow down installers too much.

50:36 And Astral is probably one of the installers that cares the most about speed.

50:41 So we needed to convince both ourselves, but also Charlie and his team: hey, it's not going to slow down anything.

50:47 Yeah.

50:48 And we had plenty of feedback on that front too.

50:49 Well, during the design, we were like, no, this is going to be too slow.

50:52 Or like, this is like a better way to do it, et cetera.

50:54 But, but I like, I mean, I like this little snippet.

50:57 Cause like, this is basically like, if you haven't felt this pain, it might not be meaningful to you.

51:02 But if you've like worked with PyTorch, like this is kind of like, this is what we want to enable.

51:06 Right.

51:06 Is like, you don't, you don't have to like configure a specific index URL that like captures the CUDA variant or anything like that.

51:12 Like you just say, hey, install Torch.

51:15 And then in this variant enabled build, uv would, it would go look at Torch.

51:19 It would see, okay, Torch, you know, it has different variants for different CUDA versions.

51:25 And here's how I inspect, you know, what CUDA version I should use on your machine.

51:29 And then it would pick out the right version based on what's supported by the GPU that's running.

51:32 Like that should all happen and users shouldn't have to think about configuring it effectively is like what we were, what we have been working towards.
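The selection behavior Charlie describes could be sketched roughly as follows; the data shapes, names, and scoring rule here are purely illustrative, not the actual PEP 817/825 mechanics:

```python
# Hypothetical sketch: pick the best wheel variant for the detected
# hardware. "cuda" here is a stand-in property (0 means a CPU-only
# build); real variant metadata is richer and plugin-driven.
def pick_variant(variants, detected_cuda):
    """Return the wheel with the highest CUDA version the system supports."""
    compatible = [v for v in variants if v["cuda"] <= detected_cuda]
    if not compatible:
        # No GPU-capable match: fall back to the CPU-only build if present.
        return next((v for v in variants if v["cuda"] == 0), None)
    return max(compatible, key=lambda v: v["cuda"])

wheels = [
    {"name": "torch-cpu", "cuda": 0},
    {"name": "torch-cu121", "cuda": 121},
    {"name": "torch-cu126", "cuda": 126},
]
print(pick_variant(wheels, detected_cuda=124)["name"])  # torch-cu121
```

The point of the design is that this decision happens inside the installer, driven by detection plugins, so the user never has to encode it in an index URL.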

51:39 And in the future, the first line won't exist, because right now the first line is just there to install this variant-enabled build.

51:44 Yeah.

51:44 That just installs the fork.

51:46 Yeah.

51:46 And for people listening and not watching, what they mean by this: there are three lines here that say how to use this.

51:51 It says curl.

51:51 I'm sorry.

51:52 Basically.

51:52 Yeah, no worries.

51:53 It's the install statement for uv, which is typical, except for that it overrides the.

51:59 The download URL.

52:00 The download URL.

52:01 It's a different URL, which is wheelnext.astral.sh.

52:05 We serve, we distribute a separate variant-enabled, experimental, quote-unquote prototype build.

52:10 Right.

52:10 And then you just create a virtual environment, uv venv, and then you just uv pip install like normal, but it handles this.

52:16 And, you know, Charlie, we spoke, I think on the pyx episode about just how large some of these things are like PyTorch and others that are compiled there.

52:26 You can't just come download everything, all the variations into one wheel.

52:30 I mean, I guess you could, but it'd be crazy, right?

52:32 That's actually a big benefit, right?

52:34 Like right now you go to PyPI, you download the PyTorch wheel.

52:37 It'll be about 900 megabytes.

52:40 You could make it small.

52:41 You know, part of the reason it's so large is, again, these fat binaries, right?

52:44 Like the NumPy ones are like a few megabytes.

52:47 The PyTorch ones have a bunch of CUDA inside, like for five or six different CUDA architectures.

52:52 And, you know, it bloats very, very quickly.

52:54 And actually the PyTorch team has to try incredibly hard to stay under one gigabyte.

52:59 If we have variants, we can just slim it down to one CUDA architecture per wheel, you know, so you can go down to like, you know, 200 megabytes or so, 250 maybe.

53:09 But it's way better for, you know, both for index servers, it's better for users.

53:14 It's going to be pretty slow too.

53:16 The only thing it's not better for is CI servers that have to build all these different things if you start sharding.

53:23 But that's a one-time cost.

53:26 It's much better to have a slight increase one time and a massive decrease at scale, essentially.

53:33 You build it once, it gets installed a million times.

53:36 That's a massive difference.
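Back-of-envelope math for that trade-off, using the rough sizes quoted in the conversation (illustrative figures, not measurements):

```python
fat_wheel_mb = 900     # one wheel bundling several CUDA architectures
slim_wheel_mb = 250    # one variant wheel per CUDA architecture
installs = 1_000_000   # "you build it once, it gets installed a million times"

# Bandwidth saved across all installs, converted from MB to TB.
saved_tb = (fat_wheel_mb - slim_wheel_mb) * installs / 1_000_000
print(saved_tb)  # 650.0 TB less transfer per million installs
```

Against that, the build farm only pays for a handful of extra per-architecture builds, once per release.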

53:38 And, you know, it's also better for the warehouse folks like PyPI.

53:43 And it's easy for people to just assume pip install, uv pip install, that sort of stuff is going to work.

53:49 But the cost of just the bandwidth in that infrastructure is astronomical, which is crazy.

53:55 So this is going to be a major benefit for bandwidth.

53:59 Yeah, and like also like install speed.

54:01 You'll also benefit from that because you're no longer downloading as much stuff to actually install PyTorch.

54:07 I mean, if you use uv, it's got some really good caching and it's pretty quick.

54:12 Oh, but it doesn't multiply your bandwidth by magic.

54:16 I wish.

54:18 Charlie, if you find a solution to that.

54:20 I haven't yet.

54:22 But yeah, if you're downloading Torch and all the NVIDIA, all the CUDA stuff, it's, yeah.

54:27 It's hefty.

54:27 It's a large number of megabytes.

54:31 Let's talk real quick about the PyPackaging Native Guide.

54:34 And then I want to get an update on pyx real quick before we go.

54:37 So, Ralph, this is your project, right?

54:39 Tell us about this.

54:39 I'll be sure to show it.

54:41 Okay, so I've been watching discussions about some of the topics we've talked about in this episode, you know, since 2010 or so in Python packaging.

54:51 And even back then, long before we had wheels, you know, NumPy, for example, had different .exe installers that we would upload to PyPI.

54:59 And, like, there would be one named underscore sse2, one underscore sse3.

55:04 And, like, user had just the right .exe and install it on their Windows machine.

55:09 What?

55:09 Wow.

55:10 Oh, you have no idea.

55:11 Okay.

55:11 Yes, it was not fun.

55:14 And actually, this was by far the hardest thing when I became NumPy release manager because we had to build these things on Linux under Wine.

55:21 And there were no instructions and there were really janky scripts.

55:24 So, it took me three months to get the first release out.

55:27 But, yeah, so I saw all these discussions about, you know, the SSE2 and SSE3 stuff.

55:34 And, like, you know, the pip authors and, you know, most of the people who work with pure Python, like, you know, the DevOps folks, the, you know, web framework folks, they had no idea about this.

55:44 And usually, these conversations went in circles, because when you explained something to one person, the next person would come in, and, like, you know, these were endless mailing list threads.

55:52 That would never go anywhere.

55:53 So, after, you know, seeing that for 12, 13 years or so, I, you know, finally got tired of that.

55:59 And I thought, I'm going to write a reference site that explains the problem.

56:03 I don't want to propose any solutions, but just explain the problem.

56:06 So, the next time someone starts a new conversation about, you know, SIMD extensions or about GPUs or, you know, about some of the issues with mixing, you know, source and binary distributions.

56:18 Just link to this site.

56:19 Like, please use that as our best, you know, approach at trying to summarize a problem, you know.

56:24 So, we have a baseline to start talking about solutions.

56:27 And I think, you know, Jonathan, you know, is one of the people who saw this.

56:30 I think a lot of people read this.

56:32 But it was a nice basis to, you know, just point at this as, like, there are your problem descriptions.

56:38 And, you know, for the GPU part, like, NVIDIA folks really helped to make sure that all the explanations of the problems were correct.

56:46 So, when we started Wheel Next, we could just start talking about, like, okay, what are the solutions here?

56:51 This website is absolutely incredible.

56:54 It's amazing.

56:55 Yeah, it's amazing.

56:56 Thanks to the work that Ralf and every contributor to this website have put in.

57:01 This is by far the best explanation anywhere on the internet to all these packaging issues.

57:08 And I really like the perspective that Ralf took, which is, don't state the solution.

57:12 Just focus on stating the problem very clearly.

57:14 And then with Wheel Next, we tried to take the exact flip side of the coin, which is, don't focus on the problem.

57:21 It's already explained.

57:23 Just focus on proposing one solution to some of the problems.

57:26 And this is how we created Wheel Next.

57:28 I love it.

57:29 You know, one of the big problems, challenges, I guess, is if you don't fully understand the problem space, you could be debating two different things.

57:37 And one person sees a really important angle, the other person doesn't even see that angle.

57:41 They have a different perspective that they're arguing for optimizing for.

57:45 And so, yeah, it's sort of a little bit like the Wheel Next stuff.

57:48 Like, let's get everyone involved and see all the angles and then discuss it, right?

57:52 Exactly.

57:52 Well, you know the saying, a problem well stated is a problem half solved.

57:57 So, this is exactly what we were trying to do.

58:02 I love it.

58:02 All right, I want to get a quick update on pyx since I feel like, Charlie, you're right in the middle of this.

58:10 I know pyx was looking to solve some of these problems as well.

58:14 Give us the elevator pitch and just, we have a whole episode on this from, I don't know, six months ago or something.

58:19 But, yeah, give us the, what's the situation here and does this change things on how you're handling it and make things easier?

58:24 Yeah, yeah, yeah, for sure.

58:26 So, like, yeah, pyx is our hosted package registry and it's in beta right now.

58:32 So, we're live with a bunch of great customers.

58:37 The goal of pyx is basically to enable us to solve, like, more of the packaging problems that we see in the uv issue tracker by having our own registry that we think is well implemented and solves problems that we see that other registries don't really solve.

58:51 So, like, basically from the start, the way that we've approached the wheel, these, like, problems around the GPU stuff is from, like, two perspectives.

59:00 And in pyx, we're really just focused, in terms of how it overlaps with wheel variants, we're really just focused on the GPU part.

59:07 But the way that we've approached it has basically been try to push the standards forward as much as we can.

59:14 And that's what we've been doing in this effort.

59:15 And then simultaneously try to figure out how we can help users, like, until the standards change.

59:21 And so, pyx has mostly been, has more been in that second camp of, like, assuming the standards don't change because we don't want to, we don't want to, like, unilaterally start changing a bunch of things, like, without going through the process.

59:32 How can we make the world, like, a little bit easier for people who are working with this kind of stuff?

59:36 So, for example, like, in pyx, we take a lot of packages that are, like, PyTorch extensions or need to be built against CUDA, and we, like, build those.

59:46 Like, we build them across a wide range of, like, CUDA versions, PyTorch versions, Python versions, CPU architectures, and we make those available to users.

59:53 So, it doesn't solve the core problem of, like, how do you build and distribute this stuff?

59:58 But it does mean that, like, if you're operating within the constraints of, like, the current set of standards, we can make people's lives easier by making it so they don't have to build so many things.

01:00:06 Like, we build them well, they all work together, all that kind of stuff.

01:00:10 So, that's what, like, we've been focused on.

01:00:12 And I think, like, looking forward, like, our goal is to support Wheel Next, sorry, wheel variants, like, as soon as possible, and, like, put those into the registry.

01:00:21 So, as soon as we feel like that's a, you know, a feasible thing to do on the registry, we'll support it in pyx and support it for, like, our users and our customers.

01:00:29 But in the meantime, it's kind of been, like, a parallel-track effort of pushing forward on all the Wheel Next work and standards, and then just trying to, like, solve immediate user problems without changing standards, partly through the registry.

01:00:40 Things are going good at pyx?

01:00:41 You're making progress?

01:00:42 Yeah.

01:00:42 Getting closer to public launch?

01:00:43 Yeah, we're making progress.

01:00:44 Yeah, yeah.

01:00:45 No, it's good.

01:00:46 Customers are growing.

01:00:47 Numbers are going up.

01:00:48 It's good.

01:00:48 Awesome.

01:00:49 People want to try pyx?

01:00:50 What do they do?

01:00:51 They can join the waitlist here.

01:00:53 Yeah, yeah.

01:00:53 This is, you know, you just, we have a, yeah, or you can go to astral.sh/pyx, and we look at all the responses, and we basically onboard people one by one.

01:01:01 So, talking about when is this stuff going to be ready, you'll be able to adopt it, I guess maybe that's a good place to close out our conversation here is, what's the timeline?

01:01:10 What are expectations?

01:01:11 How are things going?

01:01:12 What's next?

01:01:13 It's a great question.

01:01:14 Everything's open source.

01:01:15 It's a two-month delay.

01:01:20 No.

01:01:20 What's the party line on this question?

01:01:25 Oh, gosh.

01:01:27 Well, we have a joke inside, I don't know if it's widespread outside Wheel Next, but we call this Barry's fourth law, Barry Warsaw's fourth law, I don't remember exactly, which is essentially: make an estimate, multiply it by two, and change the unit.

01:01:44 So, if you think it's going to take six months, it's one year.

01:01:47 Oh, no.

01:01:48 Change the unit, one decade.

01:01:50 And it's a running joke that we have that I think is really good.

01:02:00 Realistically, I think it depends on where we're going to set the bar for starting to roll things out.

01:02:09 So, as Ralf was saying, we'll probably see some provisionally accepted.

01:02:13 But as we get to that point, some of the stuff will be possible.

01:02:18 For example, I expect that little by little, we can start experimenting with things without getting necessarily to the absolute final stage.

01:02:28 But the full feature will only be available at the last stage.

01:02:32 So, complicated question to answer.

01:02:35 We hope that it's not going to take too many years.

01:02:38 I'll make a connection back to pyx here.

01:02:40 Because I think, you know, there's a part that's like, okay, there's four PEPs that need to be reviewed.

01:02:45 Probably we need to update some prototypes here and there.

01:02:48 It's probably going to take, you know, the better part of this year.

01:02:51 At that point, you know, you have accepted PEPs, right?

01:02:53 But then PyPI needs to be updated.

01:02:55 Like, you know, all the tools, like Twine, would need to be updated.

01:02:59 Like, there's a new metadata version.

01:03:01 So, everything that consumes that needs to be updated before, you know, package authors can actually start producing these wheels and upload them to PyPI.

01:03:10 So, that's going to not be this year, right?

01:03:12 There's a very long tail of, you know, how the implementation rolls through the ecosystem.

01:03:17 And then you have to wait until users get newer tools.

01:03:20 And then, only then can you start uploading wheels.

01:03:22 So, I'm going to poke at Charlie a bit here.

01:03:25 Because one of the advantages of having a separate registry is, you know, plus the ability to rebuild everything.

01:03:31 You can start using variant wheels, like, the moment that everything is accepted.

01:03:36 It's way sooner.

01:03:36 I know.

01:03:36 It's way sooner.

01:03:36 Yes.

01:03:37 Have you thought about that?

01:03:38 That is true.

01:03:39 Yeah, yeah, of course.

01:03:40 Yeah.

01:03:40 I think from our perspective, we're mostly like, do we feel like the design is done, or how much churn will there be on the design?

01:03:47 But yeah, we're definitely in a position to, like, start building and distributing this stuff much, much sooner.

01:03:51 uv has a second advantage, which is I think they have a much shorter tail of users in terms of version.

01:03:57 I think uv users end up on a much more quote-unquote recent version.

01:04:02 If you look at pip, I think, I don't remember the statistic off the top of my head, but a still significant portion of users use a five-year-old version of pip, with I don't even know which version of Python.

01:04:12 It was 3.9 or something.

01:04:15 So it is, uv is able to move a lot faster, but also the users are more reactive.

01:04:21 That's a very interesting point.

01:04:23 A very interesting angle.

01:04:24 I mean, I think a lot of people who are very tuned into the Python space have switched to uv, started using uv.

01:04:30 And there's probably a lot of people who don't read the newsletters, don't listen to the podcast, and so on.

01:04:35 And they know pip, and they just keep on pip-ing, which is fine.

01:04:38 I'm not knocking it.

01:04:39 But, you know, it means not only are they using pip, they might be using an older version of Python because they don't want to shake it up.

01:04:47 And, you know, those are going to be the long tails that are going to be hard.

01:04:50 I guess one more thought about what's next here before we call this a show here.

01:04:55 What is the minimal?

01:04:56 We talked about PEP 825, the minimal PEP.

01:04:59 What is the minimal amount of adoption, right?

01:05:01 So if the top five biggest data science and machine learning libraries adopt this and the installer tools like uv and pip support it, that actually alone might be a really big benefit if all the other packages are just ignored, right?

01:05:15 So that's way more achievable than every single package that has native code has all these specifiers, right?

01:05:22 What's the minimum level of adoption?

01:05:24 I'd say that, I mean, the minimum level at which you can call it a success, yeah, five is probably not that far off.

01:05:30 The benefits start to accumulate quickly.

01:05:32 But I would expect once packages like PyTorch start adopting this, especially in the deep learning space, you know, this will be adopted very widely, very quickly because it solves so many problems.

01:05:43 Like many of the most popular packages like VLLM with very large development teams and very large numbers of users.

01:05:50 If you look at their install pages, it's like, you know, it's like a puzzle book.

01:05:54 You just don't know how to install this stuff and they don't have wheels on PyPI and they have their own extra index servers.

01:06:00 And it's not for lack of trying.

01:06:02 It's not for lack of trying.

01:06:03 Those teams put a lot of effort into trying to make it easier to install, but they basically all run into different kinds of roadblocks.

01:06:09 I think five packages is what you'll get after maybe two weeks.

01:06:14 After a month, you will get twice that amount and probably a quadratic progression for quite a few weeks.

01:06:21 But it's especially in the scientific compute space and maybe machine learning to be more specific.

01:06:27 Well, the moment that it works, so many packages will switch.

01:06:31 Like so many.

01:06:32 If you just take PyTorch, half of its dependencies will probably activate variant mode.

01:06:36 And then there are the people who build on top of PyTorch, and the people who build on top of JAX.

01:06:41 So just that, you end up with at least 50 packages in a matter of a few months.

01:06:47 Yeah, I'm just thinking there's probably a very small set that are feeling the most pain.

01:06:52 You could do direct outreach to just the most important projects and get that adopted and make a really big difference, even if it's not every package.

01:07:00 But the funny part is that most of the packages that would be interested, that we would reach out to, are already part of Wheel Next.

01:07:08 Because they in some way find the pain pretty significant and are starving for a solution.

01:07:16 They know.

01:07:17 They already know.

01:07:18 All right, let's call it a show, folks.

01:07:20 Let's do a final call to action.

01:07:21 People out there listening, either they're maintainers of packages or they're users of these libraries or they got their own open source project.

01:07:30 They're seeing the light.

01:07:31 They want to get involved.

01:07:32 They want to try it out.

01:07:33 What do you tell them?

01:07:34 Well, first, it would be great if people were to come and discuss on discuss.python.org.

01:07:39 That's where the community is trying to aggregate to discuss all these different proposals.

01:07:45 So I think the more people get involved, the better.

01:07:51 But also, try the different packages that we are publishing. Charlie and his team have been helping us create a sort of end-to-end experience.

01:08:02 I think right now we have examples on Linux, macOS, and Windows.

01:08:08 It works on different types of hardware, different types of CPUs, different types of GPUs.

01:08:13 It works pretty broadly, and we wanted to give a sort of sample flavor of what could be a variant-enabled world.

01:08:22 Yeah, I'd say, for the majority of listeners, they're not going to be packaging tool authors, right?

01:08:26 So those are the ones you would expect to participate in the review primarily.

01:08:31 But I'd say if you're a user of any of the packages we mentioned, just try it out.

01:08:35 You know, download the uv variant-enabled installer.

01:08:39 And if you're a package author and we haven't mentioned your package, but it will solve a problem for you, get in touch.

01:08:45 Because I think that's maybe the most relevant part here.

01:08:49 There's at least hundreds, maybe thousands of packages that we think we have answers for.

01:08:55 But if their solution or their problem statement is slightly different, I think now would be a great time to learn and make sure we cover as many use cases as possible.

01:09:02 Yeah, I mean, I guess the only thing I'd say is ideally the average user won't even have to think about this, right?

01:09:08 And hopefully they just get it through uv or through pip or whatever in the long term.

01:09:13 But that may take time.

01:09:15 But that's our goal, certainly.

01:09:16 Yeah, it's all behind the scenes.

01:09:18 They don't know.

01:09:18 But certainly, if it solves a problem, reach out, be part of it.

01:09:23 Jonathan, Ralf, Charlie, thanks for being on the show.

01:09:25 It's been great.

01:09:26 Keep up being around.

01:09:26 Thanks for having us.

01:09:28 Bye.

01:09:29 Bye-bye.

01:09:29 Bye.

01:09:30 This has been another episode of Talk Python To Me.

01:09:32 Thank you to our sponsors.

01:09:34 Be sure to check out what they're offering.

01:09:35 It really helps support the show.

01:09:37 This episode is brought to you by Sentry.

01:09:39 You know Sentry for the error monitoring, but they now have logs too.

01:09:43 And with Sentry, your logs become way more usable, interleaving into your error reports to enhance debugging and understanding.

01:09:51 Get started today at talkpython.fm/sentry.

01:09:54 And it's brought to you by Temporal, durable workflows for Python.

01:09:58 Write your workflows as normal Python code and Temporal ensures they run reliably, even across crashes and restarts.

01:10:05 Get started at talkpython.fm/Temporal.

01:10:09 If you or your team needs to learn Python, we have over 270 hours of beginner and advanced courses on topics ranging from complete beginner material to async code, Flask, Django, HTML, and even LLMs.

01:10:21 Best of all, there's no subscription in sight.

01:10:24 Browse the catalog at talkpython.fm.

01:10:27 And if you're not already subscribed to the show on your favorite podcast player, what are you waiting for?

01:10:32 Just search for Python in your podcast player.

01:10:34 We should be right at the top.

01:10:35 If you enjoyed that geeky rap song, you can download the full track.

01:10:38 The link is actually in your podcast player's show notes.

01:10:41 This is your host, Michael Kennedy.

01:10:43 Thank you so much for listening.

01:10:44 I really appreciate it.

01:10:45 I'll see you next time.

01:10:46 Bye.

01:11:16 Bye.
