Ahoy, Narwhals are bridging the data science APIs
Episode Deep Dive
Introduction and Guest Background
- Guest: Marco Gorelli, a Python developer at Quansight Labs with deep involvement in the data science ecosystem, previously a Pandas maintainer.
- Episode Topic: Exploring Narwhals, a Python library that acts as a compatibility layer between multiple data frame libraries (Pandas, Polars, cuDF, Modin, Dask, etc.).
What Is Narwhals?
Narwhals: A pure-Python compatibility layer to write data-frame-agnostic code.
- It allows library authors (or tool builders) to support multiple data frame backends without having each user install every library.
- Inspired by the Polars API (especially Polars’ expression system).
Intended Audience: Primarily library and tool maintainers, though teams with mixed data frame usage (e.g., some Pandas, some Polars) can benefit as well.
Project Links:
- GitHub: github.com/narwhals
- Docs: narwhals.dev (uses mkdocs)
The Data Frame Landscape
Pandas: Most popular; large API; historically built around NumPy.
Polars: A rising star; written in Rust; offers lazy execution and a more rigid, expression-based API.
Others:
- cuDF (NVIDIA, GPU-accelerated Pandas-like),
- Modin (distributed DataFrame engine),
- Dask (distributed computing, partial DataFrame support),
- Ibis (database-like abstractions, but also can act as a DataFrame library),
- etc.
Why Narwhals?
- Multiple Data Frames, One API
  - Solves the challenge for libraries that accept or produce data frames but don’t want to lock in to a single library (Pandas or Polars).
- Performance
  - Near-zero overhead: no large data copies or conversions needed.
  - Lazy execution (Polars or Dask) can remain lazy under Narwhals, while Pandas or cuDF remain eager.
- Minimal Dependencies
  - Narwhals itself is just Python files with no binary dependencies.
  - End users only need whichever backend data frame library they actually want to use.
How It Works
- Expressions Model
  - Narwhals focuses on Polars-like expressions (pl.col("colname")) that represent transformations from a DataFrame to a Series.
  - Under the hood, it dispatches to the backend’s respective methods (for Pandas, Polars, cuDF, etc.).
- Decorator Approach
  - @narwhalify can automatically convert incoming DataFrames into the Narwhals representation, run the transformation, and convert back to the original type.
  - Allows libraries to remain data-frame-agnostic while still returning native objects (e.g., pass in Pandas, get Pandas out).
- Stable API
  - Users can import from narwhals.stable.v1 to ensure perfect backward compatibility going forward, much like “editions” in Rust.
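The expression idea can be illustrated with a toy model: an expression is a deferred function from a frame to a column, and a thin adapter per backend supplies the actual data access. This is a minimal sketch under stated assumptions, not how Narwhals is implemented. Every name in it (`Expr`, `col`, `wrap`, `ColumnarAdapter`, `RowsAdapter`) is hypothetical, and plain dicts and lists stand in for real data frame libraries.

```python
# Toy model only: every name here (Expr, col, wrap, the adapters) is
# hypothetical and is not part of Narwhals.


class Expr:
    """A deferred computation: given a wrapped frame, produce one column."""

    def __init__(self, evaluate):
        self.evaluate = evaluate  # callable: adapter -> list of values

    def __add__(self, other):
        # Combining expressions builds a new deferred computation;
        # nothing touches the data until evaluate() is called.
        return Expr(
            lambda frame: [
                a + b for a, b in zip(self.evaluate(frame), other.evaluate(frame))
            ]
        )


def col(name):
    """Refer to a column by name without binding to any backend yet."""
    return Expr(lambda frame: frame.get_column(name))


class ColumnarAdapter:
    """Backend A: data stored as a dict of column lists."""

    def __init__(self, data):
        self.data = data

    def get_column(self, name):
        return list(self.data[name])


class RowsAdapter:
    """Backend B: data stored as a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows

    def get_column(self, name):
        return [row[name] for row in self.rows]


def wrap(native):
    """Pick the adapter that matches the native object's type."""
    if isinstance(native, dict):
        return ColumnarAdapter(native)
    if isinstance(native, list):
        return RowsAdapter(native)
    raise TypeError(f"unsupported frame type: {type(native).__name__}")


total = col("a") + col("b")  # one expression, written once

result_columnar = total.evaluate(wrap({"a": [1, 2], "b": [10, 20]}))
result_rows = total.evaluate(wrap([{"a": 1, "b": 10}, {"a": 2, "b": 20}]))
```

The same `total` expression runs against both storage layouts because only the adapters know how the data is laid out, which is the property the episode attributes to Narwhals' dispatch.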
Supported Libraries and Levels of Support
- Full Support
  - Pandas, Polars (eager and lazy), cuDF, Modin.
- Partial / Interchange Support
  - Some partial support for Dask.
  - Ibis interchange compatibility: enough to let other tools (like Altair) query schema information, etc.
- Altair and Others
  - Altair recently adopted Narwhals, giving Narwhals a big spike in downloads and visibility.
  - Scikit-Lego also implemented Narwhals for data-frame-agnostic transformations.
Typing and Tool Builder Considerations
- Type Hints
  - Narwhals uses Python’s modern type hinting heavily, with protocols and generics.
  - Improves dev experience, providing IDE autocompletion and inline documentation.
- Example Usage
  - Explicit Conversion: nw.from_native(...) / .to_native().
  - Decorator: @narwhalify to automatically convert in/out of Narwhals.
Roadmap and Future Plans
- Continue helping libraries that wish to adopt Narwhals (e.g., Formulaic, Shiny, Plotly).
- Potentially broaden coverage of partial support for:
  - Dask (further lazy DataFrame operations),
  - DuckDB (though it returns tables and SQL by default, there’s interest in bridging it with Polars-like expressions).
- Expand tutorials, examples, and docs. Open to contributors.
Overall Takeaway
Narwhals provides a universal wrapper around different data frame libraries, letting tool creators write one set of logic and still support diverse user bases who work with Pandas, Polars, or others. It stays lightweight and pure-Python, bridging the differences between backends without imposing overhead or restricting the unique features of each one.
Links from the show
Marco on LinkedIn: linkedin.com
Narwhals: github.io
Narwhals on GitHub: github.com
DuckDB: duckdb.org
Ibis: ibis-project.org
Modin: readthedocs.io
Pandas and Beyond with Wes McKinney: talkpython.fm
Polars: A Lightning-fast DataFrame for Python: talkpython.fm
Polars: pola.rs
Pandas: pandas.pydata.org
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm