
Ahoy, Narwhals are bridging the data science APIs

Episode #480, published Wed, Oct 9, 2024, recorded Tue, Sep 10, 2024

If you work in data science, you definitely know about data frame libraries. Pandas is certainly the most popular, but there are others such as cuDF, Modin, Polars, Dask, and more. Their APIs are similar but definitely not the same, and Polars is quite different. Here's the problem: if you want to write a library for users of more than one of these data frame frameworks, how do you do that? If you want to leave open the possibility of switching frameworks after the app is built, you face the same problem. That's the problem Narwhals solves. We have Marco Gorelli on the show to tell us all about it.

Episode Deep Dive

Introduction and Guest Background

  • Guest: Marco Gorelli, a Python developer at Quansight Labs with deep involvement in the data science ecosystem, previously a Pandas maintainer.
  • Episode Topic: Exploring Narwhals, a Python library that acts as a compatibility layer between multiple data frame libraries (Pandas, Polars, cuDF, Modin, Dask, etc.).

What Is Narwhals?

  • Narwhals: A pure-Python compatibility layer to write data-frame-agnostic code.

    • It allows library authors (or tool builders) to support multiple data frame backends without having each user install every library.
    • Inspired by the Polars API (especially Polars’ expression system).
  • Intended Audience: Primarily library and tool maintainers, though teams with mixed data frame usage (e.g., some Pandas, some Polars) can benefit as well.

The Data Frame Landscape

  • Pandas: Most popular; large API; historically built around NumPy.

  • Polars: A rising star; written in Rust; offers lazy execution and a more rigid, expression-based API.

  • Others:

    • cuDF (NVIDIA, GPU-accelerated, Pandas-like)
    • Modin (distributed DataFrame engine)
    • Dask (distributed computing, partial DataFrame support)
    • Ibis (database-like abstractions, but can also act as a DataFrame library)

Why Narwhals?

  1. Multiple Data Frames, One API
    • Solve the challenge for libraries that accept or produce data frames but don’t want to lock in to a single library (Pandas or Polars).
  2. Performance
    • Low, near-zero overhead: no large data copies or conversions needed.
    • Lazy execution (Polars or Dask) can remain lazy under Narwhals, while Pandas or cuDF remain eager.
  3. Minimal Dependencies
    • Narwhals itself is just Python files with no binary dependencies.
    • End users only need whichever backend data frame library they actually want to use.
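
The "minimal dependencies" point can be illustrated with a small sketch. This is an assumption about one way a compatibility layer could recognize a backend without importing it, not a description of Narwhals' internals: if the caller built the object with a given library, that library is already loaded, so it can be looked up in sys.modules. The names KNOWN_BACKENDS and detect_backend are hypothetical.

```python
import sys

# Hypothetical registry: module name -> class names that count as "a data frame".
KNOWN_BACKENDS = {
    "pandas": ("DataFrame",),
    "polars": ("DataFrame", "LazyFrame"),
}


def detect_backend(obj):
    """Return the backend module name for obj, or None if it is not recognized.

    Nothing is imported here. A library that the caller never loaded cannot
    have produced obj, so only already-loaded modules are inspected.
    """
    for module_name, class_names in KNOWN_BACKENDS.items():
        module = sys.modules.get(module_name)
        if module is None:
            continue  # backend not loaded (or not installed): skip it
        classes = tuple(
            getattr(module, name) for name in class_names if hasattr(module, name)
        )
        if classes and isinstance(obj, classes):
            return module_name
    return None


# A plain dict is not a data frame from any known backend.
print(detect_backend({"a": [1, 2, 3]}))
```

With this approach the layer itself never needs pandas or Polars installed; it only reacts to whatever the user already brought along.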

How It Works

  • Expressions Model
    • Narwhals focuses on Polars-like expressions (nw.col("colname"), mirroring Polars' pl.col("colname")). An expression represents a function from a DataFrame to a Series.
    • Under the hood, it dispatches to the backend’s respective methods (for Pandas, Polars, cuDF, etc.).
  • Decorator Approach
    • @narwhalify can automatically convert incoming DataFrames into the Narwhals representation, run the transformation, and convert back to the original type.
    • Allows libraries to remain data-frame-agnostic while still returning native objects (e.g., pass in Pandas, get Pandas out).
  • Stable API
    • Users can import from narwhals.stable.v1 to pin an API that is intended to stay backward compatible going forward, much like “editions” in Rust.
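
The idea of an expression as "a function from a DataFrame to a Series" can be shown with a toy model. The sketch below is purely conceptual and uses only the standard library: a "frame" is a dict of lists, and Expr, col, and select are hypothetical stand-ins, not Narwhals or Polars classes. It does not attempt the backend dispatch described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = Dict[str, List[float]]  # toy "data frame": column name -> values
Column = List[float]            # toy "series"


@dataclass(frozen=True)
class Expr:
    """A deferred computation: a function from a frame to a single column."""

    fn: Callable[[Frame], Column]

    def __mul__(self, other: "Expr") -> "Expr":
        # Combining two expressions yields a new expression; nothing runs yet.
        return Expr(
            lambda frame: [a * b for a, b in zip(self.fn(frame), other.fn(frame))]
        )

    def mean(self) -> "Expr":
        def _mean(frame: Frame) -> Column:
            values = self.fn(frame)
            return [sum(values) / len(values)]

        return Expr(_mean)


def col(name: str) -> Expr:
    """Expression that picks one column out of whatever frame it is given."""
    return Expr(lambda frame: frame[name])


def select(frame: Frame, **named_exprs: Expr) -> Frame:
    """Evaluate each expression against the frame and collect the results."""
    return {name: expr.fn(frame) for name, expr in named_exprs.items()}


frame = {"price": [2.0, 3.5], "qty": [3.0, 2.0]}
result = select(
    frame,
    total=col("price") * col("qty"),
    avg_price=col("price").mean(),
)
print(result)
```

Because an expression is only a description of work until it is evaluated against a frame, the same expression can in principle be handed to different backends, which is what makes this model a good fit for a compatibility layer.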

Supported Libraries and Levels of Support

  • Full Support
    • Pandas, Polars (eager and lazy), cuDF, Modin.
  • Partial / Interchange Support
    • Some partial support for Dask.
    • Ibis interchange compatibility: enough to let other tools (like Altair) query schema information, etc.
  • Altair and Others
    • Altair recently adopted Narwhals, giving it a big spike in downloads and visibility.
    • Scikit-Lego also implemented Narwhals for data-frame-agnostic transformations.

Typing and Tool Builder Considerations

  • Type Hints
    • Narwhals uses Python’s modern type hinting heavily, with protocols and generics.
    • Improves dev experience, providing IDE autocompletion and inline documentation.
  • Example Usage
    1. Explicit Conversion: nw.from_native(...) / nw.to_native(...).
    2. Decorator: @narwhalify to automatically convert in/out of Narwhals.

Roadmap and Future Plans

  • Continue helping libraries that wish to adopt Narwhals (e.g., Formulaic, Shiny, Plotly).
  • Potentially broaden coverage of partial support for:
    • Dask (further lazy DataFrame operations),
    • DuckDB (though it returns tables and SQL by default, there’s interest in bridging it with Polars-like expressions).
  • Expand tutorials, examples, and docs. Open to contributors.

Overall Takeaway

Narwhals provides a universal wrapper around different data frame libraries, letting tool creators write one set of logic and still support diverse user bases who work with Pandas, Polars, or others. It stays lightweight, purely in Python, and focuses on bridging the differences without imposing overhead or restricting the unique features of each backend.

Links from the show

Marco Gorelli: @marcogorelli
Marco on LinkedIn: linkedin.com
Narwhals: github.io
Narwhals on Github: github.com

DuckDB: duckdb.org
Ibis: ibis-project.org
modin: readthedocs.io
Pandas and Beyond with Wes McKinney: talkpython.fm
Polars: A Lightning-fast DataFrame for Python: talkpython.fm
Polars: pola.rs
Pandas: pandas.pydata.org
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
