Polars: A Lightning-fast DataFrame for Python [updated audio]

Episode #402, published Wed, Feb 8, 2023, recorded Sun, Jan 29, 2023

When you think about processing tabular data in Python, what library comes to mind? Pandas, I'd guess. But there are other libraries out there and Polars is one of the more exciting new ones. It's built in Rust, embraces parallelism, and can be 10-20x faster than Pandas out of the box.

We have Polars' creator, Ritchie Vink, here to give us a look at this exciting new DataFrame library.

Episode Deep Dive

Guests introduction and background

Ritchie Vink is a seasoned Python developer, data engineer, and the creator of Polars, a lightning-fast DataFrame library for Python (and other languages) built in Rust. He started his career as a civil engineer but soon moved into data science and data engineering, motivated by automating tedious tasks with Python. Eventually, he discovered Rust and fell in love with its performance and safety guarantees, leading him to build Polars from the ground up for high-performance tabular data processing. Now, Ritchie focuses on Polars full-time, aiming to build a modern, parallel, and more “database-aware” DataFrame library for Python programmers everywhere.

What to Know If You're New to Python

Here are a few essentials to help you get the most out of the conversation about Polars, Rust, and performance:

  • Understand the basics of arrays, lists, and DataFrames. Polars handles them differently from libraries like pandas, but the concepts will feel familiar.
  • Familiarize yourself with file formats like CSV and Parquet. The episode often contrasts the speed trade-offs between formats.
  • Get some experience with Python’s concurrency limits (the GIL). It will help you follow the conversation around Rust’s multithreading and Polars’ performance gains.
  • Learn the high-level idea of how SQL queries get optimized (filtering, projecting, and pushing down operations). It will make the discussion of lazy frames and query optimization in Polars easier to follow.

Key points and takeaways

  1. Polars as a Rust-based DataFrame Library: Polars is built from the ground up in Rust to address many of the challenges pandas faces with performance and memory usage. Rust's strict ownership and concurrency rules let Polars bypass Python’s global interpreter lock (GIL) and make multi-threading safe and efficient.
  2. Performance vs. Pandas: Polars can be 10-20x faster than pandas on common operations such as filtering, grouping, and joining large datasets. It also outperforms tools like Dask in many single-machine scenarios because it controls the entire memory and computation strategy rather than layering parallelization on top of another library.
  3. Lazy Frames and Query Optimization: A key advantage of Polars is its lazy API. Instead of executing each operation immediately, as pandas does, Polars builds a “query plan” and optimizes it before execution. This approach can push filters down to the data loading step, skip unused columns, and avoid unnecessary intermediate results, which can dramatically boost speed.
  4. Memory Usage and Arrow Format: Polars adopts the Apache Arrow memory model, which stores data in a standardized columnar layout. This avoids constant data copying and allows faster reads, writes, and operations on large datasets. Using Arrow also means Polars can interoperate smoothly with other libraries that speak Arrow.
  5. No Multi-Index, Strict Schemas: Unlike pandas, Polars does not support a multi-index. Instead, it keeps indexing simple and explicit (rows are just addressed by 0-based position). Data types and schemas are consistent and checked up front, helping users catch mistakes early and avoid ambiguous operations.
  6. Connectors and Data Ingestion: Polars supports multiple file formats (CSV, JSON, Parquet, Arrow IPC) and integrates with databases via ConnectorX. The library can scan data lazily, so large files don’t have to be fully pulled into memory when only a subset of the data is needed.
  7. Parallelism and Out-of-Core Processing: Thanks to Rust’s concurrency model, Polars exploits parallel processing while respecting memory limits. It can handle out-of-core computations, so you can work with datasets larger than your machine’s RAM by streaming data in chunks.
  8. Ritchie’s Journey from Civil Engineer to Data Scientist: The creator of Polars, Ritchie Vink, started as a civil engineer automating repetitive tasks with Python. His explorations in data science and frustration with pandas’ bottlenecks led him to Rust and, ultimately, to building Polars as a hobby project that grew into a thriving open-source library.
  9. Integration with Python’s Ecosystem: Even if you rely heavily on other parts of the Python data stack, Polars fits in. For example, you can convert a Polars DataFrame to a pandas DataFrame for downstream tasks like visualization. Many popular data libraries can read Arrow data, so Polars works well in end-to-end pipelines.
  10. Active Community and Future Plans: Polars is evolving with new releases and community contributions. Projects like ADBC (Arrow Database Connectivity) promise deeper integration with databases, potentially pushing more computations down to the data source in the future. Ritchie is planning a Polars foundation, seeking sponsorships and more formal backing.

Interesting quotes and stories

  • “The best work is work you don’t have to do.” Summarizing the Polars approach to skipping irrelevant data, Ritchie pointed out that by pushing filters down to the data source, Polars doesn’t bother loading or processing rows that fail a filter.
  • “Once I got used to Rust, it was a renaissance of coding for me.” Ritchie described his excitement about Rust’s memory safety and concurrency, which made writing high-performance systems code feel fun and less error-prone.
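
The idea behind the first quote can be sketched in plain Python. This is only an illustration of filtering while loading versus loading everything first; it is not how Polars is implemented, and the data and function names are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for a large file on disk.
RAW = (
    "city,year,sales\n"
    "Utrecht,2021,10\n"
    "Utrecht,2022,15\n"
    "Portland,2021,7\n"
    "Portland,2022,12\n"
)


def load_then_filter(raw, year):
    """Eager style: hold every row and column in memory, then filter."""
    rows = list(csv.DictReader(io.StringIO(raw)))
    kept = [
        {"city": r["city"], "sales": int(r["sales"])}
        for r in rows
        if int(r["year"]) == year
    ]
    return kept, len(rows)  # second value: rows held in memory at once


def filter_while_loading(raw, year):
    """Pushdown style: test each row as it is read, keep only the needed columns."""
    kept = []
    for r in csv.DictReader(io.StringIO(raw)):
        if int(r["year"]) == year:
            kept.append({"city": r["city"], "sales": int(r["sales"])})
    return kept, len(kept)  # only matching rows are ever stored


eager_rows, eager_held = load_then_filter(RAW, 2022)
pushed_rows, pushed_held = filter_while_loading(RAW, 2022)

print(eager_rows == pushed_rows)  # same answer either way
print(eager_held, pushed_held)    # but fewer rows held with pushdown
```

Both functions still parse every line of the CSV here. A real engine reading a format like Parquet can go further and skip whole chunks of a file, but the principle is the same: the earlier a filter runs, the less work is left for everything after it.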

Key definitions and terms

  • Lazy Evaluation: Deferring the execution of operations until a final “collect” step, allowing for query planning and optimizations.
  • Columnar Format: A way to store table data by columns, rather than rows, which benefits analytic workloads and modern CPU caches.
  • Apache Arrow: A language-agnostic columnar memory format that allows zero-copy data interchange among multiple systems.
  • ConnectorX: A fast data loading tool that helps Polars pull data from various databases without heavy overhead.
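
For a feel of what "columnar" means, here is a small standard-library sketch contrasting a row-oriented layout with a column-oriented one. The data is hypothetical, and real Arrow buffers are more sophisticated than this (they also track missing values and have a fixed binary layout), so treat it as a mental model only.

```python
from array import array

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"city": "Utrecht", "sales": 10},
    {"city": "Portland", "sales": 7},
    {"city": "Utrecht", "sales": 15},
]

# Column-oriented: each field is stored as its own sequence.
columns = {
    "city": ["Utrecht", "Portland", "Utrecht"],
    "sales": array("q", [10, 7, 15]),  # typed, contiguous 64-bit integers
}

# Summing one field in the row layout has to visit every record.
row_total = sum(record["sales"] for record in rows)

# In the column layout, the same sum reads one contiguous block of numbers
# and never touches the "city" data.
col_total = sum(columns["sales"])

print(row_total, col_total)
```

Analytic queries often touch a few columns across many rows, which is why the second layout tends to suit them well.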

Overall takeaway

Polars offers a fresh, Rust-powered approach to DataFrame workflows in Python. By embracing lazy evaluation, parallelism, and the Arrow memory format, it brings remarkable speed improvements and more predictable, streamlined data operations. Whether you’re seeking faster performance than pandas on big data or eager to explore modern memory-safe programming concepts in your Python data projects, Polars represents an exciting new frontier in the data science ecosystem.

Links from the show

Ritchie on Mastodon: @ritchie46@fosstodon.org
Ritchie on Twitter: @RitchieVink
Ritchie's website: ritchievink.com

Polars: pola.rs
Apache Arrow: arrow.apache.org
Polars Benchmarks: pola.rs
Coming from Pandas Guide: github.io
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
