
25 Pandas Functions You Didn’t Know Existed

Episode #341, published Wed, Nov 17, 2021, recorded Thu, Nov 4, 2021

Do you do anything with Jupyter notebooks? If you do, there is a very good chance you're working with the pandas library. This is one of THE primary tools of anyone doing computational work or data exploration with Python. Yet, this library is massive and knowing the idiomatic way to use it can be hard to discover.

That's why I've invited Bex Tuychiev to be our guest. He wrote an excellent article highlighting 25 idiomatic Pandas functions and properties we should all keep in our data toolkit. I'm sure there is something here for all of us to take away and use pandas that much better.



The 25 functions


  1. ExcelWriter is a generic class for creating Excel files (with sheets!) and writing DataFrames to them.
  2. pipe is one of the best functions for doing data cleaning in a concise, compact manner in Pandas.
  3. factorize: This function is a pandas alternative to Sklearn’s LabelEncoder
  4. A function with an interesting name is explode.
  5. Another function with a funky name is squeeze and is used in very rare but annoying edge cases.
  6. between: A rather nifty function for boolean indexing numeric features within a range.
  7. All DataFrames have a simple T attribute, which stands for transpose.
  8. Did you know that Pandas allows you to style DataFrames?
  9. Pandas options
  10. convert_dtypes: We all know that pandas has an annoying tendency to mark some columns as object data type. Instead of manually specifying their types, you can use convert_dtypes method which tries to infer the best data type.
  11. A function I use all the time is select_dtypes.
  12. mask allows you to quickly replace cell values where a custom condition is true.
  13. min and max along the columns axis
  14. nlargest and nsmallest.
  15. idxmax and idxmin: Sometimes you want the index label of the min/max rather than the value itself; for that, use idxmax/idxmin.
  16. value_counts with dropna=False: A common way to find the percentage of missing values is to chain isnull and sum and divide by the length of the array. You can do the same thing with value_counts by passing dropna=False and normalize=True.
  17. clip function makes it really easy to find outliers outside a range and replace them with the hard limits.
  18. at_time allows you to subset values at a specific date or time.
  19. bdate_range is a short-hand function to create a DatetimeIndex with business-day frequency.
  20. autocorr
  21. Pandas offers a quick method to check if a given series contains any nulls with hasnans attribute
  22. at and iat: These two accessors are much faster alternatives to loc and iloc with a disadvantage. They only allow selecting or replacing a single value at a time
  23. argsort: You should use this function when you want to extract the indices that would sort an array
  24. When a column is a category, you can use several special functions using the cat accessor.
  25. GroupBy.nth: This function only works with GroupBy objects. Specifically, after grouping, nth returns the nth row from each group
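
A few of the items above are easier to judge once you see them on real values. The following is a minimal sketch, not code from Bex's article: the scores Series and the small team/points DataFrame are made up for illustration, and only a handful of the 25 functions are shown.

```python
import pandas as pd

# Hypothetical sample data: test scores with two out-of-range entries.
scores = pd.Series([55, 72, 98, 101, -3], index=list("abcde"))

# between: boolean mask for values inside a range (inclusive on both ends by default).
in_range = scores[scores.between(0, 100)]

# clip: pull outliers back to the hard limits instead of dropping them.
clipped = scores.clip(lower=0, upper=100)

# mask: replace values where the condition is True.
masked = scores.mask(scores > 100, 100)

# nlargest: the two highest values, already sorted in descending order.
top2 = scores.nlargest(2)

# idxmax: the index label of the maximum, not the value.
label_of_max = scores.idxmax()

# argsort: integer positions that would sort the values in ascending order.
order = scores.argsort()

# Hypothetical DataFrame with one text column and one numeric column.
df = pd.DataFrame({"team": ["x", "x", "y"], "points": [1.0, None, 3.0]})

# select_dtypes: keep only the numeric columns.
numeric_only = df.select_dtypes(include="number")

# at / iat: single-value access by label and by integer position.
first_points = df.at[0, "points"]
last_points = df.iat[2, 1]
```

Here between, clip, and mask all deal with the same out-of-range values, but between filters rows out while clip and mask keep every row and change the offending values.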

Episode Deep Dive

Guests Introduction and Background

Bex Tuychiev is a seasoned Python developer and data science enthusiast. He’s recognized as a Kaggle master, where he regularly participates in competitions and shares tutorials. He also writes top-rated articles on Medium, specializing in artificial intelligence and data science topics. During this episode, Bex explains how he came to focus on Python for data science, how he uses writing to solidify his learning, and why pandas is such a critical library for data exploration and analysis.

What to Know If You're New to Python

Here are a few basics to help you follow along with the discussion in this episode and get more out of pandas:

  • Variables and Data Structures: Python lets you store data in lists, dictionaries, and more. Pandas builds upon these to store tabular data.
  • Avoiding Loops in Pandas: A major theme is that you often replace Python for-loops with vectorized operations and built-in functions in pandas.
  • Indexing and Slicing: Pandas data frames can be sliced similarly to Python lists, but with more powerful indexing options like .loc and .iloc.
  • pip / venv: Installing pandas or other libraries is done via tools like pip install pandas. Virtual environments (venv or others) help you manage project dependencies without conflicts.
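
To make the loop-versus-vectorized point and the .loc/.iloc distinction concrete, here is a small sketch. The city and temperature data are invented for illustration, and it assumes pandas is already installed.

```python
import pandas as pd

# Hypothetical data: three cities with temperatures in Celsius.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Cairo"], "temp_c": [4.0, 19.5, 30.0]},
    index=["a", "b", "c"],
)

# Position-based lookup with .iloc: first row, second column (temp_c).
by_position = df.iloc[0, 1]

# Label-based lookup with .loc: row labeled "b", column "city".
by_label = df.loc["b", "city"]

# Label slices include both endpoints; position slices exclude the stop.
label_slice = df.loc["a":"b"]
position_slice = df.iloc[0:1]

# Loop version: convert each temperature to Fahrenheit one at a time.
temps_f_loop = []
for t in df["temp_c"]:
    temps_f_loop.append(t * 9 / 5 + 32)

# Vectorized version: the same arithmetic applied to the whole column at once.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32
```

Both versions produce the same numbers; the vectorized line is shorter and, on large columns, usually much faster.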

Key Points and Takeaways

  1. 25 Lesser-Known Pandas Functions: Bex shares a collection of “hidden gem” pandas functions and features that can drastically improve productivity. Many of these allow you to work more efficiently without falling back to manual loops or if-statements. Knowing they exist is often more important than learning the details by heart. You can immediately apply them in data cleaning, manipulation, or advanced exploration tasks.
  2. Fluent Data Processing with pipe: One standout function is DataFrame.pipe(), which helps you chain operations together in a very readable way. Instead of writing nested function calls or creating intermediate data frames, you can design a pipeline that processes data step by step. This fluent style mirrors scikit-learn pipelines and can make your notebooks more maintainable.
  3. Converting Categorical Data with factorize: factorize quickly transforms text-based categories (like "sun" or "rain") into numeric labels for machine learning. This is similar to label encoding but built right into pandas. It’s especially handy if you want to skip external libraries for simple encoding tasks.
  4. Handling Nested or Multi-Value Cells via explode: Survey results often have rows where one cell contains multiple values (lists). explode automatically splits those lists into separate rows, preserving other columns. It’s a perfect example of removing loops and manual code for an operation that can be done in one line.
  5. Highlighting Key Insights with Pandas Styling: The DataFrame.style attribute lets you add color scales, highlights, and conditional formatting directly in Jupyter notebooks. This is useful for quickly spotting trends or outliers. Background gradients, highlighting min/max values, and custom CSS can all provide immediate, visual feedback about your data.
  6. Checking for Missing Values: Pandas offers many ways to handle NaN and missing data. A particularly quick check is .hasnans on a Series, letting you decide if you need to drop or impute missing values. Missing data is a central challenge for data scientists, and acknowledging it early can save a lot of time.
  7. Time Series Filtering with at_time and between_time: If you have a DatetimeIndex, these methods let you filter rows occurring at a specific time of day or within a time-of-day range (e.g., "business hours") in just one line. It’s handy for tasks like slicing out morning data or keeping only market-hours timestamps in stock trading data.
  8. Business Date Ranges (bdate_range): For time series work, bdate_range skips weekends by default and can also skip holidays when given a custom business-day frequency, giving you a date index pre-filtered for business days. This is crucial when analyzing stock data or any schedule-bound events.
  9. The Speed of Pandas vs. Loops: A recurring lesson is that using built-in pandas methods is typically much faster than writing loops in plain Python. Pandas uses efficient, C-backed code under the hood, so shifting your mindset to vectorized operations can drastically speed up your workflow.
    • Links and Tools:
      • NumPy – Underlies much of pandas’ performance benefits.
  10. Bex’s Tips for Mastering Libraries: Bex repeatedly emphasized reading official documentation to discover lesser-known gems. He also points out that contributing to or engaging with communities (like Kaggle or Medium articles) accelerates learning. You need to do more than just skim; deep exploration of the docs, plus real-world application, cements these skills.
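
Several of the takeaways above (pipe, factorize, explode, hasnans, at_time/between_time, and bdate_range) can be sketched in a few lines. This is an illustrative example under stated assumptions rather than code from the episode: the survey and time series data are invented, and drop_missing and add_weather_code are hypothetical helper functions written only to show how pipe chains steps.

```python
import pandas as pd


def drop_missing(df):
    """Hypothetical cleaning step: drop rows with no weather value."""
    return df.dropna(subset=["weather"])


def add_weather_code(df, column):
    """Hypothetical encoding step: add integer codes via pd.factorize."""
    codes, uniques = pd.factorize(df[column])
    return df.assign(weather_code=codes)


# Invented survey data: one missing answer, one multi-value column.
survey = pd.DataFrame(
    {
        "weather": ["sun", "rain", None, "sun"],
        "languages": [["Python", "SQL"], ["R"], ["Python"], []],
    }
)

# hasnans: quick check for missing values in a Series.
has_missing = survey["weather"].hasnans

# pipe: chain the steps in reading order; extra arguments pass through.
cleaned = survey.pipe(drop_missing).pipe(add_weather_code, column="weather")

# explode: one row per list element; an empty list becomes a single NaN row.
exploded = cleaned.explode("languages")

# Invented time series with a DatetimeIndex (Nov 4, 2021 was a Thursday).
ts = pd.Series(
    [10, 11, 12, 13],
    index=pd.to_datetime(
        [
            "2021-11-04 09:30",
            "2021-11-04 12:00",
            "2021-11-04 18:00",
            "2021-11-05 09:30",
        ]
    ),
)

# at_time: rows at exactly this time of day, on any date.
at_open = ts.at_time("09:30")

# between_time: rows whose time of day falls inside the window.
business_hours = ts.between_time("09:00", "17:00")

# bdate_range: business days only, so the weekend of Nov 6-7 is skipped.
business_days = pd.bdate_range("2021-11-04", "2021-11-09")
```

Note that at_time and between_time look only at the time of day, so they do not remove weekend dates; filtering by business day is a separate step.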

Interesting Quotes and Stories

  • “You just have to be one step ahead of your audience, and that’s it.” – Bex on writing articles or sharing knowledge even if you’re not an absolute expert.
  • “I used to get annoyed seeing these complex functions, so I wrote the article to learn them, and share with the audience.” – Bex explaining his motivation for discovering 25 lesser-known pandas features.

Key Definitions and Terms

  • Vectorized Operations: Performing array-wide or column-wide operations in one step rather than looping through each element.
  • Categorical Encoding: Turning text labels (categories) into numeric values for machine learning.
  • Time-Series Analysis: Working with time-indexed data, often focusing on specialized indexing and filtering.
  • Missing Values / NaN: Indicators in data that information is not available or not applicable, requiring cleaning or imputation.

Learning Resources

If you want to deepen your Python and data science skills, here are some courses from Talk Python Training.

Overall Takeaway

This episode serves as a reminder that pandas contains many tools beyond the basics everyone knows. Embracing these lesser-known functions will improve efficiency, clarity, and performance in your data science workflows. By continuously exploring the documentation, writing about what you learn, and engaging with open communities (like Kaggle), you’ll keep discovering new ways to take advantage of Python’s rich data ecosystem. Above all, remember that sometimes “knowing about a feature” is the biggest leap toward more powerful and elegant solutions.

Links from the show

Bex Tuychiev: linkedin.com
Bex's Medium profile: ibexorigin.medium.com

Numpy 25 functions article: towardsdatascience.com
missingno package: coderzcolumn.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm
