25 Pandas Functions You Didn’t Know Existed
That's why I've invited Bex Tuychiev to be our guest. He wrote an excellent article highlighting 25 idiomatic Pandas functions and properties we should all keep in our data toolkit. I'm sure there is something here for all of us to take away and use pandas that much better.
The 25 functions
- ExcelWriter is a generic class for creating Excel files (with sheets!) and writing DataFrames to them.
- pipe is one of the best functions for doing data cleaning in a concise, compact manner in Pandas
- factorize: This function is a pandas alternative to Sklearn’s LabelEncoder
- A function with an interesting name is explode.
- Another function with a funky name is squeeze, which is used in rare but annoying edge cases.
- between: A rather nifty function for boolean indexing numeric features within a range.
- All DataFrames have a simple T attribute, which stands for transpose.
- Did you know that Pandas allows you to style DataFrames?
- Pandas options
- convert_dtypes: We all know that pandas has an annoying tendency to mark some columns as the object data type. Instead of manually specifying their types, you can use the convert_dtypes method, which tries to infer the best data type.
- A function I use all the time is select_dtypes.
- mask allows you to quickly replace cell values where a custom condition is true.
- min and max along the columns axis
- nlargest and nsmallest.
- However, sometimes you want the index label of the min/max rather than the value itself. For that, use idxmax/idxmin.
- value_counts with dropna=False: A common way to find the percentage of missing values is to chain isnull and sum and divide by the length of the array. You can do the same thing with value_counts by passing the relevant arguments.
- clip function makes it really easy to find outliers outside a range and replace them with the hard limits.
- at_time allows you to subset values at a specific date or time.
- bdate_range is a short-hand function to create TimeSeries indices with business-day frequency
- autocorr
- Pandas offers a quick way to check whether a given Series contains any nulls: the hasnans attribute.
- at and iat: These two accessors are much faster alternatives to loc and iloc, with one disadvantage: they only allow selecting or replacing a single value at a time.
- argsort: You should use this function when you want to extract the indices that would sort an array
- When a column is a category, you can use several special functions using the cat accessor.
- GroupBy.nth: This function only works with GroupBy objects. Specifically, after grouping, nth returns the nth row from each group
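A few of the entries above are easiest to appreciate side by side. The following is a minimal sketch, assuming pandas is installed; the city and temperature data are made up for illustration, and only between, clip, mask, nlargest, and idxmax from the list are shown.

```python
import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Cairo", "Pune"],
        "temp_c": [-5.0, 19.0, 41.0, 33.0],
    }
)

# between: boolean mask for values inside the closed range [0, 35]
mild = df[df["temp_c"].between(0, 35)]

# clip: cap values at hard limits instead of dropping the rows
clipped = df["temp_c"].clip(lower=0, upper=35)

# mask: replace values wherever the condition is True
masked = df["temp_c"].mask(df["temp_c"] > 35, other=35.0)

# nlargest: the top-n rows by a column
hottest = df.nlargest(2, "temp_c")

# idxmax: the index label of the maximum, not the maximum itself
hottest_label = df["temp_c"].idxmax()

print(mild)
print(hottest)
print(hottest_label)
```

Note that clip changes values on both ends of the range, while the mask call here only touches values matching its condition.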
Episode Deep Dive
Guests Introduction and Background
Bex Tuychiev is a seasoned Python developer and data science enthusiast. He’s recognized as a Kaggle Master, where he regularly participates in competitions and shares tutorials. He also writes top-rated articles on Medium, specializing in artificial intelligence and data science topics. During this episode, Bex explains how he came to focus on Python for data science, how he uses writing to solidify his learning, and why pandas is such a critical library for data exploration and analysis.
What to Know If You're New to Python
Here are a few basics to help you follow along with the discussion in this episode and get more out of pandas:
- Variables and Data Structures: Python lets you store data in lists, dictionaries, and more. Pandas builds upon these to store tabular data.
- Avoiding Loops in Pandas: A major theme is that you often replace Python for-loops with vectorized operations and built-in functions in pandas.
- Indexing and Slicing: Pandas data frames can be sliced similarly to Python lists, but with more powerful indexing options like .loc and .iloc.
- pip / venv: Installing pandas or other libraries is done via tools like pip install pandas. Virtual environments (venv or others) help you manage project dependencies without conflicts.
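To make the indexing point concrete, here is a small sketch of label-based versus position-based selection. It assumes pandas is installed, and the prices table is invented for illustration.

```python
import pandas as pd

# Hypothetical data with string labels as the index
prices = pd.DataFrame(
    {"price": [1.5, 0.25, 3.0]},
    index=["apple", "banana", "cherry"],
)

# .loc selects by label; label slices include both endpoints
by_label = prices.loc["apple":"banana", "price"]

# .iloc selects by integer position; the stop is excluded, like a list slice
by_position = prices.iloc[0:2, 0]

print(by_label)
print(by_position)
```

Both selections return the same two rows here, but only because the labels and positions happen to line up.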
Key Points and Takeaways
- 25 Lesser-Known Pandas Functions
Bex shares a collection of “hidden gem” pandas functions and features that can drastically improve productivity. Many of these allow you to work more efficiently without falling back to manual loops or if-statements. Knowing they exist is often more important than learning the details by heart. You can immediately apply them in data cleaning, manipulation, or advanced exploration tasks.
- Fluent Data Processing with pipe
One standout function is DataFrame.pipe(), which helps you chain operations together in a very readable way. Instead of writing nested function calls or creating intermediate data frames, you can design a pipeline that processes data step by step. This fluent style mirrors scikit-learn pipelines and can make your notebooks more maintainable.
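As a rough illustration of the chaining style described here, the sketch below pipes a frame through two small cleaning steps. It assumes pandas is installed; the helper names drop_missing and add_total, and the sample data, are hypothetical and not from the episode.

```python
import pandas as pd


def drop_missing(df, column):
    """Hypothetical step: drop rows where `column` is missing."""
    return df.dropna(subset=[column])


def add_total(df, price_col, qty_col):
    """Hypothetical step: add a `total` column without mutating the input."""
    return df.assign(total=df[price_col] * df[qty_col])


raw = pd.DataFrame({"price": [2.0, None, 4.0], "qty": [3, 1, 5]})

# Each pipe call passes the frame as the first argument of the function,
# followed by any extra positional or keyword arguments.
cleaned = (
    raw.pipe(drop_missing, "price")
       .pipe(add_total, price_col="price", qty_col="qty")
)

print(cleaned)
```

Because each step returns a new frame, the original raw frame is left unchanged and the steps can be reordered or reused in other pipelines.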
- Converting Categorical Data with factorize
factorize quickly transforms text-based categories (like "sun" or "rain") into numeric labels for machine learning. This is similar to label encoding but built right into pandas. It’s especially handy if you want to skip external libraries for simple encoding tasks.
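A minimal sketch of the encoding, assuming pandas is installed and using a made-up weather column:

```python
import pandas as pd

# Hypothetical categorical data
weather = pd.Series(["sun", "rain", "sun", "snow", "rain"])

# factorize returns integer codes plus the unique values they refer to.
# By default, codes are assigned in order of first appearance.
codes, uniques = pd.factorize(weather)

# sort=True assigns codes in sorted order of the unique values instead.
sorted_codes, sorted_uniques = pd.factorize(weather, sort=True)

print(codes, list(uniques))
print(sorted_codes, list(sorted_uniques))
```

The uniques result lets you map codes back to the original labels, for example with uniques[codes].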
- Handling Nested or Multi-Value Cells via explode
Survey results often have rows where one cell contains multiple values (lists). explode automatically splits those lists into separate rows, preserving other columns. It’s a perfect example of removing loops and manual code for an operation that can be done in one line.
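The survey scenario can be sketched as follows, assuming pandas is installed; the respondents and answers are invented for illustration.

```python
import pandas as pd

# Hypothetical survey where one cell holds a list of answers
survey = pd.DataFrame(
    {
        "respondent": ["a", "b"],
        "languages": [["Python", "SQL"], ["R"]],
    }
)

# explode gives each list element its own row and repeats the other columns.
# The original index labels are repeated too, so reset them if you need
# a unique index afterwards.
long_form = survey.explode("languages").reset_index(drop=True)

print(long_form)
```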
- Highlighting Key Insights with Pandas Styling
The DataFrame.style attribute lets you add color scales, highlights, and conditional formatting directly in Jupyter notebooks. This is useful for quickly spotting trends or outliers. Background gradients, highlighting min/max values, and custom CSS can all provide immediate, visual feedback about your data.
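Here is a small sketch of min/max highlighting, under a few assumptions: pandas 1.3 or later (for Styler.to_html), the optional jinja2 dependency that the Styler relies on, and a made-up scores table. The helper name highlight_extremes is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
scores = pd.DataFrame(
    {"math": [90, 72, 85], "art": [60, 95, 78]},
    index=["Ann", "Ben", "Cy"],
)


def highlight_extremes(df):
    """Mark each column's maximum in green and its minimum in red."""
    return df.style.highlight_max(color="lightgreen").highlight_min(color="salmon")


try:
    styled = highlight_extremes(scores)
    # In a notebook, displaying `styled` renders the colored table directly.
    html = styled.to_html()
except ImportError:
    # The Styler needs the optional jinja2 package to be installed.
    html = None
```

Styling only affects how the table is displayed; the underlying values in scores are not modified.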
- Checking for Missing Values
Pandas offers many ways to handle NaN and missing data. A particularly quick check is .hasnans on a Series, letting you decide if you need to drop or impute missing values. Missing data is a central challenge for data scientists, and acknowledging it early can save a lot of time.
- Links and Tools: MissingNo (external library), for visualizing missing data.
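A quick sketch of the missing-value checks mentioned above, assuming pandas and NumPy are installed; the ages Series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries out of five
ages = pd.Series([34, np.nan, 29, np.nan, 41], name="age")

# hasnans: a single True/False answer for the whole Series
has_missing = ages.hasnans

# Share of missing values: the mean of a boolean mask
missing_share = ages.isna().mean()

# value_counts can report the same share if NaN is kept as its own bucket
shares = ages.value_counts(dropna=False, normalize=True)

print(has_missing, missing_share)
print(shares)
```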
- Time Series Filtering with at_time and between_time
If you have a DatetimeIndex, these methods let you filter rows occurring at a specific time of day or within a time-of-day range (e.g., "business hours") in just one line. It’s useful for tasks like slicing out morning data or keeping only trading-hours timestamps in stock data.
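The two filters can be sketched like this, assuming pandas is installed; the timestamps and prices are invented for illustration.

```python
import pandas as pd

# Hypothetical intraday prices across two days
stamps = pd.to_datetime(
    [
        "2024-01-01 09:30",
        "2024-01-01 12:00",
        "2024-01-01 16:00",
        "2024-01-02 09:30",
        "2024-01-02 18:45",
    ]
)
prices = pd.Series([100.0, 101.5, 102.0, 103.0, 99.0], index=stamps)

# at_time: rows at exactly this time of day, on any date
opens = prices.at_time("09:30")

# between_time: rows within a time-of-day window (both ends included by default)
session = prices.between_time("09:30", "16:00")

print(opens)
print(session)
```

Both methods look only at the time of day, so filtering by weekday or date needs a separate step.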
- Business Date Ranges (bdate_range)
For time series work, bdate_range excludes weekends and, if desired, holidays, giving you a date index pre-filtered for business days. This is crucial when analyzing stock data or any schedule-bound events.
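A minimal sketch, assuming pandas is installed. The date window is arbitrary, and treating 2024-07-04 as a holiday is an assumption made for the example; holidays are only honored when the custom business-day frequency "C" is used.

```python
import pandas as pd

# Default frequency is business days: Monday through Friday
days = pd.bdate_range(start="2024-07-01", end="2024-07-10")

# Custom business days: also skip an explicitly listed holiday (assumed here)
custom = pd.bdate_range(
    start="2024-07-01",
    end="2024-07-10",
    freq="C",
    holidays=["2024-07-04"],
)

print(days)
print(custom)
```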
- The Speed of Pandas vs. Loops
A recurring lesson is that using built-in pandas methods is much faster than writing loops in plain Python. Pandas uses efficient, C-backed code under the hood, so shifting your mindset to vectorized operations can drastically speed up your workflow.
- Links and Tools: NumPy, which underlies much of pandas’ performance benefits.
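To show what shifting to vectorized operations looks like in practice, the sketch below computes the same column twice: once with a row-by-row loop and once as a single column expression. It assumes pandas is installed and uses a tiny made-up table, so it demonstrates equivalence rather than timing.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Loop version: one Python-level iteration per row
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorized version: one expression over whole columns
totals_vec = df["price"] * df["qty"]

print(totals_loop)
print(totals_vec.tolist())
```

On a frame this small the difference is invisible; the gap tends to grow with the number of rows, which you can check with a timing tool on your own data.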
- Bex’s Tips for Mastering Libraries
Bex repeatedly emphasizes reading official documentation to discover lesser-known gems. He also points out that contributing to or engaging with communities (like Kaggle or Medium articles) accelerates learning. You need to do more than just skim; deep exploration of the docs, plus real-world application, cements these skills.
Interesting Quotes and Stories
- “You just have to be one step ahead of your audience, and that’s it.” – Bex on writing articles or sharing knowledge even if you’re not an absolute expert.
- “I used to get annoyed seeing these complex functions, so I wrote the article to learn them, and share with the audience.” – Bex explaining his motivation for discovering 25 lesser-known pandas features.
Key Definitions and Terms
- Vectorized Operations: Performing array-wide or column-wide operations in one step rather than looping through each element.
- Categorical Encoding: Turning text labels (categories) into numeric values for machine learning.
- Time-Series Analysis: Working with time-indexed data, often focusing on specialized indexing and filtering.
- Missing Values / NaN: Indicators in data that information is not available or not applicable, requiring cleaning or imputation.
Learning Resources
If you want to deepen your Python and data science skills, here are some courses from Talk Python Training.
- Python for Absolute Beginners: Perfect for those just starting their coding journey in Python.
- Move from Excel to Python with Pandas: Ideal if you’re transitioning from spreadsheets to pandas for data manipulation.
- Data Science Jumpstart with 10 Projects: Get hands-on with real-world examples and dive deeper into data science techniques.
Overall Takeaway
This episode serves as a reminder that pandas contains many tools beyond the basics everyone knows. Embracing these lesser-known functions will improve efficiency, clarity, and performance in your data science workflows. By continuously exploring the documentation, writing about what you learn, and engaging with open communities (like Kaggle), you’ll keep discovering new ways to take advantage of Python’s rich data ecosystem. Above all, remember that sometimes “knowing about a feature” is the biggest leap toward more powerful and elegant solutions.
Links from the show
Bex's Medium profile: ibexorigin.medium.com
Numpy 25 functions article: towardsdatascience.com
missingno package: coderzcolumn.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm