
25 Pandas Functions You Didn’t Know Existed

Episode #341, published Wed, Nov 17, 2021, recorded Thu, Nov 4, 2021.

This episode is carbon neutral.
Do you do anything with Jupyter notebooks? If you do, there is a very good chance you're working with the pandas library. This is one of THE primary tools of anyone doing computational work or data exploration with Python. Yet this library is massive, and the idiomatic way to use it can be hard to discover.

That's why I've invited Bex Tuychiev to be our guest. He wrote an excellent article highlighting 25 idiomatic Pandas functions and properties we should all keep in our data toolkit. I'm sure there is something here for all of us to take away and use pandas that much better.

The 25 functions

  1. ExcelWriter is a generic class for creating Excel files (with sheets!) and writing DataFrames to them.
  2. pipe is one of the best functions for doing data cleaning in a concise, compact manner in Pandas.
  3. factorize: This function is a pandas alternative to Sklearn’s LabelEncoder
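  4. explode: A function with an interesting name. It expands each element of a list-like cell into its own row.
  5. squeeze: Another function with a funky name, used in very rare but annoying edge cases, such as reducing a one-cell DataFrame to a scalar.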
  6. between: A rather nifty function for boolean indexing numeric features within a range.
  7. All DataFrames have a simple T attribute, which stands for transpose.
  8. Did you know that Pandas allows you to style DataFrames?
  9. Pandas options
  10. convert_dtypes: We all know that pandas has an annoying tendency to mark some columns as object data type. Instead of manually specifying their types, you can use the convert_dtypes method, which tries to infer the best data type.
  11. A function I use all the time is select_dtypes.
  12. mask allows you to quickly replace cell values where a custom condition is true.
  13. min and max along the columns axis
  14. nlargest and nsmallest.
  15. idxmax and idxmin: Sometimes you want the index label of the min/max rather than the value itself. For that, use idxmax/idxmin.
  16. value_counts with dropna=False: A common way to find the percentage of missing values is to chain isnull and sum, then divide by the length of the array. You can do the same thing with value_counts and the relevant arguments (dropna=False, normalize=True).
  17. clip: This function makes it really easy to cap outliers that fall outside a range by replacing them with the hard limits.
  18. at_time allows you to subset the rows of a time-indexed object at a specific time of day.
  19. bdate_range is a shorthand function to create a DatetimeIndex with business-day frequency.
  20. autocorr: Computes the autocorrelation of a Series at a given lag.
  21. Pandas offers a quick way to check whether a given Series contains any nulls: the hasnans attribute.
  22. at and iat: These two accessors are much faster alternatives to loc and iloc, with one disadvantage: they only allow selecting or replacing a single value at a time.
  23. argsort: You should use this function when you want to extract the indices that would sort an array
  24. When a column is a category, you can use several special functions using the cat accessor.
  25. GroupBy.nth: This function only works with GroupBy objects. Specifically, after grouping, nth returns the nth row from each group
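
To make a few of the items above concrete, here is a minimal sketch that exercises several of them on a tiny made-up DataFrame. The data, the column names (team, score, tags), and the add_bonus helper are hypothetical and chosen only for illustration; they are not taken from Bex's article or from the episode.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b"],
        "score": [10.0, 250.0, 30.0, 40.0],
        "tags": [["x", "y"], ["z"], ["x"], ["y", "z"]],
    }
)


def add_bonus(frame, bonus):
    """Hypothetical cleaning step: return a copy with `bonus` added to score."""
    return frame.assign(score=frame["score"] + bonus)


# pipe: chain plain functions as cleaning steps without nesting calls.
bonused = df.pipe(add_bonus, bonus=5)

# factorize: integer codes plus the unique labels they refer to.
codes, uniques = pd.factorize(df["team"])

# explode: one row per element of each list-like cell.
exploded = df.explode("tags")

# squeeze: a 1x1 DataFrame collapses to a scalar.
only_outlier = df.loc[df["score"] > 100, ["score"]].squeeze()

# between: boolean mask for a range (both ends inclusive by default).
in_range = df["score"].between(20, 100)

# select_dtypes: keep only the numeric columns.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# mask replaces values where the condition is true; clip caps them at limits.
masked = df["score"].mask(df["score"] > 100, 0.0)
clipped = df["score"].clip(lower=20, upper=100)

# nlargest returns the top values, idxmax the index label of the maximum,
# and argsort the positions that would sort the values.
top2 = df["score"].nlargest(2)
best_label = df["score"].idxmax()
sort_order = df["score"].argsort()

# value_counts with dropna=False and normalize=True reports the share of
# missing values alongside the shares of the real values.
with_gaps = pd.Series([1.0, np.nan, 1.0, np.nan, 2.0])
shares = with_gaps.value_counts(dropna=False, normalize=True)
missing_share = shares[shares.index.isna()].iloc[0]

# at_time: rows at a given time of day, regardless of the date.
ts = pd.Series(
    [0, 1, 2],
    index=pd.to_datetime(
        ["2021-11-04 09:00", "2021-11-04 21:00", "2021-11-05 09:00"]
    ),
)
at_nine = ts.at_time("09:00")

# bdate_range: business days only, so the weekend is skipped.
business_days = pd.bdate_range("2021-11-04", periods=3)

# at / iat: fast access to a single cell by label or by position.
first_score = df.at[0, "score"]
second_score = df.iat[1, 1]

# GroupBy.nth: the nth row (0-based) from each group.
second_per_team = df.groupby("team").nth(1)

print(clipped.tolist())
print(second_per_team["score"].tolist())
```

ExcelWriter and the styling API are left out of this sketch because they rely on optional dependencies that may not be installed.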

Links from the show

Bex Tuychiev: linkedin.com
Bex's Medium profile: ibexorigin.medium.com

Numpy 25 functions article: towardsdatascience.com
missingno package: coderzcolumn.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe on YouTube: youtube.com
Follow Talk Python on Twitter: @talkpython
Follow Michael on Twitter: @mkennedy


Bex Tuychiev
Bex is a data science content creator from Uzbekistan. He is a top 10 writer on Medium in AI, Machine Learning and Data Science topics. He has written over 100 articles teaching hard data-related topics to thousands of aspiring learners. Bex is also a Kaggle Master, the only one from his country, and is on a fast track to becoming the first-ever Kaggle Grandmaster from Central Asia. He is currently working on his new package, Streamlitbook, which will soon be published.