Learning and teaching Pandas
Episode Deep Dive
About the Guest: Reuven Lerner
Reuven Lerner is a veteran Python trainer and consultant. He’s the author of Python Workout and the newly released Pandas Workout (published by Manning). He regularly offers corporate training on Python and data analytics with Pandas, works with clients worldwide (often remotely), and stays active in the Python community. Reuven also writes a weekly newsletter called Bamboo Weekly devoted to hands-on Pandas exercises.
Getting Banned from Facebook Ads (Because of “Python” and “Pandas”!)
- Reuven shared a story about having his ad account shut down on Facebook when their automated systems mistakenly flagged “Python” and “Pandas” as exotic animal sales.
- It illustrates some of the strange outcomes you can get from AI-driven moderation and how difficult it can be to appeal these decisions without human intervention.
Pandas Workout Book
- Reuven’s new book, Pandas Workout from Manning, consists of hands-on exercises to help developers improve their Pandas skills step by step.
- It’s modeled after his earlier Python Workout book and follows an exercise-driven approach: each chapter focuses on a specific Pandas topic, offers multiple problems to solve, then provides detailed explanations and answers.
- Many exercises include “Beyond the Exercise” follow-ups that push students into more advanced or less obvious applications of the same topic.
Strategies for Learning Pandas
- Reading data and quick visual wins: Reuven emphasized that showing quick wins (for example, reading a CSV file and plotting it immediately) can hook learners by giving them something visual and satisfying right away.
- Embracing Pandas idioms: Pandas has a different style (“pandonic”) than pure Python. Vectorized operations and built-in methods like pd.cut() or explode() are usually both faster and more expressive than trying to replicate them with plain Python loops.
- Avoiding inplace=True: A common misconception is that inplace=True is more efficient, but Pandas core devs recommend using the functional approach (returning new objects) for readability and for chaining methods together.
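As a rough illustration of the functional, chained style described above, here is a minimal sketch. The ride data and the column names (distance, fare, fare_per_mile) are invented for the example and do not come from the episode.

```python
import pandas as pd

# Hypothetical ride data, invented for illustration only.
df = pd.DataFrame({
    "distance": [1.2, None, 8.5, 3.0],
    "fare": [7.5, 12.0, 30.0, 11.0],
})

# Each method returns a new object instead of modifying df in place,
# so the steps can be chained and read top to bottom.
result = (
    df
    .dropna(subset=["distance"])                                 # drop rows with no distance
    .assign(fare_per_mile=lambda d: d["fare"] / d["distance"])   # vectorized division, no loop
    .sort_values("fare_per_mile", ascending=False)
    .reset_index(drop=True)
)

print(result)
```

Because nothing is modified in place, df still holds all four original rows afterward, which makes it easy to rerun or adjust a single step of the chain.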
Common Pandas Functionalities and Exercises
- Categorizing numeric data with pd.cut()
- Example: Splitting taxi ride distances into “short,” “medium,” and “long.”
- Instead of manual if-statements or loops, pd.cut() takes numeric bins and labels to create a categorized series (potentially with Pandas’ efficient “category” dtype).
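A small sketch of the binning idea, under stated assumptions: the distances and the bin edges (2 and 10 miles) are made up for illustration and are not the thresholds used in the book or the episode.

```python
import pandas as pd

# Hypothetical trip distances in miles.
distances = pd.Series([0.8, 2.5, 6.0, 12.3, 1.9], name="trip_distance")

# Assumed bin edges: (0, 2] is short, (2, 10] is medium, (10, inf) is long.
# Intervals are closed on the right by default.
categories = pd.cut(
    distances,
    bins=[0, 2, 10, float("inf")],
    labels=["short", "medium", "long"],
)

print(categories)
print(categories.value_counts())
```

The result has the "category" dtype, so it stores each label once and keeps the short/medium/long ordering.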
- Outlier Detection with IQR
- Demonstrated using NYC taxi data to see which trips were abnormally long or short.
- Showed how to calculate the IQR (interquartile range, the distance between the first and third quartiles) and use it to filter out values more than 1.5×IQR below the first quartile or above the third quartile.
- Briefly touched on how statistics like mean vs. median can be swayed by extreme outliers.
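The usual 1.5×IQR rule can be sketched as follows. The distances are invented, with one deliberately extreme value, so this is an illustration of the technique rather than the exact code from the exercise.

```python
import pandas as pd

# Hypothetical trip distances with one extreme value.
distances = pd.Series([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 50.0])

q1 = distances.quantile(0.25)
q3 = distances.quantile(0.75)
iqr = q3 - q1

# Fences: anything below q1 - 1.5*IQR or above q3 + 1.5*IQR counts as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = (distances < lower) | (distances > upper)
outliers = distances[is_outlier]
typical = distances[~is_outlier]

print(outliers)
# The single extreme value pulls the mean far above the median.
print(distances.mean(), distances.median())
```

Comparing the mean and the median before and after dropping the outliers is a quick way to see how strongly one extreme trip can skew the mean.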
- Multiple CSV Files and concat()
- Combining monthly or yearly taxi data by reading in multiple CSVs and using pd.concat() to stack them either vertically or horizontally.
- Mentioned a quick tip: use glob (from the standard library) or a list comprehension to read many files at once and then call pd.concat().
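Here is one way the glob plus pd.concat() tip might look. To keep the sketch self-contained it first writes two tiny stand-in CSV files to a temporary directory; the file names (taxi_2020_01.csv, taxi_2020_02.csv) and the trip_distance column are assumptions for the example.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Stand-in monthly files, written here only so the example can run anywhere.
    pd.DataFrame({"trip_distance": [1.2, 3.4]}).to_csv(
        os.path.join(tmpdir, "taxi_2020_01.csv"), index=False
    )
    pd.DataFrame({"trip_distance": [5.6]}).to_csv(
        os.path.join(tmpdir, "taxi_2020_02.csv"), index=False
    )

    # glob finds every matching file; sorted() keeps the months in order.
    filenames = sorted(glob.glob(os.path.join(tmpdir, "taxi_*.csv")))

    # Read each file in a list comprehension, then stack the frames vertically.
    combined = pd.concat(
        [pd.read_csv(name) for name in filenames],
        ignore_index=True,
    )

print(combined)
```

Passing ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers. Passing axis="columns" would stack the frames side by side instead.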
- Handling Text and explode()
- Worked with a wine-review dataset to analyze the most common descriptive words.
- Used .str.split() to split text into lists, then explode() to transform each list element into its own row.
- Counted frequencies with value_counts(), which automatically sorts from most to least frequent.
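The split, explode, and count pipeline can be sketched on a few made-up review snippets. These strings are invented for illustration and are not taken from the wine-review dataset.

```python
import pandas as pd

# Hypothetical review text.
reviews = pd.Series([
    "ripe cherry and oak",
    "bright citrus and cherry",
    "oak and vanilla and cherry",
])

words = (
    reviews
    .str.lower()
    .str.split()   # each review becomes a list of words
    .explode()     # each list element becomes its own row
)

# value_counts() sorts from most to least frequent by default.
counts = words.value_counts()
print(counts)
```

In a real analysis, common filler words such as "and" would probably be filtered out before counting, for example with isin() and a list of stop words.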
- Working with Date/Times and Grouping
- Explored tipping patterns in NYC taxi data by grouping rides by month or comparing different periods (pre-pandemic vs. during the pandemic).
- Demonstrated how to create new columns (e.g., tip percentage) and use groupby() to compute statistics (mean, count, etc.) on those columns.
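A minimal sketch of the tip-percentage idea, assuming hypothetical column names (pickup, fare, tip) and invented values rather than the real NYC taxi schema:

```python
import pandas as pd

# Hypothetical rides: two from July 2019 and two from July 2020.
rides = pd.DataFrame({
    "pickup": pd.to_datetime([
        "2019-07-01 08:00",
        "2019-07-15 18:30",
        "2020-07-02 09:15",
        "2020-07-20 22:00",
    ]),
    "fare": [10.0, 20.0, 10.0, 40.0],
    "tip": [2.0, 5.0, 1.0, 6.0],
})

# New column computed with vectorized arithmetic.
rides["tip_pct"] = 100 * rides["tip"] / rides["fare"]

# Group by calendar month of pickup and compute several statistics at once.
by_month = (
    rides
    .groupby(rides["pickup"].dt.to_period("M"))["tip_pct"]
    .agg(["mean", "count"])
)

print(by_month)
```

Grouping on dt.to_period("M") keeps the year and month together, so July 2019 and July 2020 land in separate groups and can be compared directly.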
- Plotting with Pandas
- Showed how simple it is to get quick bar plots or scatter plots with DataFrame.plot().
- Used the example of scatter-plotting U.S. cities by longitude and latitude, effectively creating a map-like visualization with just a few lines of Pandas.
- A quick bar plot example: comparing growth or decline in population among major cities in a single state.
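Both plot types can be sketched in a few lines. This assumes matplotlib is installed, since Pandas uses it for plotting. The coordinates are rough approximations, and the pop_change_pct values are placeholders invented for the example, not real population figures.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import pandas as pd

cities = pd.DataFrame({
    "city": ["New York", "Chicago", "Los Angeles", "Houston"],
    "lon": [-74.0, -87.6, -118.2, -95.4],    # approximate longitudes
    "lat": [40.7, 41.9, 34.1, 29.8],         # approximate latitudes
    "pop_change_pct": [1.0, -2.0, 0.5, 3.0], # placeholder values, not real data
})

# Longitude on x and latitude on y gives a rough map-like picture.
scatter_ax = cities.plot.scatter(x="lon", y="lat", title="Cities by position")

# One bar per city for the placeholder growth figures.
bar_ax = cities.plot.bar(x="city", y="pop_change_pct", legend=False)

scatter_ax.figure.savefig("cities_scatter.png")
bar_ax.figure.savefig("cities_bar.png")
```

In a notebook the savefig() calls and the backend line are unnecessary, because the plots render inline.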
Pandas Tutor Integration
- Reuven highlighted Pandas Tutor by Philip Guo for visualizing dataframe transformations.
- He included links in his book’s exercises, allowing readers to see step-by-step how Pandas operations (like groupby(), cut(), or explode()) transform data behind the scenes.
Bamboo Weekly Newsletter
- Reuven also writes Bamboo Weekly, where he sends out free and paid exercises that draw on real-world data (airport animal transport data, current events, etc.).
- The free edition gives you a taste (first two questions), and the paid plan unlocks the full set of exercises each week.
Overall Takeaway
Learning Pandas effectively involves discovering and practicing its built-in data transformations and idioms, rather than re-implementing logic with pure Python loops. Reuven Lerner’s Pandas Workout and Bamboo Weekly show how short, focused exercises on real datasets can build your intuition for tasks like categorizing data, finding outliers, combining files, or cleaning text. Even if you’re already comfortable with Python, there’s always more to explore within the expansive Pandas ecosystem—especially when it comes to reading, reshaping, and visualizing your data efficiently.
Links from the show
Pandas Workout Book: manning.com
Bamboo Weekly: Solar eclipse: bambooweekly.com
Bamboo Weekly: Avocado hand: bambooweekly.com
Scaling data science across Python and R: talkpython.fm
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm