Learning and teaching Pandas

Episode #471, published Mon, Jul 22, 2024, recorded Sun, Jul 7, 2024

If you want to get better at something, the path is often pretty clear. If you want to get better at swimming, you go to the pool, practice your strokes, and put in time doing laps. If you want to get better at mountain biking, you hit the trails and work on drills focusing on different aspects of riding. You can do the same for programming. Reuven Lerner is back on the podcast to talk about his book Pandas Workout. We dive into strategies for learning Pandas and Python as well as some of his workout exercises.

Episode Deep Dive

About the Guest: Reuven Lerner

Reuven Lerner is a veteran Python trainer and consultant. He’s the author of Python Workout and the newly released Pandas Workout (published by Manning). He regularly offers corporate training on Python and data analytics with Pandas, works with clients worldwide (often remotely), and stays active in the Python community. Reuven also writes a weekly newsletter called Bamboo Weekly devoted to hands-on Pandas exercises.

Getting Banned from Facebook Ads (Because of “Python” and “Pandas”!)

  • Reuven shared a story about Facebook shutting down his ad account after its automated systems mistook his “Python” and “Pandas” ads for exotic animal sales.
  • It illustrates some of the strange outcomes you can get from AI-driven moderation and how difficult it can be to appeal these decisions without human intervention.

Pandas Workout Book

  • Reuven’s new book, Pandas Workout from Manning, consists of hands-on exercises to help developers improve their Pandas skills step by step.
  • It’s modeled after his earlier Python Workout book and follows an exercise-driven approach: each chapter focuses on a specific Pandas topic, offers multiple problems to solve, then provides detailed explanations and answers.
  • Many exercises include “Beyond the Exercise” follow-ups that push students into more advanced or less obvious applications of the same topic.

Strategies for Learning Pandas

  • Reading data and quick visual wins: Reuven emphasized that showing quick wins (for example, reading a CSV file and plotting it immediately) can hook learners by giving them something visual and satisfying right away.
  • Embracing Pandas idioms: Pandas has a different style (“pandonic”) than pure Python. Vectorized operations and built-in methods like pd.cut() or explode() are usually both faster and more expressive than trying to replicate them with plain Python loops.
  • Avoiding inplace=True: A common misconception is that inplace=True is more efficient, but Pandas core devs recommend using the functional approach (returning new objects) for readability and for chaining methods together.
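
A minimal sketch of the last two points, using a small made-up DataFrame (the column names and values are assumptions for illustration, not data from the episode). It contrasts a plain Python loop with the equivalent vectorized expression, then shows a method chain that returns new objects rather than relying on inplace=True.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "distance": [1.2, 8.5, None, 3.0],
    "fare": [6.0, 30.0, 12.0, 11.5],
})

# Plain Python loop: works, but is verbose and slow on large columns.
loop_result = [fare * 1.1 for fare in df["fare"]]

# Vectorized equivalent: one expression applied to the whole column.
vectorized_result = df["fare"] * 1.1

# Functional style: every method returns a new object, so the calls chain
# and the original df is left untouched (no inplace=True needed).
cleaned = (
    df
    .dropna(subset=["distance"])
    .rename(columns={"fare": "fare_usd"})
    .sort_values("distance")
    .reset_index(drop=True)
)
```

Because nothing is modified in place, df still holds all four original rows after the chain runs, which makes each step easy to inspect or reorder.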

Common Pandas Functionalities and Exercises

  1. Categorizing numeric data with pd.cut()
    • Example: Splitting taxi ride distances into “short,” “medium,” and “long.”
    • Instead of manual if-statements or loops, pd.cut() takes numeric bins and labels to create a categorized series (potentially with Pandas’ efficient “category” dtype).
  2. Outlier Detection with IQR
    • Demonstrated using NYC taxi data to see which trips were abnormally long or short.
    • Showed how to calculate the IQR (interquartile range, the distance between the first and third quartiles) and use it to flag values more than 1.5×IQR below the first quartile or above the third quartile.
    • Briefly touched on how statistics like mean vs. median can be swayed by extreme outliers.
  3. Multiple CSV Files and concat()
    • Combining monthly or yearly taxi data by reading in multiple CSVs and using pd.concat() to stack them either vertically or horizontally.
    • Mentioned a quick tip: use glob (from the standard library) with a list comprehension to read many files at once, then pass the resulting list of dataframes to pd.concat().
  4. Handling Text and explode()
    • Worked with a wine-review dataset to analyze the most common descriptive words.
    • Used .str.split() to split text into lists, then explode() to transform each list element into its own row.
    • Counted frequencies with value_counts(), which automatically sorts from most to least frequent.
  5. Working with Date/Times and Grouping
    • Explored tipping on NYC taxi rides by grouping data by month or comparing different periods (pre-pandemic vs. during the pandemic).
    • Demonstrated how to create new columns (e.g., tip percentage) and use groupby() to compute statistics (mean, count, etc.) on those columns.
  6. Plotting with Pandas
    • Showed how simple it is to get quick bar plots or scatter plots with DataFrame.plot().
    • Used the example of scatter-plotting U.S. cities by longitude and latitude, effectively creating a map-like visualization with just a few lines of Pandas.
    • A quick bar plot example: comparing growth or decline in population among major cities in a single state.
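
The first five exercises above can be sketched roughly as follows. This is not the book's code: the data is tiny and made up, and the bin edges, file names, and column names are all assumptions chosen for illustration. The CSV files are written to a temporary directory only so the example is self-contained.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical trip distances in miles (made-up values, not the NYC taxi data).
trips = pd.DataFrame({"trip_distance": [0.5, 1.5, 2.0, 2.5, 3.0, 4.0, 12.0, 60.0]})

# 1. pd.cut(): bin numeric values into labeled categories.
#    The bin edges (2 and 10 miles) are assumptions.
trips["length"] = pd.cut(
    trips["trip_distance"],
    bins=[0, 2, 10, float("inf")],
    labels=["short", "medium", "long"],
)

# 2. IQR outliers: values more than 1.5 * IQR below Q1 or above Q3.
q1 = trips["trip_distance"].quantile(0.25)
q3 = trips["trip_distance"].quantile(0.75)
iqr = q3 - q1
is_outlier = (
    (trips["trip_distance"] < q1 - 1.5 * iqr)
    | (trips["trip_distance"] > q3 + 1.5 * iqr)
)
outliers = trips[is_outlier]

# 3. Multiple CSVs: glob for the paths, read each one, stack with pd.concat().
with tempfile.TemporaryDirectory() as tmp:
    for month, rows in [("01", [1.0, 2.0]), ("02", [3.0])]:
        path = os.path.join(tmp, f"taxi_2020_{month}.csv")  # hypothetical name
        pd.DataFrame({"trip_distance": rows}).to_csv(path, index=False)
    paths = sorted(glob.glob(os.path.join(tmp, "taxi_2020_*.csv")))
    combined = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

# 4. Text: split each review into words, give each word its own row, count.
reviews = pd.Series(["ripe cherry and oak", "crisp apple and oak"])
word_counts = reviews.str.split().explode().value_counts()

# 5. Dates and grouping: derive a tip percentage, then aggregate by month.
rides = pd.DataFrame({
    "pickup": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-04-10"]),
    "fare": [10.0, 20.0, 10.0],
    "tip": [2.0, 5.0, 1.0],
})
rides["tip_pct"] = 100 * rides["tip"] / rides["fare"]
tips_by_month = (
    rides.groupby(rides["pickup"].dt.month)["tip_pct"].agg(["mean", "count"])
)
```

Plotting is left out of the sketch because it needs matplotlib. With it installed, a call such as tips_by_month["mean"].plot.bar() should draw a quick bar chart of the monthly averages.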

Pandas Tutor Integration

  • Reuven highlighted Pandas Tutor by Philip Guo for visualizing dataframe transformations.
  • He included links in his book’s exercises, allowing readers to see step-by-step how Pandas operations (like groupby(), cut(), or explode()) transform data behind the scenes.

Bamboo Weekly Newsletter

  • Reuven also writes Bamboo Weekly, where he sends out free and paid exercises that draw on real-world data (airport animal transport data, current events, etc.).
  • The free edition gives you a taste (first two questions), and the paid plan unlocks the full set of exercises each week.

Overall Takeaway

Learning Pandas effectively involves discovering and practicing its built-in data transformations and idioms, rather than re-implementing logic with pure Python loops. Reuven Lerner’s Pandas Workout and Bamboo Weekly show how short, focused exercises on real datasets can build your intuition for tasks like categorizing data, finding outliers, combining files, or cleaning text. Even if you’re already comfortable with Python, there’s always more to explore within the expansive Pandas ecosystem—especially when it comes to reading, reshaping, and visualizing your data efficiently.

Links from the show

Reuven Lerner on Twitter: @reuvenmlerner
Pandas Workout Book: manning.com
Bamboo Weekly: Solar eclipse: bambooweekly.com
Bamboo Weekly: Avocado hand: bambooweekly.com
Scaling data science across Python and R: talkpython.fm
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
