
Pre-commit Hooks for Python Devs

Episode #482, published Thu, Oct 24, 2024, recorded Thu, Oct 24, 2024

Do you struggle to make sure your code is always correct before you check it in? What about your team members' code? That one person who never wants to run the linter? Tired of dealing with tons of conflicts and spurious Git changes? You need Git pre-commit hooks. We're lucky to have Stefanie Molin on this episode. She has done a bunch of writing and teaching on Git hooks.

Episode Deep Dive

Guest Introduction and Background

Stefanie Molin is a software engineer at Bloomberg who comes from a data analytics background, first working in R before moving to Python. She is the author of Hands-On Data Analysis with Pandas, now in its second edition, has contributed significantly to the numpydoc ecosystem, and is a frequent conference speaker at events like EuroPython and PyCon. Her focus on automated code quality checks led her to develop and share several pre-commit hooks, including one for validating NumPy-style docstrings and another for stripping EXIF metadata from images.


What to Know If You're New to Python

  • Check out our course Up and Running with Git: While you don’t need deep Python expertise to use pre-commit hooks, having basic Git and Python knowledge will help you configure them effectively.

Key Points and Takeaways

  1. Why Pre-Commit Hooks Matter
    Pre-commit hooks run checks on your code before it’s committed, catching formatting or logic issues early, when they are cheapest to fix. This keeps code consistent across the team and reduces friction in code reviews.

  2. Getting Started with Pre-Commit
    Install the pre-commit Python package (for example, with pip install pre-commit), add a .pre-commit-config.yaml file to the root of your repo, and run pre-commit install to register the hooks with Git.

  3. Configuration and Workflow
    A typical .pre-commit-config.yaml lists each Git repository that hosts hooks, the revision (rev) to pin it to, the IDs of the hooks to run, and any additional arguments. This file is checked into source control, so everyone on the project shares the same setup.

  4. Creating Your Own Pre-Commit Hooks
    Stefanie outlined a four-step approach: write your core logic, build a CLI with argparse or similar, make it installable, and define it in a .pre-commit-hooks.yaml file at the root of the hook's repository. This pattern allows for easy sharing and installation.

  5. Automation vs. Checks
    If a hook can automatically fix an issue—like adding missing trailing commas—implement that rather than just failing. Reducing busywork keeps developers from bypassing the hooks.

  6. EXIF Stripper Example
    Stefanie created a hook that removes sensitive or unnecessary metadata from images before they’re committed. This prevents accidental data leaks (GPS coordinates, photographer details) and also reduces file size.

  7. Docstring Validation and numpydoc
    Contributing at scikit-learn sprints introduced Stefanie to the importance of docstring style and consistency. Her numpydoc pre-commit hook automatically checks docstrings for issues such as trailing spaces and for adherence to the NumPy docstring style.

  8. Continuous Integration and Pre-Commit
    While pre-commit gives immediate local feedback, CI ensures your entire team consistently meets code quality standards. Setting up the same checks in CI builds keeps you from having surprises at merge time.

    • Example: Combine GitHub Actions with pre-commit to run the same hooks in pull requests.
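
The four-step approach and the auto-fix advice from the takeaways above can be sketched together in Python. This is a minimal, hypothetical hook, not Stefanie's code: the function names, the trailing-whitespace rule, and the file-handling choices are all assumptions made for illustration. It relies on two documented pre-commit behaviors: staged file names are passed to the hook as positional arguments, and a nonzero exit status marks the hook as failed.

```python
"""Hypothetical hook: strip trailing whitespace from the files it is given."""
from __future__ import annotations

import argparse
from pathlib import Path
from typing import Sequence


def strip_trailing_whitespace(text: str) -> str:
    """Step 1, core logic: drop spaces and tabs at the end of every line."""
    fixed = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")    # the line without its line ending
        ending = line[len(body):]     # keep the original ending (LF or CRLF)
        fixed.append(body.rstrip(" \t") + ending)
    return "".join(fixed)


def main(argv: Sequence[str] | None = None) -> int:
    """Step 2, CLI: accept file names and fix each file in place."""
    parser = argparse.ArgumentParser(description="Strip trailing whitespace.")
    parser.add_argument("filenames", nargs="*", help="files to check")
    args = parser.parse_args(argv)

    exit_code = 0
    for name in args.filenames:
        path = Path(name)
        # Work with bytes so line endings are never translated.
        original = path.read_bytes().decode("utf-8")
        fixed = strip_trailing_whitespace(original)
        if fixed != original:
            path.write_bytes(fixed.encode("utf-8"))  # auto-fix, don't just warn
            print(f"Fixed trailing whitespace in {name}")
            exit_code = 1  # nonzero: the developer reviews and re-stages
    return exit_code
```

To cover steps three and four, the package would expose main as a console script, and a .pre-commit-hooks.yaml file in the hook's repository would declare an id, name, entry, and language for it. Returning 1 after a fix is a deliberate choice in this sketch: the commit stops once, the developer stages the corrected file, and the next attempt passes.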

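Docstring checks like the one in takeaway 7 come down to inspecting parsed source code. The sketch below is not numpydoc's validation logic, which covers far more. It is a hypothetical stand-in that uses only the standard library's ast module and flags public functions and classes that have no docstring at all. The names find_missing_docstrings and main are assumptions for illustration.

```python
"""Hypothetical hook: report public functions and classes without docstrings."""
from __future__ import annotations

import argparse
import ast
from pathlib import Path
from typing import Sequence


def find_missing_docstrings(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) pairs for undocumented public definitions."""
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, definitions):
            continue
        # Names starting with an underscore are treated as private and skipped.
        if not node.name.startswith("_") and ast.get_docstring(node) is None:
            missing.append((node.lineno, node.name))
    return sorted(missing)  # ast.walk order is not by line, so sort for output


def main(argv: Sequence[str] | None = None) -> int:
    """Check each file passed on the command line; return 1 if any check fails."""
    parser = argparse.ArgumentParser(description="Find missing docstrings.")
    parser.add_argument("filenames", nargs="*", help="Python files to check")
    args = parser.parse_args(argv)

    exit_code = 0
    for name in args.filenames:
        # A file with invalid syntax raises SyntaxError here; this sketch lets it propagate.
        source = Path(name).read_text(encoding="utf-8")
        for lineno, obj in find_missing_docstrings(source):
            print(f"{name}:{lineno}: {obj} is missing a docstring")
            exit_code = 1
    return exit_code
```

Unlike a whitespace fixer, this kind of hook can only report: a tool cannot write a meaningful docstring for you, so failing with a precise file and line number is the most helpful thing it can do.
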
Interesting Quotes and Stories

  • EXIF data realization: "I remember discovering the camera details, location, and the photographer’s name were all in the image. I didn’t want that on my blog!" – Stefanie Molin
  • Auto-fix perspective: "If you can fix it, fix it. Don’t just warn about it." – Stefanie Molin on saving developers from repetitive tasks.

Overall Takeaway

Implementing and customizing pre-commit hooks in your Python workflow can have a massive impact on code consistency and developer efficiency. From validating docstrings to stripping sensitive metadata, these checks enforce best practices at the earliest stage—saving time, preventing errors, and allowing teams to focus on what truly matters: writing great code.

Links from the show

Stefanie Molin: stefaniemolin.com

Talk Python Blog: talkpython.fm/blog

How to Set Up Pre-Commit Hooks: stefaniemolin.com
Common Pre-Commit Errors and How to Solve Them: stefaniemolin.com
A Behind-the-Scenes Look at How Pre-Commit Works: stefaniemolin.com
Pre-Commit Hook Creation Guide: stefaniemolin.com
(Pre-)Commit to Better Code Workshop: stefaniemolin.com
exif-stripper: stefaniemolin.com
exif-stripper on GitHub: github.com
docstring-validation-using-pre-commit-hook: numpydoc.readthedocs.io
Data Morph: Moving Beyond the Datasaurus Dozen: stefaniemolin.com
Data Morph on GitHub: github.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
