Pre-commit Hooks for Python Devs
Episode Deep Dive
Guest Introduction and Background
Stefanie Molin is a software engineer at Bloomberg who comes from a data analytics background, first working in R before moving to Python. She is the author of Hands-On Data Analysis with Pandas, now in its second edition. She has contributed significantly to the numpydoc ecosystem and is a frequent conference speaker at events such as EuroPython and PyCon. Her focus on automated code quality checks led her to develop and share several pre-commit hooks, including one for validating NumPy-style docstrings and another for stripping EXIF metadata from images.
What to Know If You're New to Python
- Check out our course Up and Running with Git: While you don’t need deep Python expertise to use pre-commit hooks, having basic Git and Python knowledge will help you configure them effectively.
Key Points and Takeaways
Why Pre-Commit Hooks Matter
Pre-commit hooks run checks on your code before it’s committed, saving time by catching formatting and logic issues early. Because every contributor runs the same checks, the approach keeps code consistent across the team and reduces friction in code reviews.
Getting Started with Pre-Commit
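Installing the pre-commit Python package is straightforward (for example, pip install pre-commit). Once it’s installed, you add a .pre-commit-config.yaml file to your repo and run pre-commit install. That command registers pre-commit as the repository’s Git pre-commit hook, so the checks run on every git commit.
Configuration and Workflow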
A typical .pre-commit-config.yaml lists each Git repository that hosts hooks, the revision (rev) to pin, the IDs of the hooks to run from it, and any additional arguments. This file is checked into source control, so everyone on the project shares the same setup.
Creating Your Own Pre-Commit Hooks
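Stefanie’s approach to writing a hook starts with core logic and a command-line interface. The following is a minimal, hypothetical sketch of those two pieces and is not code from the episode. The hook flags leftover breakpoint() calls, and the names find_breakpoints and main are illustrative. The sketch relies on two pre-commit conventions: pre-commit passes the matching staged filenames as arguments, and it treats a nonzero exit code as a failure.

```python
"""Hypothetical pre-commit hook that flags leftover breakpoint() calls.

Step 1 (core logic) is find_breakpoints; step 2 (the CLI) is main.
Packaging and the .pre-commit-hooks.yaml entry are not shown.
"""
from __future__ import annotations

import argparse
import ast
from collections.abc import Sequence


def find_breakpoints(source: str, filename: str = "<string>") -> list[int]:
    """Return the sorted line numbers of bare breakpoint() calls in source."""
    tree = ast.parse(source, filename=filename)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "breakpoint"
    )


def main(argv: Sequence[str] | None = None) -> int:
    """Check every file named on the command line; return 1 if any fail."""
    parser = argparse.ArgumentParser(description="Flag leftover breakpoint() calls.")
    # pre-commit passes the staged files that match the hook as positional arguments.
    parser.add_argument("filenames", nargs="*", help="Python files to check")
    args = parser.parse_args(argv)

    exit_code = 0
    for filename in args.filenames:
        with open(filename, encoding="utf-8") as f:
            source = f.read()
        for lineno in find_breakpoints(source, filename):
            print(f"{filename}:{lineno}: remove breakpoint() before committing")
            exit_code = 1
    return exit_code
```

Finishing the remaining steps involves two more pieces. First, package the module with a console-script entry point; the generated wrapper calls main and exits with its return value. Second, add an entry to .pre-commit-hooks.yaml with keys such as id, name, entry, and language: python. Both are omitted here.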
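Stefanie outlined a four-step approach: write your core logic, build a CLI with argparse or a similar library, make the package installable, and declare the hook in a .pre-commit-hooks.yaml file at the root of the repository. This pattern makes a hook easy to share, because any project can then reference your repository in its own .pre-commit-config.yaml.
Automation vs. Checks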
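A hook that fixes problems instead of only reporting them follows the same shape. Adding trailing commas correctly requires parsing Python, so this hypothetical sketch swaps in the simpler case of trailing whitespace. The names fix_trailing_whitespace and main are illustrative. The hook rewrites files in place and exits with 1 when it changed something. That stops the commit so the developer can review and stage the fixes.

```python
"""Hypothetical auto-fixing hook: strip trailing whitespace in place."""
from __future__ import annotations

import argparse
import re
from collections.abc import Sequence

# Spaces or tabs that sit right before a line ending or the end of the text.
_TRAILING_WS = re.compile(r"[ \t]+(?=\r?\n|\Z)")


def fix_trailing_whitespace(text: str) -> str:
    """Return text with trailing spaces and tabs removed from every line."""
    return _TRAILING_WS.sub("", text)


def main(argv: Sequence[str] | None = None) -> int:
    """Fix each file in place; return 1 if any file was modified."""
    parser = argparse.ArgumentParser(description="Strip trailing whitespace.")
    parser.add_argument("filenames", nargs="*", help="files to fix")
    args = parser.parse_args(argv)

    exit_code = 0
    for filename in args.filenames:
        # newline="" disables newline translation, so \r\n endings survive.
        with open(filename, encoding="utf-8", newline="") as f:
            original = f.read()
        fixed = fix_trailing_whitespace(original)
        if fixed != original:
            with open(filename, "w", encoding="utf-8", newline="") as f:
                f.write(fixed)
            print(f"Fixed trailing whitespace in {filename}")
            # Failing stops the commit so the developer can review and stage
            # the fix; pre-commit also reports modified files as a failure.
            exit_code = 1
    return exit_code
```

The design choice is to fix silently and then fail once. On the next commit attempt the files are already clean, so the developer never has to make the edit by hand.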
If a hook can fix an issue automatically, such as adding missing trailing commas, implement the fix rather than just failing. Reducing busywork makes developers less likely to bypass the hooks (for example, with git commit --no-verify).
EXIF Stripper Example
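To show what stripping EXIF metadata involves at the byte level, here is a sketch for JPEG files that uses only the standard library. It is not how exif-stripper is implemented, and the function names are hypothetical. It assumes a well-formed JPEG with no fill bytes between segments. EXIF data lives in an APP1 segment whose payload begins with "Exif" followed by two null bytes. The sketch copies every segment except those, then copies the compressed image data after the start-of-scan marker unchanged.

```python
"""Hypothetical sketch: remove EXIF metadata from JPEG bytes (stdlib only).

Not the exif-stripper implementation. Assumes a well-formed JPEG whose
metadata segments all come before the first start-of-scan (SOS) marker and
that has no fill bytes between segments.
"""
import struct

SOI = b"\xff\xd8"  # start-of-image marker
SOS = 0xDA  # start of scan: entropy-coded image data follows
APP1 = 0xE1  # application segment where EXIF (and sometimes XMP) lives
EXIF_HEADER = b"Exif\x00\x00"


def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of jpeg without its EXIF APP1 segments."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG file")
    out = bytearray(SOI)
    pos = len(SOI)
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at byte {pos}")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from SOS onward is image data; copy it unchanged.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        end = pos + 2 + length
        segment = jpeg[pos:end]
        # Keep every segment except APP1 blocks whose payload starts with Exif\0\0.
        if not (marker == APP1 and segment[4:10] == EXIF_HEADER):
            out += segment
        pos = end
    raise ValueError("no start-of-scan marker found")


def strip_exif_file(path: str) -> bool:
    """Rewrite the JPEG at path without EXIF data; return True if it changed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_exif(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

A production hook would also need to handle other image formats and metadata containers such as XMP, which this sketch leaves untouched. For that reason, a dedicated imaging library is usually the more practical choice.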
Stefanie created a hook that removes sensitive or unnecessary metadata from images before they’re committed. This prevents accidental data leaks (GPS coordinates, photographer details) and also reduces file size.
Docstring Validation and numpydoc
Contributing at scikit-learn sprints showed Stefanie how much docstring style and consistency matter. The numpydoc pre-commit hook she built automatically checks docstrings for issues such as trailing spaces and for adherence to the NumPy docstring style.
Continuous Integration and Pre-Commit
While pre-commit gives immediate local feedback, CI confirms that every change meets the team’s code quality standards, including changes from contributors who never installed the hooks locally. Running the same checks in CI builds prevents surprises at merge time.
- Example: Combine GitHub Actions with pre-commit (for instance, a workflow step that runs pre-commit run --all-files) to run the same hooks in pull requests.
Interesting Quotes and Stories
- EXIF data realization: "I remember discovering the camera details, location, and the photographer’s name were all in the image. I didn’t want that on my blog!" – Stefanie Molin
- Auto-fix perspective: "If you can fix it, fix it. Don’t just warn about it." – Stefanie Molin on saving developers from repetitive tasks.
Learning Resources
- Stefanie’s Website: stefaniemolin.com for blog posts, tutorials, and future talks.
- numpydoc: numpydoc.readthedocs.io for style guidelines and docstring best practices.
- pre-commit: the official site (pre-commit.com) and the pre-commit GitHub repo.
- Hands-On Data Analysis with Pandas by Stefanie Molin (Second Edition) for in-depth data wrangling techniques.
Overall Takeaway
Implementing and customizing pre-commit hooks in your Python workflow can substantially improve code consistency and developer efficiency. From validating docstrings to stripping sensitive metadata, these checks enforce best practices at the earliest stage. That saves time, prevents errors, and lets teams focus on what truly matters: writing great code.
Links from the show
Talk Python Blog: talkpython.fm/blog
How to Set Up Pre-Commit Hooks: stefaniemolin.com
Common Pre-Commit Errors and How to Solve Them: stefaniemolin.com
A Behind-the-Scenes Look at How Pre-Commit Works: stefaniemolin.com
Pre-Commit Hook Creation Guide: stefaniemolin.com
(Pre-)Commit to Better Code Workshop: stefaniemolin.com
exif-stripper: stefaniemolin.com
exif-stripper on GitHub: github.com
docstring-validation-using-pre-commit-hook: numpydoc.readthedocs.io
Data Morph: Moving Beyond the Datasaurus Dozen: stefaniemolin.com
Data Morph on GitHub: github.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm
--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy