pre-commit framework

Episode #282, published Thu, Sep 17, 2020, recorded Tue, Jul 21, 2020


Git hook scripts are useful for identifying simple issues before committing your code. Hooks run on every commit to automatically point out issues in code such as trailing whitespace and debug statements. Pointing these issues out before code review allows a code reviewer to focus on the architecture of a change rather than wasting time on trivial style nitpicks.

As we created more libraries and projects, we recognized that sharing our pre-commit hooks across projects is painful. That's why I'm happy to welcome Anthony Sottile to the show to discuss pre-commit, a framework for managing and maintaining multi-language pre-commit hooks.

Episode Deep Dive

Guest Introduction and Background

Anthony Sottile is a seasoned Python developer well-known for his contributions to Python's open-source ecosystem. He has worked at prominent companies such as Yelp and Lyft, focusing on developer tooling, infrastructure, and continuous integration. Anthony maintains or contributes to several impactful projects (including Flake8) and has a deep interest in automating and simplifying developers' workflows. His passion for tooling led him to create the pre-commit framework, which helps developers seamlessly adopt and share Git hooks across projects.

What to Know If You're New to Python

If you're new to Python and want to get the most from this episode, some basic familiarity with Python tooling, Git, and the command line will help:

Understanding core Python concepts like installing packages with pip will help you follow the workflow.
Knowing how to set up and activate a virtual environment is very helpful.
Having a grasp of how Git works (e.g., commits, pushing, pulling) is important when discussing pre-commit hooks.
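For readers who haven't set one up before, here is a minimal sketch of creating a virtual environment with the standard library's venv module (roughly what python -m venv .venv does on the command line). The make_env helper and the .venv directory name are illustrative choices, not anything pre-commit requires. pre-commit applies the same isolation idea to the tools each hook needs, so a hook's dependencies stay separate from your project's.

```python
"""Sketch: create an isolated virtual environment with the standard library."""
import sys
import venv
from pathlib import Path


def make_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return its Python executable.

    make_env is a hypothetical helper written for this example.
    """
    venv.create(env_dir, with_pip=with_pip)
    # Windows places executables in Scripts\, other platforms in bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    python = make_env(Path(".venv"))
    print(f"Created {python}; activate it, then run: pip install pre-commit")
```

Activating the environment (for example, source .venv/bin/activate on macOS or Linux) puts its python and pip first on your PATH, so anything you install lands inside .venv instead of your global Python.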

Key Points and Takeaways

1) The Purpose and Power of Pre-Commit Hooks
Pre-commit hooks are scripts that run automatically before you finalize a Git commit. They help catch and fix issues such as style problems, debug statements, or even leaked credentials before those changes ever leave your machine. This episode shows how having checks run locally first is more efficient than waiting for code reviews or continuous integration pipelines to catch trivial mistakes.
- Links and tools:
  - Pre-commit framework
  - Git hooks docs
2) Pre-commit as a Framework, Not Just a Script
Traditionally, you might see large Bash scripts in Git repositories that try to handle linting or format enforcement. The pre-commit framework takes this further by creating an ecosystem of "repos" containing hooks, plus an easy installation and versioning model. It encourages best practices like pinned versions of linters and consistent setups across teams.
- Links and tools:
  - pre-commit GitHub repo
3) Automatic Tool Installation across Multiple Languages
One of the biggest selling points is that pre-commit can automatically install the runtimes and tools a hook needs, even when they are written in Ruby, Node.js, or other languages. It ensures a consistent environment for every developer without requiring multiple global installations. This also streamlines onboarding for new team members.
- Links and tools:
  - RVM and rbenv (for Ruby)
  - nodeenv (for Node.js)
4) Integration with CI Systems
While pre-commit primarily runs locally, adding it to continuous integration (CI) ensures that no one bypasses these checks or forgets to run them. It also gives a single source of truth, catching issues even if a developer didn’t install pre-commit themselves. Anthony highlighted that the recommended pattern is to run pre-commit run --all-files in CI, and advanced caching can speed up repeated runs.
5) Preventing Leaked Secrets and Other Simple Mistakes
Pre-commit hooks can scan code for things like AWS credentials or debug statements. These checks reduce embarrassing or harmful mistakes. By catching them automatically and locally, you reduce the risk of exposing credentials or leaving stray print or import pdb statements in production.
- Links and tools:
  - Bandit for security checks
  - scss-lint (Ruby-based linter for SCSS)
  - detect-aws-credentials (hook in the pre-commit-hooks repository)
6) Distributed and Configurable Model
Developers choose which hooks to run by editing a simple YAML file (.pre-commit-config.yaml) in their repo. The approach is distributed, so any Git repository can store and publish hooks. It avoids monolithic, hard-to-maintain scripts and fosters sharing across internal teams or the broader open-source community.
- Links and tools:
  - Pre-commit sample config docs
7) Popular Hooks and Linters
The conversation highlights popular tools like the Flake8 linter and the Black formatter. There’s also mention of specialized hooks (e.g., forbid submodules, detect submodule changes, reorder Python imports) and the ability to run code security checks (e.g., Bandit). Auto-formatters can fix minor style issues without developer intervention.
8) Hooking into Non-Python Tools
Pre-commit is not Python-specific. It supports Docker-based hooks, Node-based hooks, Ruby-based hooks, and more. This flexibility allows teams with polyglot setups to unify their validations in a single pipeline, bridging front-end and back-end.
- Links and tools:
  - Docker
  - nodeenv (for Node.js)
9) How to Get Started in a Few Steps
Anthony described a straightforward workflow: install pre-commit with pip install pre-commit, create a .pre-commit-config.yaml, run pre-commit install, and optionally check the whole project with pre-commit run --all-files. This flow makes adoption simple and consistent for teams.
- Links and tools:
  - Installation docs
10) Avoiding Manual Code-Style Debates
A crucial benefit of automated hooks is removing friction in code reviews. Instead of developers nitpicking style or formatting, the tools handle it. This frees reviewers to focus on actual logic and architecture, improving code quality and team morale.
- Links and tools:
  - Black auto-formatter
  - Flake8 config options
11) Sharing Configurations Across Teams and Projects
Because the configuration pins each hook repository to a specific version, entire teams stay in sync on the same tool versions. This is especially beneficial for large organizations or open-source projects, preventing version drift that can cause subtle failures or inconsistent code styles.
- Links and tools:
  - Versioning docs in pre-commit
12) Personal Developer Journeys and Custom Tools
Anthony shared that he started as a front-end developer, moved to back-end infrastructure, and gravitated to developer tooling. He also wrote his own text editor, babi, a Python-based, curses-driven editor. This underscores his hands-on approach to building solutions for everyday developer problems.
- Links and tools:
  - babi on GitHub
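To make key points 1 and 5 concrete, here is a minimal sketch of the kind of check a hook performs. It is a hypothetical standalone script, not code from pre-commit or the pre-commit-hooks repository. It assumes UTF-8 text files and takes filenames as command-line arguments (pre-commit passes the matching staged files to a hook this way), reports trailing whitespace and a few common debugger calls, and exits nonzero, which causes the commit to fail. The debug-statement pattern is an illustrative assumption and far less thorough than real hooks.

```python
"""Sketch of a pre-commit-style check (hypothetical, for illustration only)."""
import re
import sys
from typing import List, Sequence

# Illustrative, non-exhaustive patterns for leftover debugger calls.
DEBUG_PATTERN = re.compile(r"^\s*(?:import pdb\b|from pdb import\b|breakpoint\(\))")


def find_issues(filename: str, text: str) -> List[str]:
    """Return one message per problem found in text."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # splitlines() drops the newline, so any remaining trailing
        # spaces or tabs are real trailing whitespace.
        if line != line.rstrip():
            issues.append(f"{filename}:{lineno}: trailing whitespace")
        if DEBUG_PATTERN.match(line):
            issues.append(f"{filename}:{lineno}: debug statement")
    return issues


def main(argv: Sequence[str]) -> int:
    """Check every file named in argv; a nonzero return fails the hook."""
    failed = False
    for filename in argv:
        with open(filename, encoding="utf-8") as f:
            for message in find_issues(filename, f.read()):
                print(message)
                failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Unlike an auto-formatter such as Black, this sketch only reports problems; a fixing hook would rewrite the files instead, which matches Anthony's "golden standard" of a tool that just fixes it for you.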

Interesting Quotes and Stories

"If I’m waiting till CI to get feedback on nitpicks around commas or whitespace, that’s way too late in the process." — Anthony Sottile

"The absolute worst situation is that a human tells you something’s wrong. The next worst is that CI does. Better is a local tool. The golden standard is that an automated tool just fixes it for me." — Anthony Sottile

"By pointing out issues before code review, you let reviewers focus on the architecture and not these trivial style nitpicks." — Michael Kennedy

Key Definitions and Terms

Git Hooks: Scripts that Git runs before or after certain events, like commits or pushes.
Pre-Commit: The specific Git hook that triggers right before your commit finalizes.
Linting: Automated checking of source code for errors, style violations, or other “smells.”
Formatter: A tool that automatically reformats code to adhere to a prescribed style.
CI (Continuous Integration): Automated systems that build and test code whenever changes are pushed.
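The first two definitions can be made concrete with a minimal sketch of installing a Git hook by hand, which is roughly the chore that pre-commit install automates. This is a hypothetical helper, not pre-commit's implementation. It assumes the repository uses the default hooks directory (.git/hooks, with no core.hooksPath override) and writes a tiny shell script that runs git diff --cached --check. That command exits nonzero when staged changes introduce whitespace errors or conflict markers, and Git aborts the commit whenever the pre-commit hook exits nonzero.

```python
"""Sketch: install a plain Git pre-commit hook by hand (hypothetical helper)."""
import stat
from pathlib import Path

# Hook body: Git runs this script before each commit and aborts the
# commit if it exits with a nonzero status.
HOOK_SCRIPT = """#!/bin/sh
# Fail if staged changes introduce whitespace errors or conflict markers.
exec git diff --cached --check
"""


def install_pre_commit_hook(repo_root: Path, script: str = HOOK_SCRIPT) -> Path:
    """Write script to .git/hooks/pre-commit and mark it executable."""
    hooks_dir = repo_root / ".git" / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook_path = hooks_dir / "pre-commit"
    hook_path.write_text(script, encoding="utf-8")
    # Git skips hook files that lack the executable bit.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Once the hook is installed, git commit runs it automatically. Running git commit --no-verify skips it, which is one reason to also run the same checks in CI.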

Learning Resources

If you’re looking to solidify your Git workflow or Python fundamentals:

Up and Running with Git: A practical guide to Git for teams and solo developers.
Python for Absolute Beginners: Get a solid foundation in Python to take full advantage of tools like pre-commit.

Overall Takeaway

Pre-commit hooks help teams catch errors and format code before it ever leaves a developer’s local environment. Adopting a simple framework like pre-commit saves time in code reviews, ensures consistent tool versions, and frees developers from tedious style debates. By automating these checks, the conversation can move away from formatting details and focus on the real substance and innovation in your projects.

Links from the show

Anthony on Twitter: @codewithanthony
pre-commit: pre-commit.com
pre-commit continuous integration: pre-commit.ci
pre-commit hooks: pre-commit.com/hooks.html
pre-commit on GitHub: github.com
shhgit secret discovery project: github.com
babi editor: github.com
Twitch stream: twitch.tv

Anthony on GitHub: github.com
Episode #282 deep-dive: talkpython.fm/282
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Git hook scripts are useful for identifying simple issues before committing your code.

00:04 Hooks run on every commit to automatically point out issues in your code,

00:08 such as trailing white space and debug statements.

00:11 By pointing out these issues before you get to a code review, this allows the code reviewer to focus on the architecture of a change

00:18 while not wasting time with trivial style nitpicks.

00:21 As we create more libraries and projects, we recognize that sharing pre-commit hooks across projects is painful.

00:29 That's why I'm happy to welcome Anthony Sottile to the show to discuss pre-commit,

00:33 a framework for managing and maintaining multi-language pre-commit hooks.

00:38 This is Talk Python To Me, episode 282, recorded July 1st, 2020.

00:57 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

01:03 This is your host, Michael Kennedy.

01:05 Follow me on Twitter where I'm @mkennedy.

01:07 Keep up with the show and listen to past episodes at talkpython.fm.

01:11 And follow the show on Twitter via @talkpython.

01:13 This episode is brought to you by Brilliant.org and us.

01:18 Python's async and parallel programming support is highly underrated.

01:22 Have you shied away from the amazing new async and await keywords because you've heard it's way too complicated or that it's just not worth the effort?

01:29 With the right workloads, a hundred times speed up is totally possible with minor changes to your code.

01:35 But you do need to understand the internals.

01:37 And that's why our course, Async Techniques and Examples in Python, shows you how to write async code successfully as well as how it works.

01:45 Get started with async and await today with our course at talkpython.fm/async.

01:51 Anthony, welcome to Talk Python To Me.

01:54 Yeah, glad to be here.

01:55 It's great to have you here.

01:56 You and I have been on together on Python Bytes along with Brian.

02:00 We had you over there once, I believe, but never on Talk Python.

02:04 So welcome to the show.

02:05 Really happy to have you around.

02:06 And you have this cool project that we're going to talk about.

02:09 And it's kind of like the old thing is new again, right?

02:12 It's been around for a little while, but it's some of the other stuff that seems to have boosted its popularity.

02:17 And to me, it seems like a fantastic idea with pre-commit hooks and a framework for running them and building them.

02:23 So I'm excited to dig into it with you.

02:24 Sounds great.

02:25 Happy to talk about it.

02:26 Yeah.

02:27 Before we do, though, let's get to your story.

02:28 How did you get into programming in Python?

02:29 My first Python programming was actually at Yelp, where I worked as a software engineer.

02:35 I was hired as a front-end developer and mostly did CSS and JavaScript.

02:40 But quickly realized that in order to unblock my own job, it made a lot more sense for me to learn the back-end and write my own APIs instead of waiting for a back-end engineer to implement the same APIs.

02:52 So I picked up a bunch of Python and made my job a lot easier in that way.

02:56 Nice.

02:56 What did you use on the back-end?

02:57 Was that Flask, Django, something else?

03:00 So unfortunately, Yelp was created before there were a lot of these larger frameworks.

03:06 And so Yelp had kind of their own homegrown framework.

03:10 I eventually shifted from front-end to full-stack to back-end to infrastructure.

03:15 And part of the infrastructure work that we did there was to move the monolithic code base to be based on Pyramid.

03:22 So kind of like the Pyramid/Pylons sort of approach.

03:25 And it was actually really nice because we could basically cut out parts of the proprietary Yelp main code and stick in bits of Pyramid so that we could offload some of the logic to open source software.

03:37 Oh, nice.

03:38 Yeah, that's really cool.

03:39 Most of my sites are built in Pyramid as well.

03:40 I like that framework.

03:41 It gives you a ton of customization.

03:43 It's a little bit daunting to start with like small projects.

03:46 I usually prefer to use like Flask or whatever.

03:48 Yeah.

03:48 But it definitely lets you do everything.

03:50 Yeah.

03:50 Well, if I'm looking at new things that look really sweet, FastAPI is looking sweet.

03:55 Yeah, I've heard some good things.

03:56 I haven't had a chance to try it out yet, though.

03:57 You know what I really like about it is you can define models, classes in Pydantic, and then say your view method takes one of those models.

04:06 If you're receiving, like, a JSON POST, it'll do validated pre-population of all the fields using the Pydantic validation, which I'm just like, oh, that's just saved me like two thirds of my whole app right now.

04:16 That's awesome.

04:16 Yeah, that's pretty good.

04:17 Yeah, I feel like I spend most of my time in validation, input validation.

04:20 So if it's...

04:21 Exactly.

04:21 Or converting.

04:22 I know it's a string, but then we're going to cast it to an int if it can be casted to an int.

04:25 All that kind of stuff.

04:26 Like, yeah, it's all built in.

04:27 So anyway, if I go build some new APIs, that's not just part of the main app.

04:32 I probably would do some FastAPI stuff, but that's pretty new.

04:34 It's only been around a year and a half.

04:36 Nice.

04:36 Awesome.

04:37 So you moved from the front end stuff over to the back end at Yelp, and it sounds like you got to do some really cool transformation, sort of empowering things there.

04:45 What are you doing now?

04:46 Well, technically, I'm currently unemployed, but my current passion is developer tooling and kind of infrastructure in that space.

04:55 I was recently at Lyft where I led the developer experience organization and, you know, building tools and infrastructure that make developers productive.

05:04 The idea being like, I can invest some in tooling and give that to all developers and then they can get their jobs done faster and better and safer and etc.

05:14 Yeah.

05:14 Awesome.

05:15 Are you looking for that consulting stuff?

05:16 If people are out there? Or have you got plans already?

05:20 So right now, right now I'm trying to build my own project and see where it goes.

05:24 But I've kind of time boxed myself to like six to nine months.

05:28 And if it doesn't turn out well, when it doesn't turn out, I'll be looking for new employment after that.

05:33 It might turn out.

05:34 It might turn out.

05:35 You're a good programmer.

05:35 I know you can do it.

05:36 Awesome.

05:37 There's always a chance.

05:38 Yeah, for sure.

05:39 You got to give it a shot.

05:40 That's awesome.

05:40 Well, let's talk about this project that you've been working on for a while.

05:43 But like I said, has gotten a little bit of traction, a lot of traction lately because of tools like Black and other things that have made pre-commit hooks awesome and exciting all of a sudden.

05:55 But before we talk about what you've been doing, let's just talk about the idea of pre-commit hooks in general.

06:00 Sure.

06:00 What is this for a lot of people who are like, yeah, I kind of know what Git is.

06:04 I kind of use that, or maybe have even used, like, zipping things up and naming the zip with a date as source control.

06:10 Final one, one, one.

06:12 But yeah, the idea behind Git hooks and specifically the pre-commit hook, because I think that's probably the one that most people get the most interaction with.

06:20 But there are a bunch of commands in Git where you can register callbacks as scripts to either do like validation or like seeing some people use it to like send emails or like close tickets, all sorts of other stuff.

06:33 The main focus around Git hooks to me is the pre-commit hook where you can do like linting and code validation, code formatting.

06:43 You can run tests or other stuff like that.

06:45 I guess the pre-push hook is another one that's also kind of big in that same space where you want to do validation of your changes before you send them off to like a...

06:54 Right. So you could maybe reject the Git push if the formatting is wrong or the header is missing or something like that, right?

07:03 Yep. You can take a lot of those like easy to validate things and do them in a kind of a quick fast manner before you would do your larger test suite or something or catch a syntax error before you spend a bunch of time spinning up your CI systems.

07:18 Right. Well, speaking of CI, to me, this seems like the next natural progression from having CI do these tests, right?

07:27 So there's different levels. The developer should probably be writing and running tests and making sure that the tests pass.

07:33 They should be like formatting their code before they check it in and stuff like that.

07:37 But when you work on a team, my experience has been there's a wide, wide range of how much people are willing to do that, how much they care about those kinds of things.

07:48 And what that means is maybe you have CI continuous integration that runs automatically during check-in or once check-in is done.

07:55 And so then they might check in something and might say, oh, the build is now broken because you didn't bother to run the test, but you broke the test.

08:03 But because you didn't run them, you didn't know it.

08:05 So, you know, a tree falls in a forest. No one hears it, right?

08:07 That sort of thing.

08:10 And so then you end up in the situation, the people that care about the build working have to track down the person who broke it, who didn't actually care.

08:17 Like, it's just these layers of like annoying type of thing.

08:19 Yeah.

08:20 And if you can make the sort of push that validation to the location where the person is and all the people, right?

08:26 So even if you care, like you might not want to break the build, you might rather just get a warning or just automatically have it fixed.

08:32 And pre-commit hooks seem like that's the natural place for that.

08:35 Yeah. Kind of the mentality that I always had is like, if I'm waiting till CI to get feedback on like nitpicks around commas or white space or syntax or whatever, that to me is way too late in the process.

08:49 Because I've already like, you know, I've pushed, I've already gone off to the next thing.

08:53 I'm already answering my email or looking at GitHub issues or talking in Slack or whatever.

08:59 And like, I've already context switched to a completely different situation.

09:02 I'm like, I could have been, I assumed it was good because I already committed and pushed, right?

09:07 I never make mistakes.

09:08 So yeah, exactly.

09:09 Zero fault code.

09:10 Git push is like the final action.

09:12 Now you're done, right?

09:14 Yeah.

09:15 But like, it was incredibly frustrating to like make a push and then have some either build system telling me that something was wrong or in code review, someone was like, oh, well, you could have reordered these imports.

09:26 So they're alphabetical or something.

09:27 It's just like, yeah, this is a big waste of time.

09:30 Let's kind of push this as far towards the developer as possible such that we can make that a better, a better situation.

09:36 The other thing I think is interesting around these ideas is there's been studies that have shown that people are more willing to take nitpicky advice from a computer than from a human.

09:47 There's like, oh, okay, well, the computer requires that I have this kind of white space or this kind of indentation or like this type of sort, like you said, alphabetical ordering or whatever.

09:56 And when it comes from a person in a code review, it's like, well, that person is just a jerk, right?

10:00 I wrote good code.

10:01 Here I am taking this flak for this thing.

10:03 So having this happen like automatically, I think takes away the need to review that kind of stuff.

10:09 It takes away the need to complain and be that grumpy person that does that.

10:14 And you can even go farther with tools like Black, where it doesn't just complain.

10:18 It just goes, I fixed it for you.

10:20 Yeah, actually, the way that I usually talk about this is like the absolute, the worst situation is that a human tells you that something is wrong.

10:26 The next worst situation is that like a CI system tells you that something's wrong.

10:30 Better than that is that an automated local tool tells you that something is wrong.

10:35 And the like golden standard is that an automated tool just fixes it for me.

10:39 Like, I don't have to worry about it at all.

10:41 It just makes it happen.

10:42 Yeah.

10:42 And it also results, you don't have this sort of like dueling alternate format style, right?

10:49 Like if I like to work in PyCharm, you like to work in VS Code and our formatting rules vary ever so slightly, you know, like commas between parameters or there are two spaces between the colon on a type annotation.

11:01 You know, if we both keep reformatting the document, it cycles back and forth.

11:04 Right.

11:05 And this way you can sort of just hit it with the same formatter right before it goes every time.

11:09 Of course.

11:09 Yeah.

11:10 And that also helps with things like git blame or like noticing when patches are minimal or whatever.

11:15 And like being able to triage a change way down the road, not be distracted by two different editors reformatting in a different way.

11:26 This portion of Talk Python To Me is brought to you by Brilliant.org.

11:29 Brilliant has digestible courses in topics from the basics of scientific thinking all the way up to high-end science like quantum computing.

11:36 And while quantum computing may sound complicated, Brilliant makes complex learning uncomplicated and fun.

11:42 It's super easy to get started and they've got so many science and math courses to choose from.

11:47 I recently used Brilliant to get into rocket science for an upcoming episode and it was a blast.

11:51 The interactive courses are presented in a clean and accessible way and you could go from knowing nothing about a topic to having a deep understanding.

11:59 Put your spare time to good use and hugely improve your critical thinking skills.

12:03 Go to talkpython.fm/brilliant and sign up for free.

12:07 The first 200 people that use that link get 20% off the premium subscription.

12:12 That's talkpython.fm/brilliant or just click the link in the show notes.

12:17 So that brings us to your project, pre-commit.com, which to me, I thought this was kind of something new because I hadn't heard of it before.

12:28 And I feel like with all the excitement around Black being one of these auto-formatters that we just discussed, some of the other tooling around Git pre-commit hooks,

12:38 that this was like this big new thing, but we were talking and you told me it's been around for a little while and it's just gaining a lot of momentum now.

12:44 Yeah.

12:44 The original project actually came out of my want to enforce my partners in group projects in college to make sure that their white space was nice and that formatting was good.

12:57 But it was originally just like a several hundred line Python script pre-commit.py way back in 2012.

13:04 The first public version of pre-commit, I believe, was May of 2014, which means that it just turned six years old a little while ago.

13:13 And for the most part, the framework's idea has remained basically the same since the beginning.

13:19 Most of what's happened since 2014 has been a lot of bug fixes, a lot more platform support.

13:26 So like making things work well on Windows and other exotic platforms like macOS, which is basically the Wild West, right?

13:34 These uncommon.

13:35 Yeah, exactly.

13:37 Exactly.

13:38 But also like support for other programming languages.

13:40 So even though pre-commit is written in Python, it aims to target a bunch of different programming languages.

13:45 It can both lint and run tools in, I think it's like nine or 10 or 11 different programming languages now and continues to support more going forward.

13:55 It's pretty easy to add support for another language.

13:58 Yeah, very cool.

13:59 So the idea is this is a framework for creating pre-commit hooks and some tooling to install and initialize some of the various plugins.

14:08 And one of the things that's like you just covered is pretty unique to it, I think, is that it happens to be implemented in Python, but it'll let you both install linters that require different runtimes like Ruby or Java.

14:20 I'm not sure about Java.

14:21 I don't remember.

14:22 It doesn't quite have Java support yet, but it's in the works.

14:25 But the cool thing about it is that it aims to make it so you don't have to set up anything locally.

14:30 You just install pre-commit, you have a configuration file, and it manages installing and running all those tools for you.

14:38 So you don't have to worry about, like as a sysadmin, you don't have to worry about distributing Ruby to your machines or like installing some weird Node.js package on your system or like maintaining the version of that or updating and downgrading or whatever you need to do with that.

14:53 And like pre-commit will just install those tools for you.

14:56 Right.

14:57 And there's a lot of tools that we may care about, even if we're just Python developers that are not in Python.

15:02 For example, we might be creating a Pyramid web app that uses SCSS, a Sass or Less type of language for CSS.

15:12 There's a nice linter in Ruby, but we got to have Ruby in order to do that, right?

15:16 Yeah.

15:16 Fortunately, pre-commit makes SCSS lint in particular, if that's the one you're referring to, pretty easy to set up and go.

15:24 You just add a little configuration and it will set up a Ruby environment, even if you don't have Ruby installed on your machine, and install SCSS lint and make that easy to go.

15:34 That's pretty cool.

15:35 The weird thing about SCSS lint, though, is it's kind of stopped development as SCSS has moved from the Ruby implementation to either the C++ implementation or the Dart implementation.

15:47 And there are actually linters for SCSS in JavaScript as well, which is another popular programming language that pre-commit supports.

15:55 And a lot of people have been moving to those other ones as well.

15:58 But again, like pre-commit will just install that and make that easy to run for you.

16:02 So how's that work? Talk me through the, I don't have Ruby on my machine. I want to run some tool as a pre-commit hook that needs the Ruby runtime. Where's the magic?

16:12 Yeah, so I actually think that's probably the biggest selling point of pre-commit is all of the language smarts where it knows how to install and run things.

16:21 But let's talk about SCSS lint specifically. So what you would do is you would install pre-commit on your machine.

16:27 And there's directions in like the quick start guide on pre-commit.com.

16:31 You would set up a configuration file, a small YAML file, which points at some repositories which provide hooks.

16:38 In this case for SCSS lint, I believe it's pre-commit/mirrors-scss-lint.

16:43 There are some repositories where we have to mirror them because the upstreams are either dead or, or like, don't want to add a small bit of metadata to their repository.

16:52 And once you have the configuration file, you can run pre-commit run or pre-commit install to make it part of your Git hooks.

16:59 And pre-commit run will, in the case of Ruby, it uses several different approaches to try to set up a Ruby environment.

17:07 So the first thing it'll attempt is if you already have Ruby installed, it'll just kind of skip all of the environment setup and it'll just reuse the Ruby that you have.

17:15 But it'll install SCSS lint in an isolated fashion.

17:19 So it tries to stay away from everything else that's running on your system.

17:22 Basically, the idea is like pre-commit provides virtual environments, but for every other program.

17:26 Right, right.

17:27 That's cool.

17:28 And when you don't have Ruby, it will either download it using RVM or rbenv, which I love.

17:36 I love the Ruby tooling landscape.

17:38 It's like impossible to talk about either of the tools because they both sound the same if you slur your words together like I do sometimes.

17:44 Yeah.

17:45 But RVM provides a bunch of pre-built binaries for a bunch of platforms.

17:50 So it makes it easy to just like download and run Ruby.

17:52 And R-B-E-N-V, the other tool, makes it really easy to build Ruby from source.

17:58 And so pre-commit uses both of those tools to provide Ruby environments.

18:02 I see.

18:02 So there's this background thing that just, if you don't have the right runtime, it'll go and get it.

18:08 Yep, it'll make it happen.

18:09 Yeah.

18:09 And there's a similar tool called nodeenv for JavaScript that pre-commit leans on as well to set up those environments.

18:15 nodeenv is kind of a mash-up of the ideas of pyenv and virtualenv.

18:21 It actually has some like direct integration with VirtualEnv, but it basically allows you to go from nothing to a working JavaScript environment without having to install anything on your system.

18:31 Nice.

18:32 Yeah, yeah.

18:33 That's super cool.

18:33 I do think that that's the magic, right?

18:35 You can use all these different pre-commit hooks without a setup beyond just pip install or pipx install pre-commit.

18:43 By the way, does pipx make sense?

18:45 Are you familiar with pipx?

18:46 Yeah, I'm familiar with pipx.

18:47 I think it works for pre-commit.

18:49 I haven't tried it myself, mostly because I manage stuff with virtual environments, but it should work.

18:54 There's nothing super special about it.

18:55 Yeah, it seems like it would as well to me.

18:58 Yeah, yeah.

18:58 So you install it and then that's all you got to worry about.

19:01 You don't have to worry about having Ruby or Node or how about even Python?

19:05 You do need Python.

19:06 That's the one dependency.

19:08 Right, because it's implemented in Python, right?

19:10 Yeah.

19:11 There's plans to build a, what is it called?

19:14 PyInstaller?

19:15 Is that the one?

19:16 I don't know.

19:16 There's a bunch of different tools.

19:17 Yeah, yeah.

19:18 PyInstaller or py2app or yeah.

19:20 Like one of those or cx_Freeze.

19:22 So you've been thinking about potentially like turn it into one of these raw executables?

19:27 Yeah.

19:27 Unfortunately, I know almost nothing about them.

19:29 So I haven't had a chance to look into that.

19:32 But it should be pretty straightforward to make that happen.

19:34 Yeah, that'd be cool.

19:35 But yeah, one other thing before we move on, like I think the two, there's a bunch of these

19:40 git hook frameworks.

19:41 And like, I think pre-commit is pretty unique in this installs-the-tools-for-you sort of aspect.

19:46 I think like a lot of these other frameworks make two main mistakes.

19:49 One of them is like, well, the first mistake that everyone seems to make is like you check

19:54 in this 2,000-line bash script and that's how you manage your git hooks.

19:58 And like anytime someone needs to change that, they've got to dive into this terrible script

20:03 that always breaks and makes people's workflows really frustrating, which is actually what we

20:07 had at Yelp before Yelp moved to pre-commit.

20:10 But the other thing is managing tools.

20:13 I find that a lot of shops will install these linters globally and their git hooks will assume

20:21 that they're installed globally.

20:23 And you instantly have like three different problems from that.

20:27 One is version drift: all of your developers will need to like make sure that

20:32 they're on a specific version.

20:33 The other is you need environment setups.

20:36 You can't just like switch from your laptop to your personal machine or, you know, maybe

20:42 I'm working on a netbook in a coffee shop or whatever.

20:44 You would need to make sure that you have your very specific global environment setup.

20:48 Right.

20:49 That can always kind of drift and be painful.

20:51 And the third thing is if tools are managed globally, upgrading is impossible because I have

20:59 an old version of the code checked out and a new version of linter is now on the machine.

21:04 And so my code doesn't pass the new linter and the new code doesn't pass the old linter.

21:08 And like you have this, you kind of have this like lockstep deployment problem.

21:12 That's really hard.

21:13 I see.

21:13 Yeah.

21:14 Right.

21:14 And then if you have a global install and two projects that are out of sync, then it's

21:18 even worse.

21:18 Oh, and yeah.

21:19 The multi-project problem gets even harder.

21:21 Like microservices become almost impossible.

21:23 Right.

21:24 But the cool thing with pre-commit is the tools are all versioned directly in the configuration

21:29 file.

21:30 And so upgrading a linter can be done in one atomic commit to move forward or backward.

21:34 And so everyone is always at like a consistent state there.

21:38 Yeah.

21:39 That's really a good philosophy or good design principle there.

21:42 So you spoke a little bit about the steps to get going, but maybe just walk us really quickly

21:47 through a quick start: what's it like if I've got a machine with no git commit hooks,

21:52 no pre-commit installed, how do I make my project start doing some of these cool validations?

21:58 Yeah, sounds good.

21:59 So there is a quick start guide on pre-commit.com.

22:02 So all of this stuff that I'm about to say here is pretty easy to, you know, follow the

22:06 instructions there and get the same thing.

22:07 But I'm just going to run through it quickly here as well.

22:10 So the first step is acquiring the pre-commit tool.

22:14 And you can do this in a number of different ways.

22:16 Some of the operating system package managers have packaged it.

22:20 So sometimes you can, you know, apt install or dnf install pre-commit and get it that way.

22:26 I usually don't suggest that because the operating system package managers are usually like six

22:31 to eight months behind and you won't get the newest features and stuff.

22:34 But, you know, if you're on macOS, like brew install pre-commit is usually up to date.

22:38 And if you're a Python project, you can just use pip install.

22:41 I mean, even if you're not a Python project, you can also use pip install.

22:44 There's also a conda package and a few other ways you can acquire pre-commit.

22:49 But anyway, the first step is to somehow install the pre-commit tool.

22:52 Right.

22:53 Pick your favorite pick.

22:54 Pick your favorite.

22:55 Go with it.

22:55 Yeah.

22:56 Yep, for sure.

22:57 The next step is to set up a pre-commit configuration file.

23:01 You can usually find one in a project that you use and know uses pre-commit.

23:06 So I often copy and paste from another project that I've done or a project that someone else

23:11 has done that involves pre-commit.

23:12 But pre-commit also comes with a pre-commit sample-config command.

23:16 So you can get kind of a very, very basic configuration that you can expand out further.

23:20 I think the default configuration comes with a YAML checker and some whitespace fixers.

23:25 And you can always add your favorite code formatter or linter to that.

23:29 There's a list of supported hooks on pre-commit.com/hooks.html.

23:35 And there's hundreds of different tools at this point that can be installed and run directly

23:40 without any setup.

23:41 But yeah, the second step is setting up a configuration file.

23:44 The next step after that is to opt into the git hooks.

23:48 So the git hooks are actually, you know, it was originally the point of pre-commit to be

23:53 just like a git hooks manager.

23:54 But at this point, I think pre-commit works almost better as just like a linter code formatter

23:59 runner.

23:59 If I would, honestly, if I went back in time, I would probably be like, Anthony, don't call

24:04 it pre-commit.

24:04 And Anthony, maybe don't use YAML.

24:07 But yeah, because pre-commit now supports like a bunch of different git hooks, like pre-push,

24:13 commit-msg, prepare-commit-msg, post-checkout, a bunch of other ones.

24:17 Right.

24:17 But also like, it's useful even if you're not using it in Git workflow.

24:21 But yeah, anyway, the next step is to install the git hook scripts.

24:24 And you can do that by doing pre-commit install.

24:26 You need to install it for a particular hook type.

24:29 It defaults to pre-commit, but you can do like pre-commit install --hook-type followed by,

24:33 I don't know, like post-checkout or whatever.

24:36 Right, right.

24:37 Prepare commit message.

24:38 And so this looks in the YAML file where you specify the hooks that you want.

24:42 And so when you install it, it goes, okay, these are the three I need.

24:44 So it defaults.

24:45 So the YAML file can be used for a bunch of different hooks.

24:48 So the hooks in the YAML file specify a stages property, and that will decide which of the

24:54 git hooks they fire for.

24:55 You might, you know, have like a trailing whitespace hook, and it might run on commit, push, and post

25:03 checkout, or I don't know.

25:04 And so like, the specific git hook that you opt into is only part of the install setting.

25:10 And so like, if you wanted to use pre-push with pre-commit, you would do pre-commit install

25:14 --hook-type pre-push.
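To make the stages idea concrete, here is a minimal Python sketch of how a per-hook `stages` list could decide which hooks fire for a given git hook. The `Hook` class, the `hooks_for_stage` helper, and the `slow-integration-check` hook id are hypothetical and not pre-commit's internals; treating a hook with no `stages` as running at every stage is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hook:
    id: str
    # Assumption: an empty list means "no stages configured", i.e. run at every stage.
    stages: list[str] = field(default_factory=list)


def hooks_for_stage(hooks: list[Hook], stage: str) -> list[str]:
    """Return the ids of hooks that should fire for the given git hook stage."""
    return [h.id for h in hooks if not h.stages or stage in h.stages]


hooks = [
    Hook("trailing-whitespace", ["commit", "push", "post-checkout"]),
    Hook("check-yaml"),                          # no stages: runs everywhere
    Hook("slow-integration-check", ["push"]),    # hypothetical push-only hook
]

print(hooks_for_stage(hooks, "commit"))  # ['trailing-whitespace', 'check-yaml']
```

The installed git hook only tells the runner which stage it is in; the configuration decides which hooks take part, which is why one config file can serve several hook types.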

25:16 But yeah, once you've set up the git hooks, it's usually a good idea to run pre-commit against

25:21 your entire project, especially when you're just starting out with a new code formatter or

25:24 something.

25:25 And pre-commit makes this really easy with pre-commit run --all-files.

25:29 This will kind of take all of the tools, install all of them, and then run them against every

25:34 file in your repository.

25:35 Right.

25:35 So this is as if you had done, you basically added everything, and then somehow triggered

25:41 a git push to evaluate those files, but you can just make it happen to sort of set the baseline.

25:46 Yeah, it's for setting a baseline.

25:47 Yeah, because your first commit isn't going to touch every file in your code base.

25:50 And pre-commit tries to be very smart about the files that it passes to linters to make

25:57 sure that your git hooks are as fast as possible.

25:59 Because you could lint your entire code base every commit, but no one has time for that.

26:03 Yeah, exactly.
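Part of keeping hooks fast is narrowing the changed files down to the ones each hook cares about. Below is a rough Python sketch of that filtering step, assuming a hook's `files` and `exclude` settings are Python regular expressions applied with `re.search`; the `filter_files` helper and the sample paths are hypothetical.

```python
import re


def filter_files(filenames, files=r"", exclude=r"^$"):
    """Keep filenames that match `files` and do not match `exclude`.

    The defaults are chosen so that, with no settings, everything is included
    ("" matches any string) and nothing is excluded ("^$" matches no real path).
    """
    include_re = re.compile(files)
    exclude_re = re.compile(exclude)
    return [
        name for name in filenames
        if include_re.search(name) and not exclude_re.search(name)
    ]


# Hypothetical set of files touched by a commit.
staged = ["src/app.py", "docs/index.rst", "tests/test_app.py", "vendor/lib.py"]
print(filter_files(staged, files=r"\.py$", exclude=r"^vendor/"))
# ['src/app.py', 'tests/test_app.py']
```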

26:05 And yeah, once you've kind of got set up there, you can kind of iterate on your pre-commit

26:09 configuration, like add more tools that you might want to see or might want to use things

26:13 like Flake8 or Black or your favorite import reorderer or your SCSS formatter or whatever

26:20 tools that you might want to use.

26:22 You can also add your own custom tools as well.

26:25 If you have existing scripts that you want to migrate to pre-commit, pre-commit makes it

26:29 pretty easy to run stuff directly in your repository as local hooks or things that happen to

26:34 be globally installed as you migrate to more of a managed hook situation.

26:39 And kind of the last step that I usually suggest is if you're going to add this pre-commit validation.

26:44 So pre-commit validation can always be skipped.

26:46 There's --no-verify in git commit, which allows you to just skip all the git hooks.

26:51 So you could just turn it off if you wanted.

26:52 Pre-commit also provides a SKIP environment variable that lets you skip individual hooks.

26:57 So if you have like some buggy hook that you don't want to check or actually we had this one hook at Yelp that was hilariously slow.

27:06 And so I just never wanted to run that hook.

27:09 So I would just always skip it and make sure that I was adhering to the policy manually.

27:15 Right.
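On the command line, skipping a hook looks like `SKIP=slow-hook git commit -m "message"`, with multiple hook ids separated by commas. The sketch below shows how a runner could honor such a variable; `skipped_hook_ids`, `hooks_to_run`, and the `slow-hook` id are hypothetical names used only for illustration.

```python
import os


def skipped_hook_ids(environ=None):
    """Parse a comma-separated SKIP variable into a set of hook ids."""
    environ = os.environ if environ is None else environ
    raw = environ.get("SKIP", "")
    # Ignore stray whitespace and empty entries such as "a,,b".
    return {part.strip() for part in raw.split(",") if part.strip()}


def hooks_to_run(hook_ids, environ=None):
    """Drop any hook whose id appears in SKIP, preserving the original order."""
    skipped = skipped_hook_ids(environ)
    return [hook_id for hook_id in hook_ids if hook_id not in skipped]


print(hooks_to_run(["flake8", "slow-hook", "black"], {"SKIP": "slow-hook"}))
# ['flake8', 'black']
```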

27:16 But you don't have to skip hooks.

27:19 But anyway, they're always just client validation and client validation can always be bypassed.

27:24 So it's usually a good idea to set up some amount of this linting validation in your continuous integration.

27:29 Right.

27:30 And fortunately...

27:31 Hold on.

27:32 First, it's probably a good idea to have continuous integration and then to set this up for it, right?

27:38 Yeah, that's true.

27:39 You have tests, right?

27:40 Right.

27:42 But yeah, it's always a good idea to have some amount of continuous integration.

27:45 AKA users?

27:47 No, I'm just kidding.

27:50 Just kidding.

27:51 I'm just going to hide because my chat in my stream the other day was hounding me for not having tests on my most recent project.

27:57 And I was like, oh, yeah.

27:59 I did write some yesterday, though.

28:01 So I have non-zero tests now.

28:02 That's good.

28:03 Sorry, I didn't mean to derail you.

28:05 You're saying you should also set this up at CI.

28:07 Oh, yeah, yeah.

28:08 And pre-commit makes that really easy to get set up in CI.

28:11 You can essentially run the same command we ran earlier, which is pre-commit run --all-files, and set that up as something that validates all of your code changes.

28:19 How does it get set up in CI?

28:21 Like, do you have to do the pre-commit install on every run?

28:25 Does that get cached and only do the delta?

28:27 Or what's the CI look like there?

28:29 So, unfortunately, it's kind of a little bit fiddly to set up right now in CI, which is actually part of the project that I'm building right now.

28:38 I'm actually working on building a kind of generic continuous integration solution that's aimed only at pre-commit.

28:46 Oh, nice.

28:46 So if you want to check out more information on that, it's pre-commit.ci.

28:49 It's kind of a work in progress right now.

28:51 But you can set up pre-commit with basically any CI provider that you have right now.

28:55 But you have to manage the cache yourself.

28:58 You have to figure out what command you want to run.

29:00 You have to figure out, like, what change set boundaries you want to run against.

29:03 So, like, maybe on a pull request, you only want to validate the files that changed between master and your branch.

29:09 And, yeah, it's all pretty manual setup right now.

29:12 And for Travis CI, it's five to ten lines of YAML.

29:16 For Azure Pipelines, it's, like, 30 lines of YAML.

29:19 And, like, if you don't get it quite right or if the cache is invalidated for weird reasons, like, your run will be significantly slower.

29:26 And that's actually one of the big selling points that I'm trying to push for on pre-commit CI, which is that everything will basically always be cached because it can share all these tools amongst other developers, which is pretty cool.

29:38 Right, right.

29:39 Yeah, that's really cool.

29:40 Yeah, but you've got a lot of examples, and I'll try to link to them in the show notes, of how to set it up for Travis, Azure Pipelines, all the various things, right?

29:49 So that's pretty straightforward.

29:50 Yeah, I think most popular ones are covered there.

29:53 And if there's others that are missing, like, feel free to send a pull request.

29:56 Yeah.

29:57 Another cool thing that I'm trying to build into my CI tool is that, like, and this happens probably, like, ten or so times a month, where someone will come to one of my projects and submit a pull request.

30:09 And they haven't set up the client tooling locally, so they aren't running the pre-commit hooks.

30:13 They haven't run the code formatters or any of these things.

30:17 And so they'll get kind of that middle to worst case of the CI system is telling them, oh, your code isn't formatted this way.

30:24 Like, here's a patch that would fix it.

30:26 But one of the features that I'm planning to build is that pre-commit CI will just automatically fix pull requests for you.

30:33 So you won't have to, even if you forgot to set up the local tooling, the remote tooling will just auto fix it for you.

30:39 Oh, that'd be nice.

30:40 Instead of just saying, oh, it's wrong, you have to fix it.

30:44 It just, it always comes in right.

30:45 Yeah, it just fixes it for you.

30:46 There's actually a tool.

30:48 I guess it doesn't always come in right, because it could be, like, actually broken Python, and you could be validating.

30:53 Like, there's things that can't be fixed.

30:54 But if it could be fixed, it should be auto-fixed, right?

30:57 Yeah, it tries to auto fix things that are auto fixable.

30:59 There's actually a tool, I think Mariatta made it.

31:02 I believe it's called Blackout, which is a Black-specific tool that kind of has this same idea of, like, just fix the pull request for me.

31:11 Sorry, I didn't set up Black locally.

31:13 Yeah.

31:13 But pre-commit CI kind of aims to be a generic tool for these sort of auto-fixing things.

31:18 Yeah, that's really cool.

31:19 So you talked about, if I didn't set this up locally, the way this works is all of these git, in general, git pre-commit hooks, they don't stick to the repository.

31:29 They're not, like, now part of the repo because I set them up.

31:33 Like, every person on the team has to go through this three or four step process.

31:37 Yeah, so they won't have to set up the configuration because usually you check that in so that it's shared amongst all your peers.

31:42 But yeah, setting up the tool would be something that everyone else has to do.

31:45 Okay.

31:46 There are ways.

31:48 So there are ways to kind of automate this among peers.

31:51 So, like, one thing that we did at Yelp was we made pre-commit one of the Python dependencies of the application.

31:57 So the installation part was skipped.

32:00 And we also made the git hook setup part of the common make targets.

32:05 So, like, you used to run, like, make minimal to get Yelp application working or whatever.

32:10 And make minimal would install the hooks so that you didn't have to think about that sort of thing.

32:14 Yeah, that's cool.

32:15 In other, in managed environments, there are other things you can do as well.

32:18 Like, you can set a git init template.

32:21 And the init template could already contain the git hook.

32:24 Pre-commit actually has a command that makes it easy to set up an init template.

32:27 But yeah, if you're working just, like, on vanilla laptops or, like, on open source projects,

32:33 like, you would need to consciously opt into this behavior.

32:36 And actually, that's one of the bigger design principles of pre-commit is, like,

32:39 none of this stuff happens super automatically.

32:42 Like, you're always opting into the behavior.

32:44 Right.

32:45 It's all optional.

32:46 Okay.

32:46 Yeah, I think that's positive, right?

32:48 That's a good thing that you're not forcing it on people.

32:50 But with the CI, you're kind of catching the misses.

32:53 Yeah.

32:54 The other part is, like, if stuff runs too automatically, people get concerned about security.

32:58 And so it tries to not be, you know, an arbitrary code execution engine, even though it is with opt-in.

33:04 Yeah.

33:05 Well, people do get a little touchy when you automatically run tools against their source code.

33:09 Yes, I've received quite a few issues that are like, why does this thing exist?

33:16 You should delete it now.

33:17 And it's like, oh, well, you don't have to use it.

33:20 Exactly.

33:21 Talk Python To Me is partially supported by our training courses.

33:27 How does your team keep their Python skills sharp?

33:30 How do you make sure new hires get started fast and learn the Pythonic way?

33:34 If the answer is a series of boring videos that don't inspire, or a subscription service you pay way too much for and use way too little, listen up.

33:43 At Talk Python Training, we have enterprise tiers for all of our courses.

33:47 Get just the one course you need for your team with full reporting and monitoring.

33:51 Or ditch that unused subscription for our course bundles, which include all the courses, and you pay about the same price as a subscription once.

33:59 For details, visit training.talkpython.fm/business, or just email sales at talkpython.fm.

34:06 Let's talk about some of the tools.

34:10 You know, we've set the groundwork, right?

34:12 It's like this framework for bringing in all these different types of pre-commit hooks.

34:16 And when you say there's a lot that you can bring in, there are a ton.

34:19 So maybe we could talk through some of the ones that come built in that you think are cool.

34:23 I'll grab a couple as well.

34:24 And then there's a bunch of others that are sort of external, but can be loaded.

34:28 Yeah, sounds great.

34:29 So pre-commit itself is a framework, but there's also an official set of pre-commit hooks that I and other contributors have written.

34:36 And those are available at pre-commit/pre-commit-hooks on GitHub.

34:39 And there's, what, like 29 of them that are provided kind of out of the box there.

34:47 The original intent of the pre-commit hooks repository was to be Python specific.

34:53 However, it's more shifted to like take the Python parts and split them out

34:57 and kind of aim to be language agnostic.

34:59 So there's a lot of checks in there for like your common configuration formats.

35:03 So like there's checkers for like JSON, TOML, YAML, et cetera.

35:06 There's actually some really cool ones for Python specific stuff.

35:10 Like there's one that checks that you don't check in breakpoints or debugger statements.

35:15 So like you accidentally put import pdb.

35:17 It'll flag that and make sure that that doesn't end up in production.

35:21 Right, right.

35:22 You definitely do not want to leave an import pdb, set breakpoint sort of thing because I don't know why the server is locked up,

35:30 but it just seems like it's, there's not, it's not using the CPU, but it's just stuff's timing out.

35:34 What's going on?

35:34 Yeah.

35:35 The number of times I've seen BdbQuit in production, I'm like, oh, damn it.

35:40 This is entirely my fault.

35:42 But yeah, this kind of helps prevent that.

35:46 Or like, you know, I left a breakpoint and my test suite exploded because there was a breakpoint in it or stuff like that.

35:51 Right.
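The debug-statement check can be done by walking the syntax tree instead of grepping text. This is a rough Python sketch of that idea, not the official hook's source; the set of debugger modules and the `find_debug_statements` helper are assumptions for illustration.

```python
import ast

# Assumption: debugger modules worth flagging; a real hook may track more.
DEBUG_MODULES = {"pdb", "ipdb", "pudb"}


def find_debug_statements(source, filename="<string>"):
    """Return sorted (line, description) pairs for debugger imports and breakpoint() calls."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            # e.g. "import pdb" or "import ipdb as debugger"
            for alias in node.names:
                if alias.name.split(".")[0] in DEBUG_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            # e.g. "from pdb import set_trace"
            if node.module and node.module.split(".")[0] in DEBUG_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call):
            # The built-in breakpoint() added in Python 3.7.
            if isinstance(node.func, ast.Name) and node.func.id == "breakpoint":
                findings.append((node.lineno, "breakpoint() call"))
    # ast.walk is breadth-first, so sort to report findings in line order.
    return sorted(findings)


code = "import os\nimport pdb\n\ndef handler():\n    pdb.set_trace()\n    breakpoint()\n"
print(find_debug_statements(code))
# [(2, 'import pdb'), (6, 'breakpoint() call')]
```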

35:51 So some of the stand out to me, like one was check AST, which checks whether or not Python files can be parsed.

35:58 That's pretty cool because Python doesn't have a compiler, not in the command line version sense.

36:04 I know it sort of, it does generate the .pyc, so it kind of does.

36:08 There's not a build step is what I'm saying.

36:10 Right.

36:10 For sure.

36:11 Right.

36:11 Which goes through and verifies all the files.

36:13 So this is a little bit like a build step for like really basic stuff.

36:16 Yeah.

36:17 Check AST kind of aims to be like a first line of defense.

36:21 In a lot of ways, other linters like Flake8 could be a better choice than check AST.

36:26 But yeah, check AST is kind of like the most basic, is this valid syntax check?

36:30 We actually use this as part of moving to Python 3 at Lyft.

36:35 We kind of had this like three to four stage process.

36:38 And like the first stage was make sure all the syntax is valid Python 3 syntax.

36:42 And like make sure that linters pass in Python 3.

36:45 And then like make sure that Python 3 specific linters pass in Python 3.

36:49 And then run the test in Python 3.

36:51 And then go to staging and then production.

36:52 And then delete all the Python 2 code.

36:54 And that was kind of our, check AST was kind of our first step there in that process.

36:58 I see.

36:59 Yeah, that's really cool.
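A "does it even parse" check needs nothing beyond the standard library. Here is a minimal Python sketch in the spirit of check AST, assuming a hook receives filenames and signals failure through a nonzero return value; `check_ast` is a hypothetical helper name, not the hook's actual code. Run under a Python 3 interpreter, it also doubles as the "is this valid Python 3 syntax" gate from the Lyft migration story.

```python
import ast


def check_ast(filenames):
    """Return 1 if any file fails to parse as Python, else 0 (like a hook exit code)."""
    status = 0
    for filename in filenames:
        with open(filename, "rb") as f:
            source = f.read()
        try:
            # ast.parse accepts bytes and honors any encoding declaration.
            ast.parse(source, filename=filename)
        except SyntaxError as exc:
            print(f"{filename}:{exc.lineno}: {exc.msg}")
            status = 1
    return status
```

For example, a file containing the Python 2 statement `print 'hello'` would make `check_ast` report a syntax error and return 1.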

37:00 Another is about checking for AWS secrets.

37:04 Ah, yes.

37:06 How many times have you accidentally leaked your entire AWS account to the public on GitHub?

37:12 Actually, GitHub has some special checking for this now as well to make sure that your AWS

37:17 secrets or whatever are not leaked on commits.

37:20 And Amazon actually has invested a lot in scanning public source code repositories to

37:25 invalidate these tokens as quick as possible.

37:28 Oh, yeah, that's cool.

37:30 Because it costs, you know, it costs individuals a lot of money, but it also, Amazon spends CPU

37:35 and other resources mining Bitcoin or whatever, whatever people are doing nefariously with

37:40 these leaked tokens.

37:41 But yeah, this is kind of like, again, a first line of defense for not checking in.

37:45 Yeah, you don't want to do it.

37:46 It's not like eventually someone's going to find it.

37:48 It's bad.

37:49 So there's systems like...

37:50 It's like seconds.

37:52 Yeah, like shhgit is a project.

37:57 There's a bunch of these.

37:58 But this one is shhgit: it finds secrets and sensitive files across GitHub, including Gists,

38:03 GitLab, and Bitbucket, in near real time.

38:06 And it does so by hooking up to like the public stream of activity.

38:11 And like, as soon as there's a check in, it's just after it, right?

38:14 And so there's three or four of these types of tools.

38:16 And basically, it would be really nice if something like this detect-aws-credentials

38:21 pre-commit hook didn't let that get into the mix.

38:24 For sure.

38:25 Yeah.

38:25 And there's actually a lot of other tools around detecting credentials in the pre-commit space.

38:30 So there's direct integration for, I believe the tool is called Bandit, which is a Python

38:36 specific...

38:37 Yeah.

38:38 Bandit's cool.

38:39 ...tool.

38:39 And I believe they have direct pre-commit integration.

38:41 So you can just set up Bandit as a pre-commit hook.

38:43 I don't know.

38:44 There's a Go project that's fairly popular that does the same thing.

38:47 And pre-commit has direct support for Go in that particular tool.

38:51 But there's a bunch of tools in this space that make it really easy to prevent check-in

38:55 of sensitive credentials.
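To illustrate the general idea of catching a credential before it is committed, here is a Python sketch that swaps in a simple pattern-matching heuristic. It is not how the official detect-aws-credentials hook works. It assumes AWS access key IDs look like `AKIA` followed by 16 uppercase letters or digits, and it uses AWS's widely published documentation example key as sample data; `find_access_key_ids` is a hypothetical helper.

```python
import re

# Heuristic assumption: access key IDs are "AKIA" plus 16 uppercase letters/digits.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, match) pairs for strings shaped like AWS access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        hits.extend((lineno, m.group()) for m in ACCESS_KEY_ID.finditer(line))
    return hits


# Sample settings file using AWS's documentation example key, not a real credential.
settings = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_access_key_ids(settings))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```

A pattern like this only catches key IDs; secret keys have no recognizable prefix, which is one reason dedicated tools in this space take more sophisticated approaches.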

38:56 Another one that stood out to me was no commit to branch.

39:00 Yeah.

39:01 This one actually was a community-contributed hook.

39:04 The idea being that you don't accidentally commit directly to your master branch or to

39:10 your particular development branch.

39:12 And this helps you enforce your...

39:14 Production or whatever.

39:15 Yeah.

39:15 It helps you enforce your specific Git workflow that you want to do.

39:18 So you're specifically targeting your own feature branch, or you could use this to

39:23 enforce a branch naming scheme or a bunch of other stuff like that.

39:27 But yeah, that one was actually contributed externally.

39:29 Yeah, exactly.

39:30 Basically, you require people to use a proper PR workflow, like a Git flow type of thing,

39:35 rather than just jam it straight in.

39:37 Yeet it straight to master.

39:38 Heck, just edit it on production.

39:40 That's even quicker.

39:41 I deploy via SSH.

39:45 Exactly.

39:46 Yeah.
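A branch guard boils down to two questions: which branch is checked out, and is it on the protected list? The Python sketch below answers both. It uses the real `git rev-parse --abbrev-ref HEAD` command to read the branch, while the `protected` and `patterns` parameters, the helper names, and the `release/` naming scheme are stand-ins for illustration.

```python
import re
import subprocess


def current_branch():
    """Ask git for the checked-out branch name (must run inside a repository)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def is_protected(branch, protected=("master",), patterns=()):
    """True if the branch is listed explicitly or matches one of the regex patterns."""
    return branch in protected or any(re.match(p, branch) for p in patterns)


# Hypothetical policy: never commit directly to master or to release branches.
print(is_protected("master"))                                  # True
print(is_protected("feature/login", patterns=[r"release/"]))   # False
```

A hook built on this would exit nonzero when `is_protected(current_branch(), ...)` is true, which blocks the commit.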

39:47 Another cool one that we used at Yelp was forbid new submodules.

39:51 We went through this.

39:52 This is actually kind of a long story, so I'll keep it a little bit short.

39:56 But Yelp went through a migration of installing all Python packages globally on the system to

40:02 virtual environments.

40:03 And there are some grumpy sysadmins that were super against virtual environments.

40:08 And so we kind of cheesed our way to getting virtual environments by using submodules instead.

40:13 And Git submodules, not Python submodules, Git submodules.

40:17 And so there was one point where Yelp main had 98 Git submodules.

40:22 And we were doing this fancy pip install workflow that allowed us to locally install all of these tools so that we didn't have to manage them at the system.

40:30 But eventually, the grumpy people left the company and we moved everything to virtual environments.

40:36 And part of moving to virtual environments was to burn down the tech debt of submodules.

40:41 And so we added this forbid new submodules hook, which prevents newly introduced submodules while allowing existing ones such that we could kind of do some graph driven development and bring that down to zero while we migrated towards the new world.

40:55 That seems like a good path there.

40:57 Very nice.
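The "no new submodules, but keep the existing ones" rule can be checked from staged changes alone. A Python sketch, assuming the input is the output of `git diff --staged --raw`, where a submodule (gitlink) entry has mode `160000` and an added path has status `A`. The `new_submodules` helper and the sample paths and hashes are hypothetical, and this is not the official hook's implementation.

```python
def new_submodules(raw_diff):
    """Return paths added as gitlinks (mode 160000) in `git diff --raw` output."""
    added = []
    for line in raw_diff.splitlines():
        if not line.startswith(":"):
            continue
        # Raw lines look like ":<old mode> <new mode> <old sha> <new sha> <status>\t<path>".
        meta, _, path = line.partition("\t")
        _old_mode, new_mode, _old_sha, _new_sha, status = meta[1:].split()
        if new_mode == "160000" and status == "A":
            added.append(path)
    return added


sample = (
    ":000000 160000 0000000 3f2a9c1 A\tthird_party/legacy_lib\n"  # new submodule
    ":160000 160000 1a2b3c4 9d8e7f6 M\tthird_party/old_lib\n"     # existing one, updated
    ":100644 100644 8e1d2b4 5c7f0aa M\tsetup.py\n"                # ordinary file
)
print(new_submodules(sample))  # ['third_party/legacy_lib']
```

Because only additions are flagged, updates to submodules that already exist still pass, which matches the burn-down approach described above.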

40:58 One that I was wondering about is check executables have shebangs, the little hash exclamation point, something at the top, which is fine.

41:07 But how do you know it's an executable?

41:09 Is it because it has a dunder main equals dunder name thing in it?

41:12 Or what counts as an executable?

41:13 So this is more about file permissions and accidentally checking in files that aren't meant to be executable but have the executable bit set.

41:20 So one example where this commonly gets triggered is when you copy files from a thumb drive, they'll commonly be on a FAT32 file system, which doesn't have full support for all of the permission bits.

41:33 And so almost always the permission bits will be 777 and you'll end up checking in all sorts of non-executable and non-script files with executable bit permissions.

41:43 And depending on your particular system that you're using, this can often trigger lint errors.

41:47 And so this is kind of a first defense against that.

41:50 One particular system is like Debian is real grumpy about, oh, this PNG file has the executable bit set.

41:56 And so I'm going to reject this package or whatever.

41:59 Right, right.

42:00 But this also helps you if you've forgotten to put a shebang.

42:03 So like sometimes you'll have like an entry point to your application that you intend to be executable, but you've forgotten to put #!/usr/bin/env python3 or whatever.

42:11 Right.

42:12 So this kind of checks for that.

42:13 Okay, cool.
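The check itself is short: if a file carries an executable bit, its first two bytes should be `#!`. This Python sketch reads permission bits from the filesystem, which is an assumption. A real hook might instead read the mode git recorded, which also works on file systems like FAT32 that lack proper permission bits. `missing_shebang` is a hypothetical helper.

```python
import os
import stat

ANY_EXEC_BIT = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def missing_shebang(path):
    """True if the file has any executable bit set but does not start with '#!'."""
    if not os.stat(path).st_mode & ANY_EXEC_BIT:
        return False  # not marked executable, nothing to check
    with open(path, "rb") as f:
        return f.read(2) != b"#!"
```

For instance, a PNG copied off a thumb drive with mode 777 would be reported, while a script starting with `#!/usr/bin/env python3` would pass.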

42:13 So those are a bunch of the built-in ones.

42:16 Any of the other external ones you want to give a shout out to?

42:18 Yeah.

42:19 So if you're setting up a Python project, you probably want an import sorter.

42:23 And there are two, the two most popular ones.

42:27 One of them I wrote, which is called Reorder Python Imports.

42:30 Its name says pretty much exactly what it does.

42:32 But the idea behind Reorder Python Imports is you set up essentially no configuration and it just does the right thing all the time.

42:39 And if you kind of want the opposite end of that spectrum, there's a tool called isort, which has direct support in pre-commit.

42:44 isort has 50 or 60 configuration options, so you can customize your import sorting to whatever thing you want.

42:51 Another one I would suggest is getting a linter like Flake8 set up.

42:56 And Flake8 has direct support for pre-commit as well.

42:59 I'm also the maintainer of Flake8, so of course it has pre-commit support.

43:03 Of course. Awesome.

43:04 And perhaps the most popular and probably one of the reasons that pre-commit has taken off a lot recently is Black, which is, we've talked a couple of times on this already about Black.

43:14 But Black is a code formatter that there's one way to do it right, and Black does it in a very specific way.

43:20 And Black has direct integration with pre-commit.

43:23 You can also set up things like type checkers, like there's mypy integration with pre-commit.

43:28 There's other sorts of stuff like that.

43:30 And one more that I'll shout out, which is one that I've written, a tool called pyupgrade, which allows you to kind of upgrade your syntax to newer versions of the language.

43:41 So things like automatically making f-strings, or like removing Python 2 syntax constructs, or unsixing your code, so to speak.

43:49 And that's another one that has decent pre-commit integration as well.

43:53 I see. It looks for old idioms and styles and says, stop doing that.

43:58 Yeah, well, not only does it say stop doing that, it just auto-fixes it for you.

44:01 Oh, that's even better.

44:02 But yeah, I really like my code formatters.

44:04 Like, if a linter is grumbling at me, I'd rather have a formatter just auto-fix it for me.

44:09 Yeah, absolutely.

44:10 All right, so that's the big list of things that you could use.

44:13 But what if you have an idea for a new one?

44:15 Part of the whole idea of pre-commit is it's a framework for building and distributing these things, right?

44:21 Yep, and it's really easy to make your own set of hooks.

44:24 That's actually another problem with the other sort of frameworks, is that if you want to add tools to it, you kind of have to fork their framework and, like, inject your code directly into their framework.

44:33 But pre-commit takes kind of a distributed model to this.

44:38 Basically, any git repository that has a little bit of metadata in it can provide a git hook, as long as it's installable in some way.

44:45 And the process is outlined on the pre-commit documentation, but there's a bunch of different programming languages that this works with.

44:52 And you basically set up a .pre-commit-hooks.yaml to provide hooks to other repositories.

44:58 And there's a, you know, you can always follow an example on any of the other repositories that already provide stuff.

45:03 And it's also pretty easy to take an existing tool and add the small metadata file to it that can be used directly with pre-commit.

45:11 Of course, there's also escape hatches.

45:13 Like, if you don't want to use the repository-managed approach, you can set up local hooks that go directly to PyPI or, like, use a Docker image or other stuff like that instead of going through the managed approach.

45:23 Although there's some disadvantages to that around, like, automatic updates and workflow management and that sort of deal.

45:30 Right. Yeah, that looks really cool.

45:32 So it seems pretty easy to make them.

45:34 And as long as it's installable via PyPI or gem or npm, or is an executable, it's more or less ready to go along with that metadata.

45:42 Yep. Essentially, like, if you can be git cloned and something can run, like, pip install or the equivalent, you're golden.

45:48 You're ready to go.

45:48 Yeah. And then under supported languages, where maybe languages is in, like, quotes or something, you've got Conda, Docker, Docker images, fail.

45:57 What is fail?

45:58 So there's a special hook.

45:59 This was actually added specifically for pytest, which I'm a maintainer of.

46:03 Fail takes a file name regular expression and will always return 1 (a failure) if any file name matches it.

46:10 So one example is, like, we wanted to enforce that our changelogs were a specific file name pattern because it's very common for somebody to write a changelog but forget the RST extension.

46:21 And when they forgot the RST extension and we went to run a release, their changelog fragment would get skipped.

46:27 And so their change wouldn't be called out.

46:29 And so this was a way to enforce, with a particular message, that file names match a specific pattern.
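The behavior described above can be approximated in a few lines of Python. This is a hypothetical stand-in, not pre-commit's implementation: it assumes the hook receives the staged file names, reports every name the files pattern matches, and returns 1 if there are any. The pattern (anything under changelog/ that does not end in .rst) and the message are illustrative assumptions, not pytest's actual configuration.

```python
import re

# Hypothetical pattern: anything under changelog/ that does not end in .rst.
CHANGELOG_PATTERN = r"^changelog/(?!.*\.rst$)"
MESSAGE = "changelog fragments must use the .rst extension"


def fail_hook(filenames, files_pattern, message):
    """Approximate a 'fail' hook: report matching files and return 1 if any."""
    regex = re.compile(files_pattern)
    offenders = [name for name in filenames if regex.search(name)]
    if offenders:
        print(message)
        for name in offenders:
            print(f"  {name}")
        return 1  # nonzero status: the commit is blocked
    return 0


status = fail_hook(
    ["changelog/123.bugfix", "changelog/124.feature.rst", "src/app.py"],
    CHANGELOG_PATTERN,
    MESSAGE,
)
```

Only changelog/123.bugfix matches here, so the sketch prints the message and that file name and returns 1. The custom message is the important part: it tells the contributor exactly how to fix the name.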

46:35 Okay. Yeah, cool.

46:37 And then you got Golang, Node, Perl, Python, Ruby, Rust, Swift, Script, I'm guessing Bash and whatnot.

46:44 So that's a lot of options.

46:45 And with Docker, like, that's pretty wide open.

46:48 Yeah, with Docker, you can basically do anything.

46:50 Yeah, exactly.

46:51 Exactly.

46:52 Yeah, and it's actually pretty easy to add a new language.

46:54 So if there's another language that you would like to see pre-commit support, there's actually a little guide in the contributing documentation for pre-commit on how to set up another one.

47:02 Someone's actually working on support for Crystal right now, which is a language that I had not heard of until relatively recently, but it's basically like compiled Ruby.

47:12 It's pretty cool.

47:13 Okay.

47:14 Yeah, there may be support for that.

47:15 Yeah, nice.

47:16 Something that I also saw in the docs was talking about automatically enabling pre-commit on repositories.

47:23 Can you, like, register it with Git so when you get a new repo or clone something, it'll just be part of it?

47:28 Yeah, so there's this template directory concept in Git.

47:32 Basically, you can run pre-commit init-templatedir and then set a specific configuration value in your Git config.

47:38 And Git will splat out files in a particular way in your clone such that you'll automatically be enabled.

47:46 Okay.

47:46 Yeah, that's nice.

47:47 This usually involves just, like, writing the .git/hooks/pre-commit file automatically.

47:51 Right, yeah.

47:52 So if you really like this whole idea, then just make it automatic, right?

47:57 Yep.

47:57 We use this at Lyft so that whenever you would clone a Lyft repository, you would automatically have everything set up, which worked pretty well.

48:04 Yeah.

48:05 And one of the things that all the cool repositories are doing these days is they have those little badges, like, on their readme.

48:10 Mm-hmm.

48:11 Right?

48:11 So it's Python 3, or it has this many downloads on PyPI.org, installs via pip, and so on, or whatever.

48:18 And so you guys have a nice way to badge your project as well to say that it's powered by pre-commit to sort of spread the word.

48:26 Yep.

48:26 Actually, this one's a really cool thing that I want to shout out.

48:29 This was actually contributed by an external contributor, which was like, hey, I want a way to, like, show other people, like, what tools I'm using.

48:36 And, like, pre-commit is a really cool tool, and I want to see it succeed.

48:39 And so they made an SVG badge and made it really easy to set up a little bit of markdown or RST or whatever text document format you want to use to badge your repository and spread the word.

48:52 Yeah, awesome.

48:53 That's really nice.

48:54 And then last question on this, I guess, is are you looking for contributors to the project?

48:59 There's a lot of people who ask me, hey, I want to get started in open source.

49:03 Could you recommend something I could work on?

49:05 Would this project count as one of those?

49:06 Yeah, we're always looking for new contributors.

49:08 The thing with pre-commit is, I don't want to say it's feature complete, but it hasn't really gained any large features in quite a long time.

49:14 Most of the room to expand pre-commit is in adding more tooling support or more programming language support.

49:21 And I'm always looking for expansions in those spaces.

49:24 Or, you know, new ideas that I just haven't really thought of yet.

49:28 Yeah, maybe there's some cool tool out there that's not integrated with you guys.

49:33 Maybe they could commit to pre-commit by actually committing to that other project to get the right metadata.

49:38 For sure.

49:39 Yeah.

49:39 There are a few smaller help-wanted, getting-started issues in the issue tracker.

49:45 But I usually try and keep the backlog pretty well trimmed.

49:48 Yeah.

49:49 Nice.

49:49 All right.

49:50 Well, what a cool project.

49:52 And it's definitely something I'm going to be checking out as well because it looks neat.

49:56 Thanks.

49:56 I'm really proud of it.

49:57 Yeah, you should be.

49:58 It's great.

49:59 All right.

49:59 Before we get out of here, though, got to ask you the two final questions.

50:03 You're going to write some Python code.

50:05 And I know you write a lot.

50:07 What editor do you use?

50:08 Oh, no, this is always a fun one.

50:10 So if you would have asked me this six months ago, I would have been a little bit embarrassed

50:16 about my answer, which maybe I'm still embarrassed about my final answer.

50:19 But six months ago, I would have said nano as my text editor of choice.

50:23 I'm actually really good at using nano.

50:27 Before that, I was an IntelliJ user and I used that for the longest time.

50:31 But I found that context switching between the terminal and everything else was a lot of work.

50:35 But anyway, my real answer to this question: I have actually written my own text editor,

50:39 which, in retrospect, I would not suggest anyone do.

50:43 It's been quite a roller coaster.

50:45 But my text editor is called babi.

50:47 The name is a little bit silly.

50:49 It actually comes from nano.

50:50 On a QWERTY keyboard, if you shift your hand over by one key and type nano, you end up with

50:55 babi.

50:55 And this was a common typo that I would make pretty frequently.

50:59 But yeah, it's open source.

51:01 It's written in Python.

51:02 Some of the ideas behind it were to make it really easy to hack on and add features

51:06 to, and I wrote a lot of it on my Twitch stream.

51:09 And interestingly, a bunch of the features in the text editor were actually contributed by

51:15 just people in chat, who were like, hey, it would be cool if babi did this, and they'd

51:19 run off for 15 or 20 minutes and come back with a working patch for the

51:24 new command or new cool way to do something.

51:28 That's awesome.

51:29 But yeah, it's pretty easy to hack on.

51:30 And that's part of the reason that I kind of like it.

51:33 Is it pip install babi?

51:34 Yep.

51:35 pip install babi.

51:36 And that should work on most platforms.

51:38 It works on Windows, oddly enough, which was actually also another thing that chat contributed.

51:44 Like I developed it entirely on Linux and it's based on curses.

51:47 Windows doesn't quite have native curses support, but you can install the windows-curses package

51:53 to make that work.

51:54 Okay.

51:54 Yeah.

51:54 It should just be pip install babi, and you can start hacking on it.

51:58 It's not feature complete in any way.

51:59 So like there's some stuff that's, you know, like horribly broken.

52:02 Like if you try and rename a file in babi, it's just like, no, I don't know how

52:06 to do that.

52:07 Sorry.

52:08 But I use it as my daily driver now.

52:10 So I know the shortcomings, and I'll eventually get around to implementing those.

52:14 But yeah, cool.

52:15 It's pretty neat.

52:16 Awesome.

52:16 All right.

52:16 That is the first time anyone has ever said babi.

52:19 So very cool.

52:19 Now we're getting the word out about it.

52:21 I'd be surprised if anyone else uses it, to be honest.

52:25 All right, cool.

52:27 Well, very interesting.

52:28 And then notable PyPI package, something that maybe people haven't heard about.

52:31 Ooh, something that people haven't heard about.

52:33 Or just that you came across, you're like, it's, you know, it's not requests.

52:37 It's not Django.

52:37 It's like, oh, this thing I found is pretty awesome.

52:39 People should know about it.

52:40 It's not requests.

52:41 You mean HTTPX?

52:43 HTTPX is also cool.

52:45 But it's just, you know, it's like the most common answer, I would say, is requests.

52:47 That's true.

52:48 Yeah.

52:49 I actually, controversial opinion, I really don't like requests.

52:52 So can we do an unpackage for this one?

52:55 You can.

52:56 Well, how about it?

52:57 What do you use instead of requests?

52:58 So if I can get away with it, like if I don't need H2 support or special streaming

53:04 APIs, I'll mostly just use the standard library.
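As a rough sketch of the standard-library route, urllib.request covers a plain GET with custom headers. The helper names build_get and fetch_json and the example URL are assumptions for illustration; fetch_json needs network access, while build_get only constructs the request.

```python
import json
import urllib.request


def build_get(url, headers=None):
    """Build, but do not send, a GET request using only the standard library."""
    return urllib.request.Request(url, headers=headers or {}, method="GET")


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body (needs network)."""
    request = build_get(url, {"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example URL is an assumption; nothing is sent until fetch_json() is called.
req = build_get("https://example.com/api/items", {"Accept": "application/json"})
print(req.get_method(), req.full_url)
```

For streaming, HTTP/2, or async needs, this is where the conversation turns to third-party clients instead.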

53:07 Yeah.

53:08 Although for newer stuff where I need streaming APIs or H2 or async support,

53:14 I'll reach for HTTPX or aiohttp.

53:16 Yeah.

53:17 aiohttp client.

53:18 Yeah.

53:18 In there.

53:19 Yep.

53:19 That's nice.

53:20 I've actually like, I've not done a lot of asyncio yet.

53:23 I've been dabbling with it a little bit.

53:25 I'm writing a chatbot for Twitch and I've kind of started sprinkling around some of the AIO

53:31 libraries.

53:32 What do you think about async in Python?

53:34 I think it's pretty good.

53:36 There are a few get-it-set-up-and-get-it-moving things that kind of bug me.

53:41 It bothers me that I can't just fire and forget async stuff.

53:44 Yeah.

53:45 You have to await it eventually.

53:46 Yeah.

53:47 Yeah.

53:47 Or if you just call the async function, you get a coroutine, but the coroutine isn't

53:52 started until you turn it into a task or give it to the loop to run.
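A small asyncio sketch of that point: calling an async function only creates a coroutine object, and none of its body runs until it is awaited, or wrapped in a task and the event loop gets a chance to run it. The function names are illustrative.

```python
import asyncio

log = []


async def work(label):
    log.append(f"{label} started")
    await asyncio.sleep(0)  # yield to the event loop once
    return label


async def main():
    coro = work("a")  # just a coroutine object; nothing in work() has run
    assert asyncio.iscoroutine(coro) and log == []
    task = asyncio.create_task(work("b"))  # scheduled, but not run yet
    assert log == []  # still nothing: main() hasn't yielded to the loop
    return [await coro, await task]


results = asyncio.run(main())
print(results, log)
```

The task only starts once main() hits an await and hands control back to the loop, which is the "you have to await it eventually" friction described here.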

53:56 Yeah.

53:57 So that bugs me a little bit.

53:59 The fact that like the threading module doesn't support async and await.

54:03 Yeah.

54:04 But the asyncio module does. You know, threading, multiprocessing, like all of them should.

54:08 So I actually am a huge fan of unsync, which is a library that puts a unified API on top of

54:14 those and gives you the fire and forget.

54:16 And it has a background thread that manages the runtime loop for you.
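The idea of a background thread that owns the event loop can be sketched with the standard library alone. This is not unsync's API, which is not reproduced here; it only relies on asyncio.run_coroutine_threadsafe, which returns a concurrent.futures.Future. The names fire and double are hypothetical.

```python
import asyncio
import threading

# A single event loop running forever in a background daemon thread.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


def fire(coro):
    """Schedule a coroutine on the background loop from plain synchronous code.

    Returns a concurrent.futures.Future right away: ignore it to fire and
    forget, or call .result() to block until the coroutine finishes.
    """
    return asyncio.run_coroutine_threadsafe(coro, _loop)


async def double(x):
    await asyncio.sleep(0.01)
    return x * 2


future = fire(double(21))        # returns immediately; no await needed
print(future.result(timeout=1))  # prints 42
```

Because the loop lives in its own thread, ordinary synchronous code can start async work without becoming async itself.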

54:19 So, oh yeah, I've played a little bit with unsync.

54:21 It seems really cool.

54:22 So I think what we have in Python is really close to right.

54:26 There's just like a few rough edges.

54:28 And I don't think unsync is necessarily the right answer, but something like that is

54:31 really close to the final polish

54:34 it needs to take it from a B to an A minus or something like that.

54:38 Right.

54:38 Yeah.

54:39 And then maybe once we get sub interpreters dropping the GIL, you can take it up a notch

54:43 as well.

54:43 Oh yeah.

54:44 Sub interpreters can be really cool.

54:45 Yeah.

54:46 My problem with async is I feel like I have to rewrite all my code.

54:49 Sometimes that's a barrier to entry.

54:50 Yeah.

54:51 The challenge can be like it propagates, right?

54:53 You want to do it way down and well, that becomes async.

54:56 And then the things that call it become async.

54:58 And the things that call it become async.

54:59 And yeah, that can be a challenge.

55:01 And your testing library has to be async.

55:03 And then like all this, all this other stuff.

55:05 Yep.

55:06 Yep.

55:06 I think it'll get there though.

55:08 I think it's just like Python has 20 years of history and async was introduced pretty late

55:13 into that history.

55:14 And there's going to be some growing pains as it gets better, so to speak.

55:19 Yeah.

55:19 But places where I think that really shines is where you don't have to think about it.

55:23 Like where the thing that you're doing as async is on the boundary.

55:26 So for example, FastAPI, right?

55:28 You can just make your view method async or not make it async, right?

55:32 It's up to you.

55:32 And then if you want to do that part of your app async, you can.

55:36 It's not from the bottom up; it's kind of from the outside in.

55:39 And you can pick the entry points from the outside where you want to adopt it.

55:44 And then I think that that makes a lot of sense there.

55:46 Yeah.

55:46 Especially like because you have a framework calling you, so you don't necessarily have

55:49 to do all async or all not.
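A minimal sketch of that outside-in idea: a framework-style dispatcher that accepts either a plain function or an async one, so each view can opt in on its own. FastAPI's documentation describes a similar split (async path functions are awaited, plain ones run in a threadpool), but the dispatch function below is a hypothetical illustration, not FastAPI's code.

```python
import asyncio
import inspect


async def dispatch(handler, *args):
    """Call a handler the way an async framework might (illustrative only).

    Coroutine functions are awaited on the event loop; plain functions are run
    in the default thread pool so they don't block the loop.
    """
    if inspect.iscoroutinefunction(handler):
        return await handler(*args)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, handler, *args)


def sync_view(user_id):
    return {"user": user_id, "style": "sync"}


async def async_view(user_id):
    await asyncio.sleep(0)
    return {"user": user_id, "style": "async"}


async def main():
    return [await dispatch(sync_view, 1), await dispatch(async_view, 2)]


results = asyncio.run(main())
print(results)
```

The view authors never deal with the event loop directly; the framework boundary absorbs the difference, which is why adoption can start at the entry points.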

55:51 Exactly.

55:52 Exactly.

55:52 You get to plug into it or opt into it.

55:54 Yeah.

55:54 But as a library designer, it's more challenging, I think.

55:57 How do you expose an async version and a non-async version and so on?

56:01 Definitely.

56:01 Cool.

56:02 All right.

56:03 Well, that's not exactly one package, but that's kind of a conversation around packages.

56:06 So let's maybe just put the answer down as HTTPX or the aiohttp client.

56:11 Those are two cool HTTP libraries that kind of fit in that world.

56:14 I like them.

56:15 Yeah.

56:15 Sounds good.

56:16 Cool.

56:16 Yeah.

56:17 I mean, usually my answers would be around like a code formatter or linter.

56:19 I'll actually shout out like two of my favorite linters that are code formatters that I tend

56:26 to use.

56:26 I do really love Black, but I actually don't use it myself.

56:29 I actually tend to use a combination of two formatters.

56:33 One of them is called autopep8, which is very similar to Black in some of its ideas and

56:38 like fixing lint errors.

56:40 And the other one, which is very similar to Black, but has existed for like four or five

56:45 years, is a tool called add-trailing-comma, which literally does what it says on the tin.

56:49 It tries to enforce trailing commas in the same way that Black does.

56:54 Right.

56:55 So the final element in like a list format, it would have that so that that line doesn't

57:00 get a diff when you add the next element.

57:03 Yep.

57:03 It tries to make minimal diffs when adding or removing like parameters and like lists or

57:08 tuples or function signatures or class signatures or type annotations or the whole gamut of

57:14 places where you can have braces and commas.
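The minimal-diff argument for trailing commas can be checked with difflib from the standard library. The snippets are made up: adding a third element to a list without a trailing comma also touches the old last line, while the trailing-comma version changes only the new line.

```python
import difflib


def changed_lines(before, after):
    """Return only the added/removed lines of a unified diff between two snippets."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Adding "carol" when the last element has no trailing comma...
no_comma = 'names = [\n    "alice",\n    "bob"\n]'
no_comma_added = 'names = [\n    "alice",\n    "bob",\n    "carol"\n]'

# ...versus when every element, including the last, ends with a comma.
comma = 'names = [\n    "alice",\n    "bob",\n]'
comma_added = 'names = [\n    "alice",\n    "bob",\n    "carol",\n]'

print(len(changed_lines(no_comma, no_comma_added)))  # prints 3
print(len(changed_lines(comma, comma_added)))        # prints 1
```

Without the trailing comma, the "bob" line shows up as removed and re-added just to gain a comma, which is exactly the noise in code review that the tool is meant to avoid.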

57:17 Cool.

57:17 All right.

57:18 That's a bunch of good ones.

57:19 Okay.

57:19 Final call to action.

57:20 People are interested in pre-commit.

57:22 They want to check it out.

57:23 They want to start using it.

57:24 What do you tell them?

57:24 So I would tell them to go to pre-commit.com.

57:27 That's probably the place where you can get the most.

57:29 It's pre-commit.com, right?

57:31 Correct.

57:32 Yeah.

57:32 With a dash.

57:33 The one without a dash is like some weird real estate website that I've been trying to

57:37 bug the guy to like give up his domain for years, but I don't know that he actually reads

57:41 his email at all.

57:42 But yeah, I would check out pre-commit.com and that's where you can get all the information

57:46 about that.

57:47 And pretty soon, pre-commit.ci will have more information about the CI system that I'm building

57:53 around pre-commit.

57:54 Oh yeah.

57:54 That's really exciting.

57:55 That's going to be cool as well.

57:56 I'll put a link to that in the show notes also.

57:58 Cool.

57:58 All right.

57:59 Thank you, Anthony.

57:59 It's great to chat with you and see all these amazing things you're creating.

58:03 Yep.

58:03 Always glad to chat.

58:05 Yep.

58:05 But see you later.

58:06 Bye.

58:06 This has been another episode of Talk Python To Me.

58:10 Our guest on this episode was Anthony Sottile and it's been brought to you by Brilliant.org

58:15 and us over at Talk Python Training.

58:17 Brilliant.org encourages you to level up your analytical skills and knowledge.

58:22 Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every

58:27 day.

58:29 Want to level up your Python?

58:30 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

58:35 Or if you're looking for something more advanced, check out our new Async course that digs into

58:40 all the different types of async programming you can do in Python.

58:43 And of course, if you're interested in more than one of these, be sure to check out our

58:47 Everything Bundle.

58:48 It's like a subscription that never expires.

58:50 Be sure to subscribe to the show.

58:52 Open your favorite podcatcher and search for Python.

58:54 We should be right at the top.

58:55 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

59:01 direct RSS feed at /rss on talkpython.fm.

59:05 This is your host, Michael Kennedy.

59:06 Thanks so much for listening.

59:08 I really appreciate it.

59:09 Now get out there and write some Python code.

59:11 Thank you.