
#282: pre-commit framework Transcript

Recorded on Tuesday, Jul 21, 2020.

00:00 Git hook scripts are useful for identifying simple issues before committing your code. Hooks run on every commit to automatically point out issues in your code, such as trailing whitespace and debug statements. By pointing out these issues before you get to a code review, this allows the code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks. As we create more libraries and projects, we recognize that sharing pre-commit hooks across projects is painful. That's why I'm happy to welcome Anthony Sottile to the show to discuss pre-commit, a framework for managing and maintaining multi-language pre-commit hooks. This is Talk Python To Me, Episode 282, recorded July 21, 2020.

00:57 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by brilliant.org and us. Python's async and parallel programming support is highly underrated. Have you shied away from the amazing new async and await keywords because you've heard it's way too complicated, or that it's just not worth the effort? With the right workloads, a 100 times speedup is totally possible with minor changes to your code. But you do need to understand the internals. And that's why our course, Async Techniques and Examples in Python, shows you how to write async code successfully as well as how it works. Get started with async and await today with our course at talkpython.fm/async. Anthony, welcome to Talk Python To Me. Thanks, glad to be here. It's great to have you here. You and I have been on Python Bytes together, along with Brian. We had you over there once, I believe, but never on Talk Python. So welcome to the show. Really happy to have you around. And you have this cool project that we're going to talk about. It's kind of like the old thing is new again, right? It's been around for a little while, but some of the other stuff seems to have boosted its popularity. And to me, it seems like a fantastic idea, with pre-commit hooks and a framework for running them and building them. So I'm excited to dig into it with you. Sounds great. Happy to talk about it. Yeah. Before we do, though, let's get to your story. How did you get into programming and Python? My first Python programming was actually at Yelp, where I worked as a software engineer. I was hired as a front-end developer and mostly did CSS and JavaScript. 
But I quickly realized that in order to unblock my own job, it made a lot more sense for me to learn the back end and write my own APIs instead of waiting for a back-end engineer to implement the same APIs. So I picked up a bunch of Python and made my job a lot easier that way. Nice. What did you use on the back end? Was that Flask, Django, something else? So unfortunately, Yelp was created before there were a lot of these larger frameworks, and so Yelp had kind of their own homegrown framework. I eventually shifted from front end to full stack to back end to infrastructure. And part of the infrastructure work that we did there was to move the monolithic codebase to be based on Pyramid. So kind of like the Pyramid/Pylons sort of approach. And it was actually really nice, because we could basically cut out parts of the proprietary Yelp main code and stick in bits of Pyramid so that we could offload some of the logic to open source software. Oh, nice. Yeah, that's really cool. Most of my sites are built in Pyramid as well. I like that framework. It gives you a ton of customization. It's a little bit daunting to start with; for small projects, I usually prefer to use Flask or whatever. Yeah, but it definitely lets you do everything. Yeah. Well, if I'm looking at new things that look really sweet, FastAPI is looking sweet. Yeah, I've heard some good things. I haven't had a chance to try it out yet, though. You know, what I really like about it is you can define models, classes in Pydantic, and then say your view method takes one of those models. If you're receiving, like, a JSON post, it'll do validated pre-population of all the fields using the Pydantic validation, and I'm just like, oh, that just saved me two thirds of my whole app, right? That's awesome. Yeah, that's pretty good. Yeah, I feel like I spend most of my time in validation, input validation. 
So it's exactly that, or converting: I know it's a string, but then we're going to cast it to an int if it can be cast, all that kind of stuff. Like, yeah, it's all built in. So anyway, if I go build some new APIs that aren't just part of the main app, I probably would do some FastAPI. But that's pretty new. It's only been around a year and a half. Nice. Awesome. So you moved from the front-end stuff over to the back end at Yelp. And it sounds like you got to do some really cool, transformative, empowering things there. What do you do now? Well, technically, I'm currently unemployed. But my current passion is developer tooling and kind of infrastructure in that space. I was recently at Lyft, where I led the developer experience organization

05:00 And, you know, building tools and infrastructure that make developers productive. The idea being, I can invest some in tooling and give that to all developers, and then they can get their jobs done faster and better and safer, et cetera. Yeah. Awesome. Are you looking for consulting work, if people are out there? Or have you got plans already? So right now I'm trying to build my own project and see where it goes. But I've kind of timeboxed myself to six to nine months. And if it doesn't turn out, well, when it doesn't turn out, I'll be looking for new employment after that. It might turn out, it might turn out. You're a good programmer, I know you can do it. Awesome. There's always a chance. Yeah, for sure. Got to give it a shot. That's awesome. Well, let's talk about this project that you've been working on for a while, but like I said, it has gotten a lot of traction lately because of tools like Black and other things that have made pre-commit hooks awesome and exciting all of a sudden. But before we talk about what you've been doing, let's just talk about the idea of pre-commit hooks in general. Sure. What is this, for a lot of people who are like, yeah, I kind of know what Git is, I kind of use that, or maybe have even used a zip file with a date in the name as source control? Yeah, the idea behind Git hooks, and specifically the pre-commit hook, because I think that's probably the one that most people get the most interaction with: there are a bunch of commands in Git where you can register callbacks as scripts to do validation, or I've seen some people use them to send emails or close tickets, all sorts of other stuff. But the main focus around Git hooks, to me, is the pre-commit hook, where you can do linting and code validation, code formatting, you can run tests or other stuff like that. 
I guess the pre-push hook is another one that's also kind of big in that same space, where you want to do validation of your changes before you send them off to, like, a... Right, so you could maybe reject the git push if the formatting is wrong, or the header is missing, or something like that, right? Yep, you can take a lot of those, like,

07:06 easy-to-validate things and do them in a quick, fast manner before you would run your larger test suite or something. Catch a syntax error before you spend a bunch of time spinning up your CI systems, right? Well, speaking of CI, to me this seems like the next natural progression from having CI do these tests, right? So there are different levels. The developer should probably be writing and running tests and making sure that the tests pass; they should be formatting their code before they check it in, stuff like that. But when you work on a team, my experience has been there's a wide, wide range of how much people are willing to do that, how much they care about those kinds of things. And what that means is, maybe you have CI, continuous integration, that runs automatically during check-in, once a check-in is done. And so then they might check in something, and it might say, oh, the build is now broken because you didn't bother to run the tests. You broke the tests, but because you didn't run them, you didn't know it. So yeah, if a tree falls in a forest and no one hears it, right.

08:09 And that sort of thing. And so then you end up in the situation where the people that care about the build working have to track down the person who broke it, who didn't actually care. It's just these layers of annoying type of thing. Yeah. And if you can push that validation to the location where the person is, and to all the people, right? So even if you care, you might not want to break the build; you might rather just get a warning, or just automatically have it fixed. And pre-commit hooks seem like the natural place for that. Yeah, the kind of mentality that I always had is, if I'm waiting till CI to get feedback on nitpicks around commas or whitespace or syntax or whatever, that to me is way too late in the process, because I've already pushed, I've already gone off to the next thing, I'm already answering my email or looking at GitHub issues or talking in Slack or whatever. I've already context-switched to a completely different situation. And I assumed it was good, because I already committed and pushed, right? I never make mistakes. So yeah, exactly. Zero-fault code. Git push is like the final action. Now you're done, right? Yeah. But it was incredibly frustrating to make a push and then have either some build system telling me that something was wrong, or in code review someone was like, oh, well, you could have reordered these imports so they're alphabetical or something. It's just like, yeah, this is a big waste of time. Let's push this as far towards the developer as possible, so that we can make that a better situation. 
The other thing I think is interesting around these ideas is there have been studies that have shown that people are more willing to take nitpicky advice from a computer than from a human. It's like, okay, well, the computer requires that I have this kind of whitespace or this kind of indentation or this type of sort, like you said, alphabetical ordering or whatever. And when it comes from a person in a code review, it's like, well, that person is just a jerk.

10:00 All right, I wrote good code. Yeah, here I am taking this flak for this thing. Having this happen automatically, I think, takes away the need to review that kind of stuff. It takes away the need to complain and be that grumpy person that does that. And you can even go farther with tools like Black, where it doesn't just complain, it just goes, I fixed it for you. Yeah, actually, the way that I usually talk about this is: the absolute worst situation is that a human tells you that something is wrong. The next worst situation is that a CI system tells you that something's wrong. Better than that is that an automated local tool tells you that something is wrong. And the gold standard is that an automated tool just fixes it for me. I don't have to worry about it at all. It just makes it happen. Yeah, and as a result, you don't have this sort of dueling, alternate format style, right? Like, if I like to work in PyCharm and you like to work in VS Code, and our formatting rules vary ever so slightly, you know, like commas between parameters, or are there two spaces between the colon and a type annotation. You know, if we both keep reformatting the document, they cycle back and forth, right? And this way, you can just hit it with the same formatter right before it goes in every time. Of course, yeah. And that also helps with things like git blame, or noticing when patches are minimal or whatever, and being able to triage a change way down the road and not be distracted by two different editors reformatting in a different way.
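To make the "automated tool just fixes it" idea above concrete, here is a minimal sketch of what a fixer-style hook might look like. It assumes the hook is handed the filenames to check as arguments and reports "I changed something" through a nonzero return code, which is how pre-commit treats hook results. The function names are hypothetical, and this is an illustration, not the source of any published hook.

```python
from pathlib import Path
from typing import Sequence


def strip_trailing_whitespace(text: str) -> str:
    """Return text with trailing spaces and tabs removed from every line."""
    fixed = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # the line without its line ending
        ending = line[len(body):]       # the original line ending, if any
        fixed.append(body.rstrip(" \t") + ending)
    return "".join(fixed)


def main(filenames: Sequence[str]) -> int:
    """Fix each file in place; return 1 if anything changed, else 0."""
    changed = False
    for name in filenames:
        path = Path(name)
        original = path.read_text(encoding="utf-8")
        fixed = strip_trailing_whitespace(original)
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            print(f"Fixed trailing whitespace in {name}")
            changed = True
    return 1 if changed else 0

# To use this as a script, one would end the file with:
#     raise SystemExit(main(sys.argv[1:]))   # requires `import sys`
```

Returning 1 after a fix stops the commit once, so the developer can review and stage the corrected files, and the second attempt passes cleanly.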

11:26 This portion of Talk Python To Me is brought to you by brilliant.org. Brilliant has digestible courses in topics from the basics of scientific thinking all the way up to high-end science, like quantum computing. And while quantum computing may sound complicated, Brilliant makes complex learning uncomplicated and fun. It's super easy to get started, and they've got so many science and math courses to choose from. I recently used Brilliant to get into rocket science for an upcoming episode, and it was a blast. The interactive courses are presented in a clean and accessible way, and you could go from knowing nothing about a topic to having a deep understanding. Put your spare time to good use and hugely improve your critical thinking skills. Go to talkpython.fm/brilliant and sign up for free. The first 200 people that use that link get 20% off the premium subscription. That's talkpython.fm/brilliant, or just click the link in the show notes.

12:19 So that brings us to your project, pre-commit.com, which, to me, I thought was kind of something new, because I hadn't heard of it before. And I feel like with all the excitement around Black being one of these auto-formatters that we just discussed, and some of the other tooling around Git pre-commit hooks, that this was like this big new thing. But we were talking and you told me it's been around for a little while, and it's just gaining a lot of momentum now. Yeah, the original project actually came out of my wanting to force my partners in group projects in college to make sure that their whitespace was nice and that formatting was good. But it was originally just a several-hundred-line Python script, pre-commit.py, way back in 2012. The first public version of pre-commit, I believe, was May of 2014, which means that it just turned six years old a little while ago. And for the most part, the framework's idea has remained basically the same since the beginning. Most of what's happened since 2014 has been a lot of bug fixes and a lot more platform support. So like making things work well on Windows and other exotic platforms like macOS, and

13:33 which is basically the Wild West, right, these uncommon platforms. Exactly, exactly. But also support for other programming languages. So even though pre-commit is written in Python, it aims to target a bunch of different programming languages. It can both lint and run tools in, I think it's like nine or 10 or 11 different programming languages now, and continues to support more. Going forward, it's pretty easy to add support for another language. Yeah, very cool. So the idea is, this is a framework for creating pre-commit hooks, and some tooling to install and initialize some of the various plugins. And one of the things that, like you just covered, is pretty unique to it, I think, is that it happens to be implemented in Python, but it'll let you install linters that require different runtimes, like Ruby or Java. I'm not sure about Java. It doesn't quite have Java support yet, but it's in the works. But the cool thing about it is that it aims to make it so you don't have to set up anything locally. You just install pre-commit, you have a configuration file, and it manages installing and running all those tools for you. So as a sysadmin, you don't have to worry about distributing Ruby to your machines or installing some weird Node.js package on your system or maintaining the version of that or updating and downgrading or whatever you need to do with that. pre-commit will just install those tools for you. Right, and there are a lot of tools that we may care about, even if we're just

15:00 Python developers, that are not in Python. For example, we might be creating a Pyramid web app that has a Less or Sass type of language for CSS. There's a nice linter in Ruby, but we've got to have Ruby in order to do that, right? Fortunately, pre-commit makes setting up scss-lint in particular, if that's the one you're referring to, pretty easy. You just add a little configuration, and it will set up a Ruby environment, even if you don't have Ruby installed on your machine, and install scss-lint and make that easy to go. That's pretty cool. The weird thing about scss-lint, though, is it's kind of stopped development as SCSS has moved from the Ruby implementation to either the C++ implementation or the Dart implementation. And there are actually linters for SCSS in JavaScript as well, which is another popular programming language that pre-commit supports. And a lot of people have been moving to those other ones as well. But again, pre-commit will just install that and make that easy to run for you. So how's that work? Talk me through it: I don't have Ruby on my machine, I want to run some tool as a pre-commit hook that needs the Ruby runtime. Where's the magic? Yeah, so I actually think that's probably the biggest selling point of pre-commit: all of the language smarts, where it knows how to install and run things. But let's talk about scss-lint specifically. So what you would do is you would install pre-commit on your machine, and there are directions in the quickstart guide on pre-commit.com. You would set up a configuration file, a small YAML file, which points at some repositories which provide hooks. In this case, for scss-lint, I believe it's pre-commit/mirrors-scss-lint. There are some repositories where we have to mirror them because the upstreams are either dead or don't want to add a small bit of metadata to the repository. 
And once you have the configuration file, you can run pre-commit run, or pre-commit install to make it part of your Git hooks. And pre-commit run will, in the case of Ruby, use several different approaches to try to set up a Ruby environment. So the first thing it'll attempt is, if you already have Ruby installed, it'll just skip all of the environment setup and reuse the Ruby that you have. But it'll install scss-lint in an isolated fashion, so it tries to stay away from everything else that's running on your system. Basically, the idea is that pre-commit provides virtual environments, but for every other programming language. Right, right. That's cool. And when you don't have Ruby, it will download it using either RVM or rbenv, which, I love the Ruby tooling landscape, it's impossible to talk about either of the tools, because they both sound the same if you slur your words together like I do sometimes. Yeah. But RVM provides a bunch of prebuilt binaries for a bunch of platforms, so it makes it easy to just download and run Ruby. And rbenv, the other tool, makes it really easy to build Ruby from source. And so pre-commit uses both of those tools to provide Ruby environments. I see. So there's this background thing where, if you don't have the right runtime, it'll go and get it. Yep, it'll make it happen. Yeah, and there's a similar tool called nodeenv for JavaScript that pre-commit leans on as well to set up those environments. nodeenv is kind of a mesh between the ideas of pyenv and virtualenv. It actually has some direct integration with virtualenv. But it basically allows you to go from nothing to a working JavaScript environment without having to install anything on your system. Nice. Yeah, that's super cool. I do think that that's the magic, right? You can use all these different pre-commit hooks without any setup beyond just pip install or pipx install pre-commit. By the way, pipx makes sense here. 
Are you familiar with pipx? Yeah, I'm familiar with pipx. I think it works for pre-commit. I haven't tried it myself, mostly because I manage it with virtual environments, but it should work. There's nothing super special about it. Yeah. It seems like it would as well to me. Yeah. So you install it, and then that's all you've got to worry about. You don't have to worry about having Ruby or Node. How about even Python? You do need Python. That's the one dependency, right? Because it's implemented in Python. Right. Yeah. There are plans to build a, what's it called, PyInstaller? Is that the one? I don't know, there's a bunch of different tools. Yeah, PyInstaller, or py2app, or, yeah, one of those, any of that. So you've been thinking about potentially turning it into one of these executables. Yeah, unfortunately, I know almost nothing about that, so I haven't had a chance to look into it. But it should be pretty straightforward to make that happen. Yeah, that'd be cool. Yeah, one other thing before we move on. There's a bunch of these Git hooks frameworks, and I think pre-commit is pretty unique in this installs-the-tools-for-you sort of aspect. I think a lot of these other frameworks make two main mistakes. Well, the first mistake that everyone seems to make is you check in this 2,000-line bash script, and that's how you manage your Git hooks, and anytime someone needs to

20:00 change that, they've got to dive into this terrible script that always breaks and makes people's workflows really frustrating, which is actually what we had at Yelp before Yelp moved to pre-commit. But the other thing is managing tools. I find that a lot of shops will install these linters globally, and their Git hooks will assume that they're, you know, installed globally. And you instantly have like three different problems from that. One is version drift: all of your developers will need to make sure that they're on a specific version. The other is you need environment setup. So you can't just switch from your laptop to your personal machine, or, you know, maybe I'm working on a netbook in a coffee shop or whatever; you would need to make sure that you have your very specific global environment set up, right, and that can always kind of drift and be painful. And the third thing is, if tools are managed globally, upgrading is impossible, because I have an old version of the code checked out and a new version of the linter is now on the machine. And so my code doesn't pass the new linter, and the new code doesn't pass the old linter. And you have this lockstep deployment problem that's really hard. I see. Yeah, right. And then if you have two projects and they're out of sync, then it's even worse. Oh, and yeah, then the multi-project problem gets even harder. Like, microservices become almost impossible, right? But the cool thing with pre-commit is the tools are all versioned directly in the configuration file. And so upgrading a linter can be done in one atomic commit to move forward or backward. And so everyone is always at a consistent state there. Yeah, that's really a good philosophy, a good design principle there. So you spoke a little bit about the steps to get going, but maybe just walk us really quickly through the quickstart. 
Just, what's it like if I've got a machine with no Git commit hooks, no pre-commit installed? How do I make my project start doing some of this cool validation? Yeah, sounds good. So there is a quickstart guide on pre-commit.com, so for all of this stuff that I'm about to say here, it's pretty easy to follow the instructions there and get the same thing. But I'm just going to run through it quickly here as well. So the first step is acquiring the pre-commit tool. And you can do this in a number of different ways. Some of the operating system package managers have packaged it, so sometimes you can apt install or dnf install pre-commit and get it that way. I usually don't suggest that, because the operating system package managers are usually six to eight months behind, so you won't get the newest features and stuff. But, you know, if you're on macOS, brew install pre-commit is usually up to date. And if you're a Python project, just use pip install. I mean, even if you're not a Python project, you can also use pip install. There's also a conda package and a few other ways you can acquire pre-commit. But anyway, the first step is to somehow install the pre-commit tool, right? Pick your favorite and go with it. Yeah, yeah, for sure. The next step is to set up a pre-commit configuration file. You can usually find one in a project that you use and know uses pre-commit. So I often copy and paste from another project that I've done, or a project that someone else has done that involves pre-commit. But pre-commit also comes with a pre-commit sample-config command, so you can get a very basic configuration that you can expand out further. I think the default configuration comes with a YAML checker and some whitespace fixers, and you can always add your favorite code formatter or linter to that. There's a list of supported hooks on pre-commit.com/hooks.html. 
And there are hundreds of different tools at this point that can be installed and run directly without any setup. But yeah, the second step is setting up a configuration file. The next step after that is to opt into the Git hook. So the Git hooks are actually, you know, it was originally the point of pre-commit to be just a Git hooks manager. But at this point, I think pre-commit works almost better as just a linter and code formatter runner. Honestly, if I went back in time, I would probably be like, Anthony, don't call it pre-commit. And Anthony, maybe don't use YAML.
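As a rough illustration of the configuration step described above, the sketch below writes a small .pre-commit-config.yaml containing a YAML checker and two whitespace fixers. The repository URL and hook ids come from the pre-commit-hooks project; the pinned rev value and the helper name write_sample_config are assumptions made for this example, and the output of pre-commit sample-config may differ from what is shown here.

```python
from pathlib import Path

# Assumed example configuration. The `rev` pin is illustrative only:
# pin whichever release of the hooks repository you have actually vetted.
SAMPLE_CONFIG = """\
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.1.0  # assumed tag; the version of the tools lives here
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
    -   id: check-yaml
"""


def write_sample_config(project_dir: Path) -> Path:
    """Write the sample config into project_dir unless one already exists."""
    target = project_dir / ".pre-commit-config.yaml"
    if not target.exists():
        target.write_text(SAMPLE_CONFIG, encoding="utf-8")
    return target
```

Because the tool version is pinned in this checked-in file, upgrading a linter is a one-line change to rev in a single commit, which matches the point made earlier about keeping everyone in a consistent state.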

24:08 But yeah, because pre-commit now supports a bunch of different Git hooks, like pre-push, commit-msg, prepare-commit-msg, post-checkout, a bunch of other ones, right? But also, it's useful even if you're not using it in a Git workflow. But yeah, anyway, the next step is to install the Git hook scripts. And you can do that by doing pre-commit install. You can install it for a particular hook type. It defaults to pre-commit, but you can do pre-commit install --hook-type, I don't know, post-checkout or whatever. Right, right, prepare-commit-msg. And so this looks in the YAML file where you specify the hooks that you want, so when you install it, it goes, okay, these are the three I need? So it defaults... So the YAML file can be used for a bunch of different hooks. The hooks in the YAML file specify a stages property, and that will decide which of the Git hooks they fire for. You might, you know, have a trailing-whitespace hook, and it might run

25:00 on commit, push, and post-checkout, or I don't know. And so the specific Git hook that you opt into is only part of the install setting. And so if you wanted to use pre-push with pre-commit, you would do pre-commit install --hook-type pre-push. But yeah, once you've set up the Git hooks, it's usually a good idea to run pre-commit against your entire project, especially when you're just starting out with a new code formatter or something. And pre-commit makes this really easy with pre-commit run --all-files. This will take all of the tools, install all of them, and then run them against every file in your repository. Right. So this is as if you had basically added everything and then somehow triggered a git push to evaluate those files. But you can just make it happen to sort of set the baseline. Yeah, it's for setting a baseline. Yeah, because your first commit isn't going to touch every file in your codebase. And pre-commit tries to be very smart about the files that it passes to linters to make sure that your Git hooks are as fast as possible, because you could lint your entire codebase every commit, but no one has time for that. Yeah, exactly. And yeah, once you've got set up there, you can iterate on your particular configuration: add more tools that you might want to use, things like flake8 or Black or your favorite import reorderer or your SCSS formatter or whatever tool set you might want. You can also add your own custom tools as well. If you have existing scripts you want to migrate to pre-commit, pre-commit makes it pretty easy to run stuff directly in your repository as local hooks, or things that happen to be globally installed, as you migrate to more of a managed hooks situation. And kind of the last step that I usually suggest, if you're going to add this pre-commit validation: so pre-commit validation can always be skipped. 
There's --no-verify in git commit, which allows you to just skip all the Git hooks. So you could just turn it off if you wanted. pre-commit also provides a SKIP environment variable that lets you skip individual hooks, if you have some buggy hooks that you don't want to check. We're

27:02 actually we had this one hook at Yelp that was hilariously slow. And so I just never wanted to run that hook. So I would just always skip it and make sure that I was adhering to the policy manually. But right. But

27:17 you don't have to skip hooks. But anyway, they're always just client validation, and client validation can always be bypassed. So it's usually a good idea to set up some amount of this linting validation in your continuous integration. Right. And fortunately, for... hold on, first, it's probably a good idea to have continuous integration and

27:36 stuff for it, right? Yeah, that's true. You probably have tests, right? Right.

27:42 But yeah, it's always a good idea to have some amount of continuous integration, aka users.

27:50 I'm just kidding. Just kidding. I'm just going to hide, because my chat in my stream the other day was hounding me for not having tests on my most recent project. And I was like, oh, yeah. I did write some yesterday, though, so I have nonzero tests now. That's good. So I didn't mean to derail you; you were saying you should also set this up in CI. Oh, yeah. Yeah. And pre-commit makes that really easy to get set up. In CI you can essentially run the same command we ran earlier, which is pre-commit run --all-files, and set that up as something that validates all of your code changes. How does it get set up in CI? Like, do you have to do the pre-commit install on every run? Does that get cached and only do the delta, or what does the CI setup look like there? So unfortunately, it's kind of a little bit fiddly to set up right now in CI, which is actually part of the project that I'm building right now. I'm actually working on building a kind of generic continuous integration solution that's aimed only at pre-commit. Oh, nice. So if you want to check out more information on that, it's pre-commit.ci. It's kind of a work in progress right now. But you can set up pre-commit with basically any CI provider that you have right now. But you have to manage the cache yourself, you have to figure out what command you want to run, you have to figure out what changeset boundaries you want to run against. So like, maybe on a pull request, you only want to validate the files that changed between master and your branch. And yeah, it's all pretty manual setup right now. For Travis CI, it's five to 10 lines of YAML; for Azure Pipelines, it's like 30 lines of YAML. And if you don't get it quite right, or if the cache is invalidated for weird reasons, your run will be significantly slower. 
And that's actually one of the big selling points that I'm trying to push for on pre-commit.ci, which is that everything will basically always be cached because it can share all these tools amongst other developers. That's pretty cool. Right, right. Yeah, that's really cool. Yeah, but you've got a lot of examples, and I'll try to link to them in the show notes, of how to set it up for Travis, Azure Pipelines, all the various things, right? So that's pretty straightforward. Yep. I think the most popular ones are covered there. And if there's others that are missing, feel free to send a pull request. Yeah. Another cool thing that I'm trying to build into my

30:00 CI tool is that, and this happens probably 10 or so times a month, someone will come to one of my projects and submit a pull request. And they haven't set up the client tooling locally. So they aren't running the pre-commit hooks, they haven't run the code formatters or any of these things. And so they'll get kind of that middle to worst case of the CI system telling them, oh, your code isn't formatted this way, here's a patch that would fix it. But one of the features that I'm planning to build is that pre-commit.ci will just automatically fix pull requests for you. So even if you forgot to set up the local tooling, the remote tooling will just auto-fix it for you. Oh, that'd be nice. Instead of just saying, Oh, it's wrong, you have to fix it, it just always comes in right. Yeah, it just fixes it for you. There's actually a tool. I guess it doesn't always come in right, because it could be actually broken Python. You could be validating, like, there's things that can't be fixed. But if it could be fixed, it should be auto-fixed. Right? Yeah, it tries to auto-fix things that are auto-fixable. There's actually a tool, I think Mariatta made it, I believe it's called blackout, which is a Black-specific tool that kind of has this same idea of, just fix the pull request for me, sorry, I didn't set up Black locally. Yeah, but pre-commit.ci kind of aims to be a generic tool for these sorts of auto-fixing things. Yeah, that's really cool. So you talked about if I didn't set this up locally. So the way this works is, in general, Git pre-commit hooks don't stick to the repository. They're not now part of the repo because I set them up; every person on the team has to go through this three or four step process. Yeah. So they won't have to set up the configuration, because usually you check that in so that it's shared amongst all your peers.
But yeah, setting up the tool would be something that everyone else has to do. Okay. And there are ways to kind of automate this among peers. So one thing that we did at Yelp was we made pre-commit one of the Python dependencies of the application, so the installation part was skipped. And we also made the Git hook setup part of the common make targets. So you used to run make minimal to get the Yelp application working, or whatever, and make minimal would install the hooks so that you didn't have to think about that sort of thing. Yeah, that's cool. In managed environments, there are other things you can do as well. You can set a git init template, and the init template could already contain the Git hooks. pre-commit actually has a command that makes it easy to set up an init template. But if you're working just on vanilla laptops, or on open source projects, you would need to consciously opt into this behavior. And actually, that's one of the bigger design principles of pre-commit: none of this stuff happens super automatically. You're always opting into the behavior, right? It's all optional. Okay. Yeah, I think that's a positive. Right? That's a good thing, that you're not forcing it on people. But with the CI, you're kind of catching the misses. Yeah, the other part is, if stuff runs too automatically, people get concerned about security. And so it tries to not be, you know, an arbitrary code execution engine, even though it is with opt-in. Yeah. Well, people do get a little touchy when you automatically run tools against their source code.

33:11 Yes, I've received quite a few issues that are like, why does this thing exist? You should delete it now. And it's like, well, you don't have to use it, like,

33:24 Talk Python To Me is partially supported by our training courses. How does your team keep their Python skills sharp? How do you make sure new hires get started fast and learn the Pythonic way? If the answer is a series of boring videos that don't inspire, or a subscription service you pay way too much for and use way too little, listen up. At Talk Python Training, we have enterprise tiers for all of our courses. Get just the one course you need for your team with full reporting and monitoring, or ditch that unused subscription for our course bundles, which include all the courses, and you pay about the same price as a subscription, once. For details, visit training.talkpython.fm/business or just email sales@talkpython.fm.

34:09 Let's talk about some of the tools. You know, we've set the groundwork, right? It's this framework for bringing in all these different types of pre-commit hooks. And when you say there's a lot that you bring in, there are a ton. So maybe we could talk through some of the ones that come built in that you think are cool; I'll grab a couple as well. And then there's a bunch of others that are external, but can be loaded. Yeah, sounds great. So pre-commit itself is a framework, but there's also an official set of pre-commit hooks that I and other contributors have written. And those are available at pre-commit/pre-commit-hooks on GitHub. And there's, what, 29 of them that are provided kind of out of the box. The original intent of the pre-commit-hooks repository was to be Python specific. However, it's more shifted to take the Python parts, split them out, and kind of aim to be language agnostic. So there's a

35:00 lot of checks in there for your common configuration formats. So there's checkers for JSON, TOML, YAML, etc. There's actually some really cool ones for Python-specific stuff. Like there's one that checks that you don't check in breakpoints or debugger statements. If you accidentally put import pdb, it'll flag that and make sure that that doesn't end up in production. Right? Right. You definitely do not want to leave an import pdb, set breakpoint sort of thing, because, I don't know why the server is locked up, but it just seems like it's not using the CPU, but stuff is just timing out. What's going on? Yeah, the number of times I've seen BdbQuit in production. I'm like, Oh, damn it. This is entirely my fault.
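To make the debugger-statement check concrete, here is a minimal sketch of how such a check could work using the standard library `ast` module. It is an illustration only and not the code of the official hook: the function name and the set of debugger modules are assumptions chosen for the example.

```python
import ast

# Assumption for this sketch: module names we treat as debuggers.
DEBUGGER_MODULES = {"pdb", "ipdb", "pudb"}


def find_debug_statements(source):
    """Return sorted (line, description) pairs for debugger imports and breakpoint() calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in DEBUGGER_MODULES:
                    found.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if node.module in DEBUGGER_MODULES:
                found.append((node.lineno, f"from {node.module} import ..."))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "breakpoint"
        ):
            found.append((node.lineno, "breakpoint()"))
    return sorted(found)


if __name__ == "__main__":
    sample = "import os\nimport pdb; pdb.set_trace()\nbreakpoint()\n"
    for line, what in find_debug_statements(sample):
        print(f"line {line}: {what}")
```

A hook built on this would return a non-zero exit code whenever the list is non-empty, which is what blocks the commit.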

35:43 But yeah, this kind of helps prevent that, or, you know, I left a breakpoint in and my test suite exploded because there was a breakpoint in it, or stuff like that. Right. So some of the ones that stand out to me: one was check-ast, which checks whether or not Python files can be parsed. That's pretty cool. Because Python doesn't have a compiler, not in the command line version. I know it sort of does generate the .pyc, so it kind of does. There's not a build step, is what I'm saying. Right? For sure. Right, which goes through and verifies all the files. And so this is a little bit like a build step for really basic stuff. Yeah, check-ast kind of aims to be a first line of defense. In a lot of ways, other linters like flake8 could be a better choice than check-ast. But yeah, check-ast is kind of the most basic, is this valid syntax check. We actually used this as part of moving to Python 3 at Lyft. We kind of had this three to four stage process. And the first stage was make sure all the syntax is valid Python 3 syntax, and then make sure that linters pass in Python 3, and then make sure that Python 3 specific linters pass in Python 3, and then run the tests in Python 3, and then go to staging and then production, and then delete all the Python 2 code. That was kind of our trigger. It was kind of our first step there in that process. I see. Yeah, that's really cool. Another is about checking for AWS secrets. Ah, yes.
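The "can this file be parsed" idea behind check-ast, discussed above, can be sketched in a few lines with `ast.parse`. This is a simplified stand-in rather than the hook's actual source; the function name and message format are made up for the example.

```python
import ast


def check_ast(source, filename="<string>"):
    """Return 0 if the source parses as Python, 1 otherwise (hook-style exit code)."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        print(f"{filename}: failed to parse: {exc.msg} (line {exc.lineno})")
        return 1
    return 0


if __name__ == "__main__":
    # Python 2 style print statements are a syntax error on Python 3,
    # which is the kind of thing a first migration stage would surface.
    raise SystemExit(check_ast('print "hello"\n', filename="legacy.py"))
```

Because the parse happens under whichever interpreter runs the check, running it under Python 3 is what makes it useful as the first stage of a Python 2 to 3 migration.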

37:06 How many times have you accidentally leaked your entire AWS account to the public on GitHub? Actually, GitHub has some special checking for this now as well, to make sure that your AWS secrets or whatever are not leaked on commits. And Amazon actually has invested a lot in scanning public source code repositories to invalidate these tokens as quickly as possible. Oh, yeah, that's cool. Because, you know, it costs individuals a lot of money, but also Amazon spends CPU and other resources mining Bitcoin, or whatever people are doing nefariously with these leaked tokens. But yeah, this is kind of, again, a first line of defense for not checking those in. Yeah, you don't want to do it. It's not like eventually someone's gonna find it. It's bad. There's systems like, it's like seconds. Yeah, it's like, shhgit.

37:55 shhgit is a project. There's a bunch of these, but this one is shhgit: find secrets and sensitive files across GitHub, including Gists, GitLab and Bitbucket, in near real time. And it does so by hooking up to the public stream of activity. And as soon as there's a check-in, it's just after it, right? And so there's three or four of these types of tools. And basically, it would be really nice if something like this detect-aws-credentials pre-commit hook didn't let that get into the mix, for sure. Yeah. And there's actually a lot of other tools around detecting credentials in the pre-commit space. So there's direct integration for, the tool is called Bandit, which is Python specific. Yep. Bandit's a cool tool. And I believe they have direct pre-commit integration. So you can just set up Bandit as a pre-commit hook. I know there's a Go project that's fairly popular that does the same thing. And pre-commit has direct support for Go and that particular tool. But there's a bunch of tools in this space that make it really easy to prevent checking in sensitive credentials. Another one that stood out to me was no-commit-to-branch. Yeah, this actually was a community-contributed hook. The idea being that you don't accidentally commit directly to your master branch or to your particular development branch. And this helps you enforce, your production or whatever. Yeah, yeah, it helps you enforce the specific Git workflow that you want. Like you're specifically targeting your own feature branch, or you could use this to enforce the branch naming scheme, or a bunch of other stuff like that. But yeah, that one was actually contributed externally. Yeah, exactly. Basically, require people to use a proper PR workflow, like a git flow type of thing, rather than just committing straight to master.
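The branch-protection idea behind no-commit-to-branch can be sketched as follows. This is a hedged illustration, not the hook's real implementation: the function names, the default protected branch, and the pattern argument are assumptions for the example. The only external command used is `git symbolic-ref --short HEAD`, which prints the current branch name.

```python
import re
import subprocess


def is_protected(branch, protected=("master",), patterns=()):
    """True if committing directly to this branch should be refused."""
    if branch in protected:
        return True
    return any(re.match(pattern, branch) for pattern in patterns)


def current_branch():
    """Ask Git for the name of the checked-out branch."""
    out = subprocess.check_output(
        ["git", "symbolic-ref", "--short", "HEAD"], text=True
    )
    return out.strip()


def main():
    """Hook-style entry point: a non-zero return value blocks the commit."""
    branch = current_branch()
    if is_protected(branch):
        print(f"refusing to commit directly to {branch!r}; use a feature branch")
        return 1
    return 0
```

The same `patterns` argument could be turned around to enforce a branch naming scheme, which is the other use mentioned in the conversation.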

39:39 Just edit it on production, that's even quicker.

39:43 I deploy via SSH.

39:46 Yeah, another cool one that we used at Yelp was forbid-new-submodules. We went through this. This is actually kind of a long story, so I'll keep it a little bit short. But Yelp went through a migration of installing all Python packages

40:00 globally on the system to virtual environments. And there were some grumpy sysadmins that were super against virtual environments. And so we kind of cheated our way to getting virtual environments by using submodules instead, Git submodules, not Python modules, Git submodules. And so there was one point where Yelp main had 98 Git submodules. And we were doing this fancy pip install workflow that allowed us to locally install all of these tools so that we didn't have to manage them at the system level. But eventually, the grumpy people left the company and we moved everything to virtual environments. And part of moving to virtual environments was to burn down the tech debt of submodules. And so we added this forbid-new-submodules hook, which prevents newly introduced submodules while allowing existing ones, such that we could kind of do some graph-driven development and bring that down to zero while we migrated towards the new world. That seems like a good path there. Very nice. One I was wondering about is check-executables-have-shebangs, the little hash exclamation point something at the top, which is fine, but how do you know it's an executable? Is it because it has a dunder name equals dunder main thing in it, or what counts as an executable? So this is more about file permissions and accidentally checking in files that are not executable. So one example where this commonly gets triggered is, when you copy files from a thumb drive, they'll commonly be on a FAT32 file system, which doesn't have full support for all of the permission bits. And so almost always the permission bits will be 777. And you'll end up checking in all sorts of non-executable and non-script files with executable bit permissions. And depending on the particular system that you're using, this can often trigger lint errors.
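Since the answer here is "it's about permission bits, not about what the file contains", a small sketch may help. The function below flags regular files that have an executable bit set but do not start with `#!`. It is a simplified illustration for POSIX systems, not the hook's actual code, and the function name is made up.

```python
import os
import stat

EXECUTABLE_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def executable_missing_shebang(path):
    """True if the file is marked executable but does not begin with '#!'."""
    mode = os.stat(path).st_mode
    if not stat.S_ISREG(mode) or not (mode & EXECUTABLE_BITS):
        return False  # not a regular file, or not executable: nothing to check
    with open(path, "rb") as f:
        return f.read(2) != b"#!"
```

A PNG copied off a FAT32 thumb drive with mode 777 would be flagged, while a script that begins with a shebang line would pass.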
And so this is kind of a first defense against that. One particular system is like, Debian is real grumpy about, oh, this PNG file has the executable bit set, so I'm gonna reject this package or whatever. Right, right. But this also helps you if you've forgotten to put a shebang. So sometimes you'll have an entry point to your application that you intend to be executable, but you've forgotten to put /usr/bin/python3 there, or whatever, right? So this kind of checks for that. Okay, cool. So those are a bunch of the built-in ones. Any of the other external ones you want to give a shout out to? Yeah, so if you're setting up a Python project, you probably want an import sorter. And of the two most popular ones, one of them I wrote, which is called reorder_python_imports. It's very basic, and it does what its name says, but the idea behind reorder_python_imports is you set up essentially no configuration, and it just does the right thing all the time. And if you kind of want the opposite end of that spectrum, there's a tool called isort, which has direct support in pre-commit. isort has 50 or 60 configuration options, so you can customize your import sorting to whatever thing you want. Some others that I would suggest: getting a linter like flake8 set up, and flake8 has direct support for pre-commit as well. I'm also a maintainer of flake8, so of course it has support. Of course, awesome. And perhaps the most popular, and probably one of the reasons that pre-commit has taken off a lot recently, is Black. We've talked a couple of times on this already about Black. But Black is a code formatter where there is one way to do it, right? And Black does it in a very specific way. And Black has direct integration with pre-commit. You can also set up things like type checkers; there's mypy integration with pre-commit, and, you know, other sorts of stuff like that.
And one more that I'll shout out, which is one that I've written, is a tool called pyupgrade, which allows you to kind of upgrade your syntax to newer versions of the language. So things like automatically making f-strings, or removing Python 2 syntax constructs, or un-six-ing your code, so to speak. And that's another one that has decent pre-commit integration as well. I see, it looks for old idioms and styles and says stop doing that. Yeah, well, it doesn't just say stop doing that, it just auto-fixes it for you. Oh, that's even better. But yeah, I really like my code formatters. If a linter is grumbling at me, I'd rather have a formatter just auto-fix it for me. Yeah, absolutely. All right. So that's the big list of things that you could use. But what if you have an idea for a new one? Part of the whole idea of pre-commit is it's a framework for building and distributing these things, right? Yep. And it's really easy to make your own set of hooks. That's actually another problem with the other sorts of frameworks: if you want to add tools to them, you kind of have to fork their framework and inject your code directly into their framework. pre-commit takes kind of a distributed model to this. Basically any Git repository that has a little bit of metadata in it can provide a Git hook, as long as it's installable in some way. And the process is outlined in the pre-commit documentation, but there's a bunch of different programming languages that this works with. And you basically set up a .pre-commit-hooks.yaml to provide hooks to other repositories. And there's a, you know, you're gonna

45:00 follow an example on any of the other repositories that already provide stuff, and it's also pretty easy to take an existing tool and add the small metadata file to it so that it can be used directly with pre-commit. Of course, there's also escape hatches. If you don't want to use the repository-managed approach, you can set up local hooks that go directly to PyPI, or use a Docker image, or other stuff like that, instead of going through the managed approach, although there's some disadvantages to that around automatic updates and workflow management and that sort of deal. Right, yeah, that looks really cool. So it seems pretty easy to make them, and as long as it's installable via pip or gem or npm, or as an executable, it's more or less ready to go along with that metadata. Yep. Essentially, if it can be git cloned, and something can run pip install or the equivalent, you're golden, you're ready to go. Yeah. And then under supported languages, where maybe languages is in, like, quotes or something, you've got conda, docker, docker_image, fail. What is fail? So that's a special hook. This was actually added specifically for pytest, which I'm a maintainer of. fail takes a file name regular expression, and will always return one if anything matches that. So one example is, we wanted to enforce that our changelogs were a specific file name pattern, because it's very common for somebody to write a changelog but forget the .rst extension. And when they forgot the .rst extension, then when we would go to run a release, we would forget their changelog fragment, and so their change wouldn't be called out. And so this was a way to enforce, with a particular message, that a file name matches specifically. Okay, yeah. Cool. And then you've got golang, node, perl, python, ruby, rust, swift, script, I'm guessing bash and whatnot. That's a lot of options.
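The behavior described for fail (match file names against a regular expression and always return one, with a message, if anything matches) can be sketched like this. The function name and both patterns are hypothetical, chosen to mirror the changelog example; they are not pytest's actual configuration.

```python
import re

# Hypothetical patterns for this sketch: flag anything under changelog/
# unless it looks like "<number>.<kind>.rst".
FILES = r"^changelog/"
EXCLUDE = r"^changelog/\d+\.(feature|bugfix|doc)\.rst$"
MESSAGE = "changelog files must be named <issue>.<kind>.rst"


def fail_hook(filenames, files=FILES, exclude=EXCLUDE, message=MESSAGE):
    """Return 1 and print the message if any file name matches files but not exclude."""
    files_re = re.compile(files)
    exclude_re = re.compile(exclude)
    offenders = [
        name for name in filenames
        if files_re.search(name) and not exclude_re.search(name)
    ]
    if offenders:
        print(message)
        for name in offenders:
            print(f"  {name}")
        return 1
    return 0
```

The hook does no work beyond the pattern match, so the whole rule lives in the two regular expressions and the message.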
And with Docker, that's pretty wide open, you know, under that you can basically do anything. Yeah, exactly. Exactly. Yeah. And it's actually pretty easy to add a new language. So if there's another language that you would like to see pre-commit support, there's actually a little guide in the contributing documentation for pre-commit on how to set up another one. Someone's actually working on support for Crystal right now, which is a language that I had not heard of until relatively recently, but it's basically like compiled Ruby. It's pretty cool. But okay, yeah, there may be support for that. And, yeah, nice. Something I also saw in the docs was talking about automatically enabling pre-commit on repositories you, like, register with Git. So when you get a new repo or clone something, it'll just be part of it? Yeah. So there's this template directory concept in Git. Basically, you can run pre-commit init-templatedir, and then set up a specific configuration value in your Git config. And Git will splat out files in a particular way in your clone, such that you'll automatically be enabled. Okay. Yeah, that's nice. This usually involves just writing the .git/hooks/pre-commit file magically. Right? Yeah. So if you really like this whole idea, then just make it automatic, right? Yep. We used this at Lyft so that whenever you would clone a Lyft repository, it would automatically have everything set up, which works pretty well. Yeah. And one of the things that all the cool repositories are doing these days is they have those little badges on the readme. Mm hmm. Right? So it's Python 3, or it has this many downloads on pypi.org, installs via pip, and so on, or whatever. And so you guys have a nice way to badge your project as well, to say that it's powered by pre-commit, to sort of spread the word. Yep. Actually, this one's a really cool thing that I want to shout out.
This was actually contributed by an external contributor, who was like, hey, I want a way to show other people what tools I'm using; pre-commit is a really cool tool, and I want to see it succeed. And so they made an SVG badge and made it really easy to set up a little bit of Markdown or RST or whatever text document format you want to use to badge your repository and spread the word. Yeah. Awesome. That's really nice. And then last question on this, I guess, is: are you looking for contributors to the project? There's a lot of people who ask me, hey, I want to get started in open source, could you recommend something I could work on? Would this project count as one of those? Yeah, we're always looking for new contributors. The thing with pre-commit is, I don't want to say it's feature complete, but it hasn't really gained any large features in quite a long time. Most of the places to expand pre-commit are to add more tooling support or to add more programming language support. And I'm always looking for expansions in those spaces, or, you know, new ideas that I just haven't really thought of. Yeah, yeah, maybe there's some cool tool out there that's not integrated with you guys. Maybe they could commit to pre-commit by actually committing to that other project to get the right metadata in there. For sure. Yeah. There are a few, like, help wanted, getting started, smaller issues that are in the issue tracker, but I usually try and keep the backlog pretty well trimmed. Yeah. Nice. All right. Well, what a cool project, and it's definitely something I'm going to be checking out as well because it looks neat. Thanks. I'm really proud of it. Yeah, you should be. It's great. All right, before we get

50:00 out of here though, I gotta ask you the two final questions. If you're gonna write some Python code, and I know you write a lot, what editor do you use? Oh, no, this is always a fun one. So if you would have asked me this six months ago, I would have been a little bit embarrassed about my answer, and maybe I'm still embarrassed about my final answer. But six months ago, I would have said nano is my text editor of choice. I'm actually really good at using nano. But before that I was an IntelliJ user. And I used that for the longest time. But I found that context switching between the terminal and not was a lot of work. But anyway, my real answer to this question: I have actually written my own text editor.

50:40 Which, in retrospect, I would not suggest anyone do. It's been quite a roller coaster. But my text editor is called babi. The name is a little bit silly. It's actually nano, but on a QWERTY keyboard, if you shift your hand over one and typo nano, you end up with babi. And this was a common typo that I would make pretty frequently. But yeah, it's open source. It's written in Python. Some of the ideas behind it were, make it really easy to hack on and add features to, and I wrote a lot of it on my Twitch stream. And interestingly, a bunch of the features in the text editor were actually contributed by people in chat, who were like, hey, this would be cool if babi did this, and would run off for 15 or 20 minutes and come back and have a working patch for it. Oh, wow. You know, a new command or a new, cool way to do something. That's awesome. But yeah, it's pretty easy to hack on. And that's part of the reason that I kind of like it. Is it pip install babi? Yeah, pip install babi. And that should work on most platforms. It works on Windows, oddly enough, which was actually also another thing that chat contributed. I developed it entirely on Linux, and it's based on curses, and Windows doesn't quite have native curses support, but you can install, like, the windows-curses package to make that work. Okay. Yeah, it should just be pip install babi and you can start hacking on it. It's not feature complete in any way. So there's some stuff that's, you know, horribly broken. Like if you try and rename a file in babi, it's just like, no, I don't know how to do that, sorry. But I use it as my daily driver now. So I know the shortcomings, and I'll eventually get around to implementing those. But yeah. Cool. It's pretty neat. Awesome. All right. That is the first time anyone has ever said babi. So very cool. Now we're getting the word out about it.

52:23 I'd be surprised if anyone else uses it, to be honest. All right, cool. Well, very interesting. And then notable PyPI package, something that maybe people haven't heard about? Ooh, something that people haven't heard about, or just that you came across, and you're like, you know, it's not requests, it's not Django, it's like, oh, this thing I found is pretty awesome, people should know about it. It's not requests? To me, you mean HTTPX. HTTPX is also cool. But, you know, the most common answer, I would say, is requests. That's true. Yeah. Actually, controversial opinion: I really don't like requests. So

52:54 can we do an unpackage for this one? Well, how about it? What do you use instead of requests? So if I can get away with it, like I don't need h2 support, or I don't need special streaming APIs, I'll mostly just use standard library urllib. Yeah. Although for newer stuff where I need streaming APIs or h2, or I need async support, I'll reach for HTTPX or aiohttp. Yeah, the aiohttp client. Yeah, in there. Yep. That's nice. I've actually not done a lot of asyncio yet. I've been dabbling with it a little bit. I'm writing a chatbot for Twitch, and I've kind of started sprinkling around some of the aio libraries. What do you think about async in Python? I think it's pretty good. There's a few get-it-set-up and get-it-moving things that kind of bug me. It bothers me that I can't just fire and forget async stuff. Yeah, you have to await it eventually. Yeah, yeah. Or like, if you just call the async function, you get a coroutine. But the coroutine hasn't started until you turn it into a task or you give it to the loop to run. Yeah. So that bugs me a little bit. The fact that the threading module doesn't support async and await. Hmm, yeah. But the asyncio module does. You know, all of them should. So I actually am a huge fan of unsync, which is a library that puts a unified API on top of those and gives you the fire and forget, and it has a background thread that manages the runtime loop for you. Oh, yeah, I've played a little bit with unsync. It seems really cool. So I think what we have in Python is really close to right. There's just a few rough edges. And I think it's not necessarily that unsync is the right answer, but something like that is really close to the final polish it needs to take it from like a B to an A minus or something like that. Right? And then maybe once we get subinterpreters dropping the GIL, it'll take it up a notch as well. Oh, yeah.
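The point about coroutines not starting until they are turned into a task or awaited can be seen in a short, standard-library-only sketch. The names `record` and `events` are made up for the illustration.

```python
import asyncio

events = []


async def record(label):
    """Append the label when (and only when) the coroutine body actually runs."""
    events.append(label)
    return label


async def main():
    coro = record("first")            # creates a coroutine object; nothing has run yet
    before = list(events)             # snapshot taken before anything is scheduled
    task = asyncio.create_task(coro)  # hands the coroutine to the running event loop
    result = await task               # wait for it to finish
    after = list(events)
    return before, after, result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `record("first")` alone only builds the coroutine object; the append happens once the loop runs the task, which is the "you have to await it eventually" behavior being discussed.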
Subinterpreters are gonna be really cool. Yeah, my problem with async is I feel like I have to rewrite all my code. Sometimes that's a barrier to entry. Yeah, the challenge can be that it propagates, right? You want to do it way down, and, well, that becomes async, then the things that call it become async, and the things that call those become async.

55:00 Yeah, that can be a challenge when your testing library has to be async too, all this other stuff. Yep. Yep. I think it'll get there, though. I think it's just, Python has 20 years of history and async was introduced pretty late into that history. And there's gonna be some growing pains as it gets better, so to speak. Yeah. But the place where I think it really shines is where you don't have to think about it, where the thing that you're doing async is on the boundary. So for example, FastAPI, right? You can just make your view method async or not make it async. Right. It's up to you. And then if you want to do that part of your app async, you can, but it's not from the bottom up, it's kind of from the outside in. You can pick the entry points from the outside where you want to adopt it. And I think that makes a lot of sense there. Yeah. Especially because you have a framework calling you, so you don't necessarily have to do it all asynchronous. Exactly, exactly. You get to plug into it or opt into it. Yeah. But as a library designer, it's more challenging, I think. How do you expose an async version and a non-async version and so on? Definitely. Cool. All right. Well, that's not exactly one package, but that's kind of a conversation around packages. So let's maybe just put the answer as HTTPX or the aiohttp client. Those are two cool HTTP libraries that kind of fit in that world. I like it. That sounds good. Cool. Yeah. I mean, usually, my answers would be around a code formatter and linter.

56:22 I'll actually shout out two of my favorite code formatters that I tend to use. I do really love Black, but I actually don't use it myself. I tend to use a combination of two formatters. One of them is called autopep8, which is very similar to Black in some of its ideas, like fixing lint errors. And the other one, which is very similar to Black but has existed for like four or five years, is a tool called add-trailing-comma, which literally does what it says on the tin. It tries to enforce trailing commas in the same way that Black does. Right, so the final element in a list would have that, so that line doesn't get a diff when you add the next element. Yeah, it tries to make minimal diffs when adding or removing parameters in lists or tuples or function signatures or class signatures or type annotations, the whole gamut of places where you can have braces and commas. Cool. All right. That's a bunch of good ones. Okay, final call to action. People are interested in pre-commit, they want to check it out, they want to start using it. What do you tell them? So I would tell them to go to pre-commit.com. That's probably the place where you can get the most. pre dash commit dot com, right? Correct. Yeah, with a dash. The one without a dash is like some weird real estate website that I've been trying to bug the guy to give up his domain for years, but I don't know that he actually reads his email at all. But yeah, I would check out pre-commit.com. And that's where you can get all the information about that. And pretty soon, pre-commit.ci will have more information about the CI system that I'm building around pre-commit. Oh, yeah, that's really exciting. That's gonna be cool as well. I'll put a link to that in the show notes also. Cool. All right. Thank you, Anthony. It's great to chat with you and see all these amazing things you're creating. Yep. Always glad to chat. Yep.
But see you later. Bye.
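The minimal-diff argument for trailing commas mentioned in the formatter discussion can be checked with the standard library `difflib` module. The helper below counts added and removed lines in a unified diff; its name and the sample lists are invented for the illustration.

```python
import difflib


def changed_lines(before, after):
    """Count added/removed lines in a unified diff, ignoring the file header lines."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# Without a trailing comma, appending an element also touches the previous line.
NO_COMMA_BEFORE = "x = [\n    1,\n    2\n]\n"
NO_COMMA_AFTER = "x = [\n    1,\n    2,\n    3\n]\n"

# With a trailing comma, appending an element only adds one line.
COMMA_BEFORE = "x = [\n    1,\n    2,\n]\n"
COMMA_AFTER = "x = [\n    1,\n    2,\n    3,\n]\n"

if __name__ == "__main__":
    print(changed_lines(NO_COMMA_BEFORE, NO_COMMA_AFTER))
    print(changed_lines(COMMA_BEFORE, COMMA_AFTER))
```

The first pair changes three diff lines (one removed, two added), while the trailing-comma pair changes a single added line, which is the smaller review diff the tool is aiming for.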

58:07 This has been another episode of Talk Python To Me. Our guest in this episode was Anthony Sottile. And it's been brought to you by brilliant.org and us over at Talk Python Training. brilliant.org encourages you to level up your analytical skills and knowledge. Visit talkpython.fm/brilliant and get Brilliant Premium to learn something new every day. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Get out there and write some Python code.
