#311: Get inside the .git folder Transcript

Recorded on Thursday, Apr 1, 2021.

00:00 These days, Git is synonymous with source control itself.

00:02 Rare are the current debates of whether to use Git versus SVN versus some fossil like

00:08 SourceSafe versus you name it, but do you know how Git works? What about its internals?

00:12 I'm sure you've seen a .git folder in your project's root, but to most folks, that's a black

00:18 box. In this episode, you'll meet Rob Richardson. He's going to pop the lid on that black box as we

00:23 dive into Git internals and the .git folder, among other things about source control.

00:27 This is Talk Python to Me, episode 311, recorded April 1st, 2021.

00:32 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the

00:51 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where

00:55 I'm at mkennedy, and keep up with the show and listen to past episodes at talkpython.fm,

01:00 and follow the show on Twitter via at Talk Python. Congratulations to Mike Manning. He's the final

01:06 winner of our PyCon ticket giveaway. Thank you to everyone who entered, and if you didn't win,

01:10 I hope you're able to get a ticket and support the PSF, the Python community, and be part of that

01:15 awesome conference. See you in May. Rob, welcome to Talk Python to Me.

01:20 So glad to be here. I'm really excited that I get to join you. Great to meet your audience.

01:24 Yeah, it's great to have you here. You got to meet Intersection. You gave a talk at the Python

01:29 web conference recently that I also spoke at, and your talk was really interesting and certainly

01:35 relevant to the Python folks, so I thought it'd be cool to have you over here. And, you know,

01:39 I should give credit to Paul Everett for connecting us. He's like, oh, that was a great talk. You should

01:43 go talk to Rob. So thanks, Paul, as well, who was not long ago on the show.

01:46 Yeah. I've been chatting with Paul about thoughts around the talk as well. He's a really brilliant

01:51 guy.

01:51 Yeah, he is. He definitely is. He's been doing a lot of cool stuff for a long time. So yeah,

01:55 he's a great guy. Now, before we get into Git and all those types of things, which, you know,

02:02 it's really surprising to me how much it's taken over the world, right? It used to be,

02:05 there was always a question, well, what source control do you use? Like, that's not a question

02:09 I hear all that often these days, not at least as much as it used to be. But before we dive into the

02:14 details of that, let's start with your story. How'd you get into programming?

02:16 This is actually a really fun story. I was 10. I was at the library because they had the computer

02:22 and we'd play video games. And the methodology of how you do this is you go up to the counter and you

02:28 flip through the book and you go find the video game and you show them that page and they give you the

02:34 disc and you have the save icon. You take the save icon and you put it in the computer and you play the

02:39 game. So I had finished playing my game and I went back to the desk to go pick another

02:44 one flipping through the, you know, plastic sheets. And I found a drawing program. And I said,

02:50 I'd like to play this game. They gave me an eight and a half by 11 sheet of paper. The top two thirds

02:55 was graph paper, you know, graphs. And the bottom third was how to write the program to draw that on

03:03 screen. Oh, cool. Okay.

03:05 And it was so much fun. I got to start to build content that was in my mind in real life in this

03:13 artistic medium with a very technical implementation. So, you know, that was so much fun. I never did

03:20 return that game. And so that kind of brought me into the world of software development. I always

03:26 thought it was just a fun thing that people did. I didn't realize it was a career. So it wasn't until

03:30 really late in my college experience when I realized that I could do this for a career.

03:36 And so after I graduated, I got into programming professionally and I've had a really fun time

03:42 coding professionally now for more than 20 years.

03:45 Yeah. Awesome. I think programming is special because it's one of those things you kind of

03:50 hinted at where you, you think of something, you dream of something, you imagine something.

03:53 And then with a little bit more thinking, that thing can become real. Whereas, you know, so much

03:59 of what humans do, it's one or the other. I could tell an amazing story and write the book,

04:04 or I could go build an awesome house. But normally those things don't actually coexist where you think

04:10 a lot about something and they come into existence. But I do think that's a magical part of what we get

04:15 to do. And I think it captures a lot of people's imagination.

04:18 And what's really cool is that in this digital world, there are a lot less boundaries, a lot

04:23 less constraints. There's nothing telling me that this pixel needs to be in this certain way.

04:27 I can draw whatever I want on these pixels on the screen.

04:30 Yeah. Yeah. And modern day, we have cloud computing and we have incredible computers. Like

04:35 the sky's the limit. It's really, really awesome. Also money. You don't have to go buy tons of hardware

04:41 for many things that we do. Right. So really cool. Now, how about today? What are you up to these days?

04:46 I'm doing a lot with software development, cloud-based development, a lot of websites,

04:51 a lot of web properties, ASP.net and Node on the backend, React and Vue on the front end.

04:57 Taking that into interesting modalities, I've started to play with Raspberry Pis and that's

05:01 really fun. And getting to dig into all the things. I've gotten really good at doing DevOps as well.

05:07 Part of my passion is being able to share this knowledge with others. So I do a lot with teaching

05:13 both at user groups and conferences and elsewhere mentoring. And so it's really fun to be able to

05:19 not only learn these new skills, but also pass it on to the next generation of developers too.

05:24 I love to say that it's not that I'm really good at it. It's just that I've been collecting things for a

05:30 while. So let me add to your collection too.

05:33 Well, I think one of the things that's really interesting about becoming an expert in programming,

05:37 people who are beginners or maybe don't do programming at all, they see that person as

05:42 incredibly talented and incredibly smart. And they may be, they often are, but I feel like the real

05:49 big difference is I've spent 10 years gathering up these little tips. Like, oh, I tried this.

05:54 That doesn't work very well. That crashes. You try to talk to the database that way. That's bad. Oh,

05:58 by the way, I've also built up a couple of examples of what databases are. And I've seen,

06:02 you just have this almost more experience than, I don't know, like innate skill, right? So it's

06:11 really cool. You can just kind of layer on these skills over your career. And the reason I think

06:15 that's powerful is it's, they're very easy to communicate back to other people, right? If,

06:19 you know, the way Nietzsche did philosophy or the way Euler did math, like you can't, or Bach did music,

06:28 like you can't easily communicate that to someone. Like if it's this crazy innate skill,

06:33 you can sort of communicate it, but it's not the same. But with programming, I think it's very easy

06:38 to transmit it on and pass it on and ways to help people like level up. It's super fun.

06:42 And it's really easy to get started. You know, programming languages have become much more

06:46 approachable of late. And so if you're new to programming and just starting to dabble in it,

06:52 you don't need to buy a big expensive thing. You know, the laptop that you're using to browse the

06:56 web is probably sufficient for building simple programs. And so, you know, dive in,

07:02 use free tools and just start building stuff. It's really approachable and really fun.

07:08 It absolutely is. And one of the things that just never ceases to blow my mind is I can be in a coffee

07:13 shop working on a relatively cheap laptop doing my coding. Git push, speaking of Git, something happens

07:20 on one part of the cloud. It triggers a web hook somewhere else that then grabs the code and could run

07:25 that on a tremendously powerful data center and computer or suite of computers, a cluster of

07:31 computers. And yet I get the experience of basically building this super powerful thing on my very wimpy

07:36 little laptop. It's just cool that you can create things like Facebook or Google, or you name these,

07:43 these really large, amazing apps, but you could kind of just do it on like a laptop.

07:47 Something that existed in my mind yesterday exists in the cloud and scaled to any user that wants it

07:56 tomorrow. Yeah. That is really fun.

07:58 Yeah. It's super, super fun. Before we move on, you also mentioned that you've been playing around with

08:03 Raspberry Pi.

08:04 So, something that I covered recently on Python Bytes, my other podcast, is somebody built a water-cooled

08:11 Raspberry Pi cluster computer. So, eight Raspberry Pis in one thing, all of them overclocked and water-cooled.

08:20 Have you seen people doing this stuff? It's crazy.

08:22 It's really cool. And as you start to get into clustering programming, clustered programming, you know,

08:27 multi-machine type of experiences, a Raspberry Pi is a really cheap barrier to entry. You know, for $40 or so,

08:34 you can get a Raspberry Pi, get three or four or five of them, cluster them together, and now you get the sense of

08:41 what does it take to build parallel machinery. And it is really, really fun. So, you know, to get an eight or ten or a hundred or a thousand,

08:51 node Raspberry Pi is pretty sweet.

08:55 Yeah, that's awesome. I mean, you can do something similar with Docker, right? Fire up a bunch of Docker containers, but it's not the same

09:01 feeling as like, there's actually eight of them over there, and they're actually talking to each other and working together. I think it's a very

09:06 different feeling. It's super cool.

09:08 Containers do help us start to approximate that. But yeah, there is some lying to ourselves to believe that all of these containers running in context on my one laptop are really a

09:19 distributed system. Yeah, absolutely. It's not the same, but it does let you sort of play around there a bit. Right. All right. So I want to talk about Git

09:26 primarily, and that's what your presentation at the Python web conference was about. And that's what we're going to center our conversation on. But you know, you and I have both been around the

09:35 industry for a lot of different things.

09:38 Yeah.

09:38 So maybe let's talk a little bit about the history of source control. You know, I think of source control as a spectrum, from "what source control?" all the way to Git and distributed source control. I've talked to people and I've seen it in action. Source control is: I've got a code file,

10:01 and I've called it version one, version two, version two edited, version three, version three final, final two, you know. Or maybe, if it's a lot of files, you zip the folder and you name it like that. Right. That's the beginning of source control. It's doing it wrong, but it's getting there. Right.

10:18 Well, it's doing it in exactly the way that you needed at that time. Copy folder versioning is definitely a thing. Dot bu dot date, you know, copy that content off to make sure that you have it. And that's really what we're after with version control is when we think of version control, we're really talking about two things.

10:38 One is archiving the history of my journey so that I can get back to a known good state if things go bad, but also communicating with my team to be able to convey the progress of this system.

10:52 Yeah.

10:52 And copy folder versioning does the first one real well. It doesn't do the second one real well. There are systems that I've worked on where, you know, to upgrade the system is to first copy all the things into the dot backup folder and then upgrade the primary thing. And if it doesn't work correctly, then you point the...

11:11 Put it back, put it back, put it back, put it back.

11:12 Yeah. They point the web server at the backup folder. And so now the system has been running out of the backup folder for, you know, six or eight months or a year. And now we go to upgrade and step one is to take, oh, wait, we just took down the site. Now we have no known good backup thing.

11:30 Yeah. Without some sort of source control, the real thing I think that falls apart, maybe you're doing the file versioning thing, which is still not that ideal, but the thing that really falls apart is collaboration.

11:40 Right. Soon as two people want to work on something, it's not okay to say, well, here's my zip version. Can you merge that back together? And probably you don't have merge tools either. So what does that even mean? Right.

11:49 So that quickly gets us into where I think people probably should be, in some sort of version control. But back in the day, that was different stuff. For example, you know, maybe that was Subversion. Actually, if you were on Subversion, you were in a good place. I mean, a really good place.

12:06 Yeah. Subversion was really cool. Subversion was an upgrade to CVS, where CVS would version each file or each folder separately. And so nested folders just happened to kind of be together in this clump.

12:21 And what Subversion gave us was we're versioning the entire project together in one piece. Before that, we might've had SourceSafe or other pieces. Team Foundation Server kind of fits into this realm as well.

12:35 And so it's that mechanism of versioning all of the pieces together and then being able to publish that to a central place. What makes all of these systems kind of unique, specific, is that they're all really client-server pieces.

12:49 Subversion was really good at being a client-server piece.

12:53 I'm going to go out on a limb and say it was the best client server version control system that I'm aware of.

12:59 Yeah.

12:59 I think so.

13:00 These systems, though, have kind of a fundamental flaw, because we want to use version control for those two pieces.

13:06 We want to use it to be able to back up the work so that I can get back to a known good state and to communicate with our team.

13:14 And the hard part with these client server version control systems is we're doing both every time we commit.

13:22 So when I commit a change to Subversion, I'm immediately publishing it to all of you.

13:28 So the analogy that I like is when I'm rock climbing, I want to be able to put a carabiner in the wall as frequently as possible.

13:38 If I climb a foot and I fall, I'm only going to fall a foot.

13:42 If I climb six or eight or 12 feet and I fall, I'm going to fall 12 feet.

13:47 Well, actually the nature of the rope is that it's going to swing all the way down.

13:50 So I fall 24 feet and that's a long way to go.

13:54 I want to stick pieces in the wall as frequently as I can.

13:58 You don't want to see me spamming the thing every time I get there.

14:03 So I get to the point where it's like, okay, I finished the thought.

14:06 I'm good.

14:07 I want to mark this save point, but I'm not ready to publish it to all of you.

14:11 Yeah, because really the thing you should be working with most of the time is if I publish it to the rest of the team, it should be, it should at least run.

14:20 Right.

14:21 Yeah.

14:21 Probably the test should pass.

14:22 Maybe you can fix that.

14:24 Like you're going to work with somebody, but it shouldn't just mean nobody can build or even start the software at all.

14:31 Because you've got the save point in the middle of their work that is inconsistent or halfway there or whatever.

14:37 Right.

14:37 And so I've reached the stopping point, but I'm not done.

14:39 It doesn't work.

14:41 And so I have this moral dilemma.

14:42 Do I mark a save point and inflict that on all of you or do I not?

14:47 And that's when I fall back to a secondary version control system where I start doing copy folder versioning again, where it's like, I just want to take all my stuff and stick it in this spot so that I have this known good state.

14:59 And that's where we pivot to distributed version control systems of which Git is one of them, where we have a separation between the commit stage and the publish stage.

15:10 And that isn't the official terms that Git or any of the rest of them use, but there's a process of marking those save points.

15:18 And then there's a process of collecting all of the save points and publishing them to others.

15:23 It takes a bit of a mind shift to get used to it as well when you're working on it, because if you come from one of these other systems, I committed, so it's saved.

15:29 Right.

15:30 But commit in a distributed source control system means it's a local save point until you git push, or whatever Mercurial's equivalent of a git push is.

15:41 Right.

15:41 Yeah.

15:41 An hg push.

15:42 And so it's exactly that.

15:44 It's marking save points however frequently you want and then combining those save points together into a cohesive story.

15:52 To publish to your colleagues.

15:54 And that's what makes distributed version control so powerful is separating those two concepts.
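
The split described here, local save points first and publishing later, can be sketched in a few lines of Python. This is a toy model under loud assumptions: ToyRepo, ToyRemote, commit, push, and unpublished are hypothetical names invented for illustration, only a single clone ever pushes, and none of it reflects how Git actually stores or transfers data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRemote:
    """Stands in for a shared server (hypothetical, not a Git API)."""
    published: list[str] = field(default_factory=list)


@dataclass
class ToyRepo:
    """Stands in for one local clone (hypothetical, not a Git API)."""
    remote: ToyRemote
    local_commits: list[str] = field(default_factory=list)

    def commit(self, message: str) -> None:
        # A commit is only a local save point; teammates cannot see it yet.
        self.local_commits.append(message)

    def unpublished(self) -> list[str]:
        # Save points that exist locally but were never pushed.
        # Assumes this clone is the only one that pushes to the remote.
        return self.local_commits[len(self.remote.published):]

    def push(self) -> int:
        # Publishing sends every save point the remote does not have yet.
        new_commits = self.unpublished()
        self.remote.published.extend(new_commits)
        return len(new_commits)
```

With this model, calling commit twice leaves the remote's published list empty, and a single push then publishes both save points together, which is the "cohesive story" idea from the conversation.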

16:00 Mercurial, Git, Perforce.

16:03 There are other distributed version control systems.

16:06 And as the world was moving from Subversion and TFS into this distributed world, we experimented with each of them.

16:14 You know, arguably Git wasn't the best.

16:17 We might have done a VHS and Betamax type of thing.

16:21 But clearly Git has become the de facto standard version control system.

16:26 It is distributed.

16:27 And now we can separate the save points from the publish points.

16:31 Talk Python to Me is partially supported by our training courses.

16:35 At Talk Python, we run a bunch of web apps and web APIs.

16:39 These power the training courses as well as the mobile apps on iOS and Android.

16:43 If I had to build these from scratch again today, there's no doubt which framework I would use.

16:48 It's FastAPI.

16:49 To me, FastAPI is the embodiment of modern Python and modern APIs.

16:54 You have beautiful usage of type annotations.

16:56 You have model binding and validation with Pydantic.

16:59 And you have first class async and await support.

17:02 If you're building or rebuilding a web app, you owe it to yourself to check out our course, Modern APIs with FastAPI over at Talk Python Training.

17:10 It'll take you from curious to production with FastAPI.

17:14 To learn more and get started today, just visit talkpython.fm/FastAPI or email us at sales at talkpython.fm.

17:23 I think another really important thing to highlight for people who haven't been there, right at the Git homepage, they highlight Subversion, which we've been talking about.

17:34 But Perforce, ClearCase, SourceSafe, TFS, a lot of these things, there's two things.

17:40 One, they would lock files.

17:42 Like if you wanted to make a change to a file, you would claim it.

17:45 Like I'm editing main.py, well, no one else can interact with that file.

17:51 It's literally made read-only on your computer until, you know, until that person is done.

17:56 And they had better not forget and go on vacation while they got some files checked out.

17:59 That's the one thing.

18:00 The other is you need permission to participate in a project.

18:05 You have these gatekeepers and you need to sort of prove yourself to the gatekeepers.

18:09 So if I wanted to commit, I wanted to work on Flask, if it was under Subversion, I have to go.

18:15 Can I have permission to go read, get read-only access to Flask?

18:18 If I want to make a change, I literally have to say I need permission to commit back to Flask.

18:22 With the distributed ones, you clone it, you do your proof of work, your proposed idea.

18:28 And if you want, you can contribute it back or you could just go in a different way, right?

18:33 There's this very interesting separation of I can kind of work on it and then see if I want to contribute

18:39 it back rather than the other way around.

18:41 I have to get permission to contribute.

18:44 And I think that's a super critical thing in the open source space where there's a very

18:47 loose coupling of people and projects.

18:49 Like if somebody comes to me and says, I want to work on, I suppose I'm working on Flask.

18:54 They come to me, I'm in charge of Flask.

18:56 They come to me and say, I want to work on Flask.

18:57 Like, well, maybe.

18:59 What else have you done?

19:01 Show me.

19:01 This is a huge project.

19:03 We do not want you to mess up Flask.

19:06 But.

19:07 And we had a little bit of that with SourceForge.

19:10 You know, you could clone the repository in Subversion and just work on it locally, but you

19:16 weren't able to participate.

19:17 The moment that you wanted to help, it was a really frictionful process where, you know,

19:22 okay, so I have this diff.

19:24 Now, I don't have write permissions.

19:26 So am I going to, you know, bake this diff into an email and hope somebody reads it?

19:31 Do I just use it locally?

19:33 Do I fork the project and only have our corporate version of it?

19:37 You know, it was very difficult to participate.

19:39 And that's not a feature of Git per se, but rather the GitHub, the shared hosted mechanism

19:47 around Git that has grown up as well.

19:49 Yeah.

19:49 I mean, with Git, you can clone a thing and then work on it as long as you have read access.

19:53 But yeah, the additional mechanisms, the Git flow around it is certainly something created

19:58 by GitHub with like PRs and forks and the emerging upstreams and all that origin upstream stuff.

20:06 One thing I did want to ask you before we get to the details is why do you think Git won?

20:10 You did talk about this Betamax, VHS sort of thing, and there are other options out there

20:15 for distributed source control.

20:17 I have a theory, but what are your thoughts?

20:19 And I have a theory too.

20:20 I don't have the answer and maybe our listeners will help us discover what the correct answer

20:26 is, or maybe there isn't one.

20:27 In my mind, a lot of the time we were looking at ways to compete with things.

20:33 You know, we had things that would compete with CVS or Subversion because, you know,

20:38 we wanted a little bit more or we wanted to make money on the process of source control.

20:42 And what's really interesting about Git is that it has become so pervasive.

20:48 And so we're not building competitors to Git, we're building integrations into Git.

20:53 All right.

20:53 We're building on top of Git.

20:54 Yeah.

20:55 Now, arguably GitHub helped with that too.

20:57 GitHub has a really, really powerful community mechanism for that.

21:02 And GitHub really only did Git.

21:05 But I would argue that Git is really cool because it's free and open source.

21:11 And because it's free and open source and it has that community mechanism around it,

21:15 we don't need to compete with it.

21:17 We don't need to try to make money on this.

21:19 Instead, we can build collaborations with it and mechanisms working with it and build up the

21:25 community together.

21:26 My thought as well was GitHub.

21:28 Yeah.

21:28 It's the thing that brought not just the server infrastructure to privately have code.

21:34 It brought the community and it brought the flow that allowed people to collaborate in

21:40 ways that could let them collaborate once they've proven they have something to collaborate.

21:45 Right.

21:46 Here's my PR where I've already shown you the thing that's amazing that I want to offer up

21:50 to you.

21:50 Well, that does look amazing.

21:52 Thank you.

21:52 Who are you?

21:53 Let's talk about this.

21:54 Right.

21:54 It's a different conversation than I've never seen you.

21:57 Why should I give you write access to Flask on SVN?

22:02 And it's exactly that.

22:03 GitHub has these magic levels to it where at the very first level, it is just an online

22:09 source code repository system.

22:10 And so how is that different from SourceForge or TFS before it?

22:16 It isn't at this level.

22:18 And so if that's what you're using GitHub for, then that's perfect.

22:22 You know, back up your local projects up to GitHub, get your content off of your machine

22:27 in case there's a disaster.

22:28 That is definitely the first level.

22:30 The next level starts to build workflows around it where we can say, I want to create issues.

22:35 I want to create project management things.

22:38 I want to create milestones.

22:39 I want to create goals.

22:41 And so that's kind of the next level.

22:42 Leveling up again, we can start to create a social community around that where we can start

22:49 to have conversations around the content where I can create a diff and we can all talk about

22:56 it and we can collaborate on it.

22:58 And once it's good enough, now we can pull it in.

23:01 Add to that then the mechanism around pull requests and things like that.

23:06 Git has a concept of push and pull.

23:09 You know, publish and receive, I guess, might be the terminology that matches here.

23:15 And what's interesting about a pull request: I don't have write access to your repository,

23:20 but I want to contribute.

23:21 So instead of pushing my content to you, I'm going to request that you pull it from me.

23:29 And so no longer do I need to create this email and write out all the content and hope you read the email.

23:35 I create this code up in my space and I request that you include it in your space.

23:43 And that made collaborating with projects really, really easy.

23:49 So with that comes the next level of GitHub where we have these communities that can socialize

23:54 and develop and hang out in this coding space.

23:59 And that's really what made GitHub so magical is that we have this community around coding.

24:05 Where previously with SourceForge or other environments, yeah, we had that online source control system,

24:12 but we didn't have those levels of interaction.

24:14 So pull requests or merge requests or whatever you're going to call it, is that mechanism of being able to collaborate with low trust type of environments.

24:24 I want to offer up my solution to the community and see if that's going to fit into this ecosystem.

24:30 Yeah, I think that's why Git won as well.

24:32 And Verda Rose out there says, open source is the best way to learn and improve technically and collaborate with people you don't know.

24:40 Yeah, and I think it's that the people that you don't know, it makes it special because it allows you to create these connections with people all around the world

24:48 who you would otherwise not meet.

24:51 And you get a chance to work with them.

24:53 Even if you live in rural Canada and you want to do software development,

24:58 maybe no one around you is really good at whatever you're trying to do, but go to GitHub, find a project.

25:03 You can collaborate with the best people in the world on that.

25:05 We can create these communities around our passions for technology or the problems that we want to solve,

25:11 not necessarily based on the geographic boundaries that we find ourselves in.

25:15 Yeah, absolutely.

25:15 All right.

25:16 So that's the history of it a little bit.

25:19 I talked a little bit on why distributed source control is really powerful.

25:23 And I think it has really unlocked open source in a special way and on a much larger scale than it has.

25:30 And it is interesting to note that Git and GitHub are different.

25:33 GitHub uses Git under the covers to be able to build its social experiences.

25:38 But Git is a thing that is separate and distinct.

25:42 There is no pull request concept in Git, for example.

25:45 And with Git on your local machine in a cave, you can version and create those save points.

25:53 When you're ready to socialize, to publish your content, to communicate with your team,

25:57 you can use Git together with lots of services, GitHub or Bitbucket or GitLab,

26:04 or there's lots of private services as well that allow you to create that online community.

26:11 You know, GitHub has published their magic sauce to the world and lots of us have cloned it.

26:16 So GitHub is still the place where we code for the most part.

26:20 But if you prefer coding in another community, then that's totally fine.

26:23 You can still use Git and all of the tools to be able to create your save points and publish that content to others.

26:31 You could just publish it to a different server as well.

26:33 Git and GitHub are different.

26:35 It's easy to see them as the same thing.

26:37 But yeah, they're absolutely not.

26:39 We've got all these different locations.

26:40 I have mixed emotions, mixed feelings about if you have another project and you put it somewhere else,

26:46 I'm not going to name any particular service, but let's just say somewhere that's not GitHub.

26:49 That's totally good.

26:51 But at the same time, so much of the open source flow is around GitHub and the stuff that's happening there.

26:57 I don't know.

26:58 It's just really interesting to think why you might be at one place and not the other place and so on.

27:03 Yeah.

27:03 And a lot of people were worried when GitHub was bought by Microsoft, is this going to be the end of the community collaboration?

27:10 And I think Microsoft has been a really good steward of the GitHub community and really making sure that GitHub is still available to all of us and facilitating the success of that ecosystem.

27:23 Yeah.

27:23 There was a lot of hesitancy and concern within certain communities.

27:27 And I feel like they've done a great job.

27:30 Yeah.

27:31 What I didn't realize was that GitHub really needed some help from somebody. Financially, they were not doing as well.

27:40 And I looked at them like, oh, this place must be incredibly successful.

27:43 But what came out after some of the reports and stuff, it was kind of important that someone came along.

27:49 And if that's the case, then I was head over heels that Microsoft bought them.

27:53 Last thing I want to see is them go away.

27:55 And I think they've done a good job of just letting them be.

27:57 Don't go mess with them.

27:59 It's working really well.

28:00 So I think it's been a good deal that worked out there.

28:04 See also Docker for an example of a community that is amazing and contributing, but doesn't have a financial business model to be able to survive.

28:12 Yeah.

28:12 Yeah.

28:13 Hopefully things go well for Docker, but it's just tricky.

28:16 They tried the enterprise thing and then they're switching to other things.

28:20 Yeah.

28:21 I love their pivot back to focusing on developers in the community, which is wonderful.

28:25 But I still feel like they haven't found their spot that allows them to be business successful.

28:32 And the hard part is you can only do that for so long and then you need to pivot to something that can start to facilitate the business.

28:41 Yeah, absolutely.

28:41 All right.

28:43 Are you ready to go into the Git folder and find where the hidden magic lives?

28:47 Yes.

28:48 If you go into a project that you've Git cloned or you've Git inited and you create some files and you mess around there, you don't see anything different.

28:57 It looks just like any other folder that might have a project in it.

29:01 Right.

29:02 But in there actually is contained the almost the entire backup, the entire contents of all the versions of those files.

29:11 At least every branch that you've checked out, hidden in the hidden .git folder.

29:16 So .git on.

29:19 And it's not almost.

29:20 It is.

29:20 That is the entire history of the project.

29:24 So the way to back up a Git database, misusing that term, is to grab that .git folder and copy it.

29:32 Inside that .git folder are lots of files that describe the history of the project since its inception down to the current version.

29:41 And so, you know, kind of the only way that you can break Git is to open up that .git folder and change stuff.

29:48 By default, this folder is hidden on most systems.

29:52 So you may have to show hidden files and folders to be able to see the .git folder.

29:56 But it's there.

29:57 And it's really powerful.

29:59 Yeah.

29:59 And so on Windows, you go to the Explorer, there's like one of those little ribbon things that drops down.

30:04 It's a checkbox for show hidden folders and files.

30:06 On macOS, I learned you can hit shift command dot and that will show hidden files.

30:12 That I did not know.

30:13 And I was very delighted when users told me about that.

30:17 On Linux, I don't know.

30:19 I mean, you can go and do an ll in there in the terminal.

30:21 But there's probably some way to show it in the Explorer equivalent as well.

30:25 You can navigate into it from your terminal or wherever.

30:28 And once you're inside of it, Yeah.

30:30 All of the files are right there.

30:31 Yeah.

30:31 So we go in here and we find things like HEAD, config, description, hooks, index, info, logs, objects, packed-refs, and refs.

30:40 You want to maybe give us a rundown of what each one of these are?

30:44 And then we can dive deeper with one of the tools that you built into maybe some of the things like refs and so on.

30:50 But yeah, also maybe hooks.

30:52 But yeah, wherever you want to start.

30:53 What's cool in this database is it is the entire history of your project.

30:58 And it's zlib compressed.

31:01 So for example, the 20-year history of Perl, the .git folder is ever so slightly larger than the checkout folder.

31:12 And that includes the entire history, including all of the changes and all of the authors and all of that is really nicely compressed into this folder.

31:21 Wow.

31:21 It breaks down into a couple of groups of things.

31:24 We have the content.

31:26 We have branches and tags, you know, references to the content.

31:32 We have configuration details around this repository.

31:36 We have index files.

31:38 We have temp files.

31:40 And then we have automation tools.

31:41 And so these are kind of the groups of things that we'll find in this folder.

31:46 A lot of them happen to be in their own folder, which is really nice.

31:50 So for example, hooks is the place that you go for automation.

31:52 Refs is the place where all of the content is.

31:55 No, refs is the place for branches.

31:58 Objects is the place for the content.

32:01 And so a lot of the things that we'll see will have their own folder, but some of them spill out.

32:06 Like configuration is in the config file, but there's also some stuff in the info folder for that.

32:12 Indexes, we've got the index file right there on the root directory, but we also end up with index files inside of pack folders.

32:22 And so, you know, it gets a little bit interesting.

32:25 The first one to dive into is probably the objects folder, because this is the stash of all of the content in your repository.

32:33 Now, as you commit something into Git, you'll first add it to the staging area, and then you'll commit it with a message.

32:41 And as you do so, you'll end up with content inside the objects folder.

32:46 Now, what's interesting to note here is if you look at a git log, you'll see a hexadecimal thing.

32:53 You know, it might be seven characters, or it might be much longer than that.

32:57 And as you do that log, you can take a look at that.

33:01 Inside the objects folder are folders with two digits.

33:06 Those are the first two digits of the commit number.

33:08 Inside that folder is all of the commits that happen to start with that two-digit number or letter.

33:15 So, you know, that means that not all of the files will be in one directory.

33:20 They'll kind of be arranged a little bit.

33:22 That gets around too many files in one directory.

33:25 But it's that objects folder that then stores all of the content there.
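
As a rough illustration of the folder layout described here, the following Python sketch computes an object ID the way Git does for SHA-1 repositories (a hash over a small header plus the content) and derives the two-character folder and the file name from it. The function names are made up for this example, and repositories configured for SHA-256 would differ.

```python
import hashlib


def git_object_id(kind: str, content: bytes) -> str:
    """Hash '<kind> <size>' + NUL byte + content, as Git does for SHA-1 repos."""
    header = f"{kind} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path relative to .git: first two hex digits are the folder name."""
    return f"objects/{object_id[:2]}/{object_id[2:]}"


# The empty file is a handy test case because its ID is well known.
empty_blob_id = git_object_id("blob", b"")
empty_blob_path = loose_object_path(empty_blob_id)
print(empty_blob_id)
print(empty_blob_path)
```

Splitting on the first two hex digits gives at most 256 subfolders, which is how the objects folder avoids one enormous directory.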

33:31 Now, what's interesting is I think of it, if I commit one, and that's where this talk was really cool.

33:37 When I commit one thing and I go look in that objects folder, I will have three different files.

33:43 Now, they are zlib compressed.

33:45 Yeah, you can't just open them up and look at them, right?

33:48 They're kind of scrambled up.

33:50 But it's not magic.

33:51 I built a tool that will unzlib compress one, which is pretty cool.

33:56 But once we identify a thing that we want to do, we can also use git cat-file.

34:02 Git cat-file allows us to look at both the type and the content in a particular node.
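
To make the "it's not magic" point concrete, here is a minimal Python sketch of what reading one loose object involves: zlib-decompress the bytes, then split the header (type and size) from the body. This is not Rob's tool; the function name is hypothetical, and the "file" is built in memory so the example runs without a repository. On a real repository, git cat-file -t and git cat-file -p report the same type and content.

```python
import zlib


def read_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and return (type, content)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    kind, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind, body


# Stand-in for the bytes of a file under .git/objects/xx/...
stored = zlib.compress(b"blob 6\x00hello\n")
kind, body = read_loose_object(stored)
print(kind, body)
```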

34:10 These are directed acyclic graph nodes, DAG nodes, that specify relationships between these things.

34:19 But what's cool is we...

34:21 Here's a branch.

34:21 Here's a file that's in that branch, something like that.

34:24 They're not branches, but they are folders.

34:26 Here's a file within a folder.

34:28 Here's the content.

34:29 So we have three different types of these nodes.

34:32 One is a commit.

34:33 And in the commit, we have the author's name and the date that it was committed, the message that we gave.

34:41 And also in that commit is a reference to the tree nodes that are part of that commit.

34:48 Now, each tree node can specify files or folders.

34:53 So a tree node can reference another tree node.

34:55 And inside the tree node, we have references to those files.

35:00 So I might have a tree node that references file1.txt.

35:04 The third type is a blob.

35:06 And so as we look at blobs, then that's the actual content in the thing.

35:12 So go back to the...

35:15 Click on...

35:16 Oh, I don't think I have a one to get back to the blob.

35:19 But the cool part about this app, hit refresh, and you'll get to that big blob of stuff.

35:24 Here's all of the commits in this repository.

35:28 So we had something visually to look at here, and it's about to pull up its rendering.

35:31 Yeah, it's not super performant.

35:33 That's all right.

35:34 You built this thing called Git-Explorer, which is a little web app that you point at a Git repository.

35:41 And it lets you look at these things that you're describing visually and then click around on them, right?

35:47 Right.

35:47 So click on Show Type, and we see the three different colors emerge.

35:51 There are commits, trees, and blobs.

35:53 And it's like, okay, I have a whole bunch of files in my objects folder, and I can click on each one, and I'll use that git cat-file thing to go figure out what it is.

36:03 But it's like, you know, I really wish I had more stuff about it.

36:09 So that's where I click alphabetical, and that will put them all in order.

36:14 Click on Tags, and now you can see the name of that thing.

36:18 And I'm only showing the first seven digits of the commit here.

36:22 But now you can kind of get a sense for, here are all the objects, and click on each one and open it up.

36:27 Right, and these names often go by SHAs in Git parlance, which is just the type of hash, S-H-A, whatever it is.

36:37 And I don't know how many people know this, but you can use sub-pieces of the SHA to refer to it in Git.

36:44 So you don't have to say the full, I don't know, what is that, 32 characters or whatever to describe a name.

36:50 As long as it's enough to be unique, it'll go, like, you can issue commands against these things in this abbreviated form, right?

36:56 Right, exactly.

36:57 So oftentimes, only two digits is necessary, sometimes three or four.

37:01 And that's why often when you're looking at Git history, it'll only show you the first seven.

37:05 Surely enough, yeah.

37:06 Now what we start to do as we're clicking through this is we get a feel for all of these green nodes.

37:12 That's the content in the files.

37:14 The blue nodes are the tree nodes.

37:17 And as I click on one of those blue tree nodes, then it references other files.

37:22 I can see their SHAs, their Git hashes there in that list.

37:27 And then as I look at the red ones, the commits, that's my commit message.

37:30 That includes the parent node that was the commit right before this.

37:35 It also references the tree node that has the files for this.

37:39 And so wouldn't it be nice if we could, I don't know, arrange them in a way?

37:43 So let's, instead of going from alphabetical, let's click on parent-child and start to see the relationships.

37:50 We'll need to turn on lines now, and we probably want to also turn on tags.

37:54 And now we can take a look at those commits and see how each one references.

37:59 Now, if you have a very large repository, then I haven't built scrolling yet.

38:05 Sorry.

38:05 But you can see that the red commit nodes all reference each other and reference the previous

38:13 ones.

38:13 And then they go into these tree nodes that may reference other tree nodes.

38:17 And eventually those reference the file nodes.

38:19 That's part of my demo highlight that if we create the same file content and commit it in

38:25 two different directories, it's actually only one blob on disk.

38:30 There's only one green blob node.
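
The "same content, one blob" behavior follows from objects being named by a hash of their content. A toy Python sketch, assuming a plain dictionary as a stand-in for the objects folder (the names object_store, add_file, and tree are invented for illustration):

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object ID for a blob in a SHA-1 repository."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


object_store: dict[str, bytes] = {}  # toy stand-in for .git/objects


def add_file(content: bytes) -> str:
    """Store content under its ID; identical content lands on the same key."""
    oid = blob_id(content)
    object_store[oid] = content
    return oid


# Two paths in different directories, identical content.
tree = {
    "docs/readme.txt": add_file(b"same content\n"),
    "src/readme.txt": add_file(b"same content\n"),
}
print(len(object_store))
```

Both paths point at the same ID, so only one entry is stored. Changing a single byte would produce a different ID and a second, complete blob.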

38:32 But the cool part here is we were able to explore each of these objects in our repository and we

38:38 get a feel for how they work.

38:39 So if I change one line in a very big file, what gets committed?

38:44 Well, the entire file.

38:46 Yeah.

38:47 And I'd suspect that's probably why large binary files are not ideal to be committed here, even

38:52 though you technically can put them there.

38:54 Right.

38:54 So that's the first group of things is these objects.

38:57 So that's the top level objects folder in the .git folder.

39:01 Yeah.

39:01 Yes, exactly.

39:02 Now, there is a pack folder inside there.

39:05 If you run various commands, then Git will say, well, do I have too many commits, too many

39:13 of these objects that I need to pack together to make this repository smaller on disk?

39:20 And if so, then it'll automatically do a GC, a garbage collect, where it starts to pack those

39:27 into pack files.

39:28 Now, it's kind of a zlib-compressed group of zlib-compressed files.

39:32 So it gets very meta there.

39:34 But that's what the pack folder is inside the objects folder.

39:37 Yeah.

39:37 Okay.

39:38 So next up, let's talk about the refs folder.

39:40 Now, when we look at refs, we look at branches and tags and remotes.

39:47 These are files that reference commits.

39:51 So one example is the HEAD file in the root of the .git directory.

39:56 And inside that HEAD file, it will specify what HEAD points to.

40:01 So if you do a git log and you see that HEAD has an arrow pointing to, I don't know, main

40:06 or trunk or develop or whatever, then if you open up that HEAD file, you'll see the text

40:15 in that file is that branch.

40:17 It's basically the SHA, right?

40:19 Is that what it is?

40:19 It is the SHA if your head is pointing at a SHA.

40:22 But typically, your head won't be pointing at a SHA.

40:25 It'll be pointing at a branch.

40:27 Oh, yeah.

40:27 Refs.

40:28 Mine right now is refs slash heads slash main, which is the default branch for this project.

40:33 So that's awesome.

40:34 Main being the branch.

40:35 Yeah.

40:36 Head says it goes to refs, heads, main.

40:38 So we can go into the refs folder.

40:40 We can go into the heads folder.

40:42 And we can open up main.

40:43 And what's in main is the SHA of the commit that main points to.
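
The lookup just walked through (HEAD names a ref, the ref file holds a SHA) can be sketched in a few lines of Python. This runs against a throwaway fake .git folder rather than a real repository, the SHA is made up, and the sketch ignores packed-refs, which real Git also consults.

```python
import tempfile
from pathlib import Path

FAKE_SHA = "0123456789abcdef0123456789abcdef01234567"  # made-up commit ID


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit SHA (loose ref files only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_file = git_dir / head[len("ref: "):]
        return ref_file.read_text().strip()
    return head  # detached HEAD: the file holds the SHA directly


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text(FAKE_SHA + "\n")
    resolved = resolve_head(git_dir)

print(resolved)
```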

40:49 Okay.

40:49 What's cool here is that each of these refs, both head and all of these branches, is just

40:56 pointers to the commits in the objects folder.

41:00 Yeah.

41:00 So these are like the main file is just a text file that just literally has only the SHA that

41:06 is where that branch currently is.

41:08 Exactly.

41:09 Okay.

41:09 So technically, to create a branch, I just create a file that happens to be in refs, heads.

41:16 I name it something and I give it a SHA.

41:18 And now I have a branch that points at that thing.

41:22 Branches in Git are not these durable, fragile things like in TFS or in Subversion.

41:28 Branches in Git are just name tags.

41:31 They're pointers.

41:32 They're references to the commits in this tree of objects.
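
A small Python sketch of "a branch is just a file with a SHA in it", again on a throwaway folder with made-up SHAs. The helper names are hypothetical. On a real repository the usual route is git branch with a name and a starting commit, not hand-editing files.

```python
import tempfile
from pathlib import Path

SHA_A = "a" * 40  # made-up commit IDs for illustration
SHA_B = "b" * 40


def create_branch(git_dir: Path, name: str, sha: str) -> Path:
    """Create or move a branch by writing its ref file."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(sha + "\n")
    return ref


def list_branches(git_dir: Path) -> dict[str, str]:
    """Map branch name to the SHA stored in its ref file."""
    heads = git_dir / "refs" / "heads"
    return {p.name: p.read_text().strip() for p in heads.iterdir() if p.is_file()}


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    create_branch(git_dir, "main", SHA_A)
    create_branch(git_dir, "experiment", SHA_A)  # new label, same commit
    create_branch(git_dir, "experiment", SHA_B)  # "moving" it is a rewrite
    branches = list_branches(git_dir)

print(branches)
```

Creating or moving a branch touches one small text file and nothing in the objects folder, which is why branches are so cheap.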

41:37 So the cool thing is we can move them around.

41:39 Right.

41:39 They're basically a path of these named commits through the history of the overall history

41:44 of it, right?

41:44 Right.

41:45 They're the labels that we give it so that we can understand it because communicating in

41:49 32-digit SHAs is not as much fun.

41:52 No, definitely not.

41:54 Definitely not.

41:55 One of the talks that I like to do is I do a Git log and I show that 32-digit hash and I

42:00 read it out.

42:01 And then I walk up to somebody in the audience and pretend they're the project manager.

42:05 And I go, can I ship it?

42:07 And they're like, yeah.

42:11 Thus, we have these labels.

42:13 Yeah, that's right.

42:14 In the heads folder is all of the branches.

42:18 In the tags folder is all of the tags.

42:23 And they're also just files pointing at commits.

42:26 Sorry, my repo is empty.

42:27 I don't have any.

42:28 But if you, you know, people might tag a release or a version or a beta version or something like

42:32 that.

42:32 So you can refer to it by name, by label instead of, you know, main with the SHA or

42:38 something weird like that.

42:39 Right.

42:39 Right.

42:39 And then we would have a remotes folder, which references where I last saw another copy of

42:48 this Git repository's branches.

42:50 Yep.

42:50 So in this case, you have one that says refs, remotes, origin, main, and that's perfect.

42:55 That's where I last saw this server, this server's main branch.

43:03 Now, in this case, I chose to call my main, my remote server origin.

43:07 Now, this could be a server that we've designated as the server.

43:12 It could be one of my coworkers.

43:15 It could be a network share.

43:17 You know, Git isn't really opinionated about what constitutes a remote repository other than

43:23 that it isn't this one.

43:24 Yeah.

43:24 Okay.

43:25 How does it know what origin is?

43:27 As I create a remote, I'm going to name it.

43:30 Okay.

43:30 So as I clone, I'm going to say, Git clone this repository and it'll build one and it'll

43:36 by default call it origin.

43:37 But I could also say, Git remote add origin.

43:43 I just gave it a name and then give it a URL.

43:45 I could say, Git remote add upstream.

43:48 I could say, Git remote add Michael.

43:51 Now it's a reference from my repository to yours.

43:54 And so it's just in this case, in the refs, remotes folder, it's just a folder

43:59 referencing the branches that I saw on your machine.

44:02 Nice.

44:02 Is there somewhere where it stores like the URL?

44:04 It does.

44:05 And that is the next section that we may want to look at, which is configuration.

44:09 Let's open up the config file in the root of the .git folder.

44:14 All right.

44:15 Now this configuration file is really cool.

44:17 It includes all kinds of configuration details associated with our repository.

44:22 Now in this case, we have remote origin where we've named this one and here's the URLs that

44:28 we go to there.

44:28 In this case, it's github.com/talkpython.

44:31 We have other configuration details associated with this repository.

44:37 This .git/config file is actually one of three on my machine.

44:44 And we'll start out with our config file that's installed when we install Git.

44:51 So it's probably in Program Files or it's in /usr/local/bin or somewhere off in the ether

44:59 of how we install it.

45:00 We probably don't want to touch that one.

45:02 But that's the base configuration of all the options that we chose when we installed Git.

45:07 So if I run a command, if I were to say something like git email global, something like that,

45:14 you know, the .g command, maybe it's modifying that one.

45:16 Well, the one that we just talked about was the system one.

45:19 The second one is the global one, which is user specific.

45:22 I find that name a little confusing.

45:24 But my user specific one is the .gitconfig in my user home directory.

45:30 So, you know, C:\Users\rob, or the ~/ directory on Mac and Linux.

45:38 That .gitconfig overrides any settings in my system configuration.

45:44 And so oftentimes when you first install Git, you'll say git config --global user.email and user.name.

45:52 And so if you open up that .gitconfig in your user home directory, you'll see those settings.

45:58 You'll see your username, your name, your email, and all of the details that you've configured there.

46:06 And then the third one is the config file here in your repository that will override any of those settings.

46:13 So it doesn't make sense for us to have origin in our system, in our user specific configuration file,

46:21 because, well, each repository will have a different origin.

46:25 But it probably does make sense to put our name and email in our system, in our user specific directory,

46:31 because that would apply to all the repositories on our machine.
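
The three-level precedence (repository over user over system) behaves like a chain of lookups. The following Python sketch models that with collections.ChainMap. It does not read real config files, and every key and value shown is a hypothetical example.

```python
from collections import ChainMap

# Hypothetical settings at each level; real files would be parsed from disk.
system_config = {"core.autocrlf": "true"}
user_config = {"user.name": "Rob Example", "user.email": "rob@home.example"}
repo_config = {
    "user.email": "rob@work.example",
    "remote.origin.url": "https://example.com/team/project.git",
}

# Lookups try the repository first, then the user file, then the system file.
effective = ChainMap(repo_config, user_config, system_config)

print(effective["user.email"])     # overridden in this repository
print(effective["user.name"])      # falls through to the user level
print(effective["core.autocrlf"])  # falls through to the system level
```

On a real machine, git config --list --show-origin prints each setting along with the file it came from.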

46:35 Yeah, absolutely.

46:36 Almost all of them.

46:37 You might be doing home-based open source work, and you might be doing corporate button-up work,

46:43 and your formal corporate place might not love your corporate email on the open source project.

46:49 Exactly.

46:50 Or maybe you don't have it.

46:51 Exactly.

46:51 Yeah.

46:51 So when I have that scenario where I need to set my email address, maybe my name differently in different repositories,

46:59 I can set it in my .gitconfig in my user home directory, and then I can override it in each repository.

47:07 Just copy those couple of lines, set them in your config file here, and now you've set this repository to track your email differently.

47:16 Is there a git command to change it so I don't actually go into the .git folder,

47:21 and I say, like, git email but not global, or git config email?

47:24 Yeah.

47:25 Leave off the --global.

47:26 Yeah.

47:26 Okay.

47:27 Perfect.

47:27 Then you don't even have to know how.

47:30 You just know I do git email and what my email is.

47:32 Right.

47:33 Now, there are other configuration files here in the .git folder, but the config file is really the big one that we like to talk about.

47:41 Yeah.

47:41 Okay.

47:42 But you'll see a description file here.

47:44 That's a configuration file.

47:46 Git instaweb is a web server baked into Git that allows you to kind of browse through your repository.

47:55 Now, git instaweb works pretty well on Linux and not so great on Windows.

48:00 I bet you've never used instaweb in most scenarios.

48:04 I'd never heard of it until you brought it up the other day.

48:07 Yeah, but this description file holds the name of the website when you launch git instaweb.

48:13 Yeah, so Git ships with a web server that can be the host of that Git repository.

48:17 Yeah.

48:17 Now, why would I ever do that?

48:19 Why wouldn't I just use GitHub?

48:20 Exactly, which is why you've never heard of git instaweb.

48:23 Yeah.

48:24 I mean, you might say we want a private Git server or a public Git server

48:28 or something like that.

48:29 That might be.

48:30 But yeah, usually, yeah, I've never heard of it.

48:32 So very interesting.

48:33 All right.

48:34 What else is in this list here?

48:36 Yeah.

48:36 Yeah.

48:37 So we've talked about the content in the objects folder.

48:40 We've talked about the branches and tags in the refs folder.

48:43 We've talked about configuration.

48:45 Let's go poke in the hooks folder.

48:47 Yeah.

48:47 Hooks is interesting.

48:48 Yeah.

48:49 It is really cool.

48:50 The hooks folder is where we do automation.

48:53 Yeah.

48:53 So people probably heard of pre-commit hooks, right?

48:57 Like probably the most popular example in the Python space is to run the black formatter.

49:02 So it automatically formats your code before it checks it in.

49:05 So indentation, white space, like between a comma and an argument or something.

49:11 It's always consistent.

49:13 So you don't get these like back and forth editor driven, you know, merge issues.

49:17 There's no real change.

49:18 But I format it in my editor.

49:20 You format it in yours.

49:22 And back and forth, it goes between spaces with a comma and no spaces with a comma,

49:25 spaces with a comma.

49:26 And so you could set up a pre-commit hook to canonicalize it before it goes in.

49:32 But there's more than pre-commit, right?

49:33 Yeah.

49:33 I could set up a pre-commit hook to make sure all my unit tests pass before I commit.

49:38 I could set up.

49:40 And so what we see here in this hooks directory is all different kinds of automation things.

49:47 So a pre-commit hook, a pre-merge hook, a pre-push hook, a pre-rebase hook.

49:53 And each of these are shell scripts.

49:56 Well, with one exception, it's a Perl script.

49:58 But you see at the very top, it says #!/bin/sh.

50:02 Well, I'm on a Windows box.

50:04 Is this shell script still going to work?

50:06 Well, yeah.

50:07 Git ships with enough Linux-y, Unix-y, Bash-like stuff to be able to kick off these shell scripts

50:14 and run them as it would on any Linux system.

50:17 Okay.

50:17 Interesting.

50:17 So there's basically like a little mini Bash that comes with it.

50:21 I remember people using that Bash shell from Git.

50:23 It'd be more Unix-like on Windows.

50:25 Exactly.

50:26 So here in this shell script, I could do all kinds of things.

50:29 Maybe I'm calling a PowerShell script.

50:31 Maybe I'm calling a Python script.

50:33 Maybe I'm calling a Node formatter.

50:35 I can just call into whatever tasks I want to accomplish.

50:38 And that will then accomplish that task whenever this event happens.

50:44 So what I love to do in my demo is remove all the .sample pieces so that they're actual scripts.

50:50 And then just merely the presence of that file will be able to kick off that automation.

50:55 All right.

50:56 So there's a bunch of files that are sample shell scripts named things like pre-commit.sample or pre-merge-commit.sample.

51:03 If I just called it pre-commit but not the .sample, now it's going to be active?

51:08 Exactly.
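
Here is a minimal Python sketch of "drop the .sample suffix and the hook is live", run against a throwaway hooks folder with a trivial stand-in script. The helper name is hypothetical. It copies the sample rather than renaming it and sets the executable bit, since on Unix-like systems Git skips hook files that are not executable.

```python
import stat
import tempfile
from pathlib import Path


def activate_sample_hook(hooks_dir: Path, name: str) -> Path:
    """Copy '<name>.sample' to '<name>' and mark it executable."""
    sample = hooks_dir / f"{name}.sample"
    active = hooks_dir / name
    active.write_bytes(sample.read_bytes())
    active.chmod(active.stat().st_mode | stat.S_IXUSR)
    return active


with tempfile.TemporaryDirectory() as tmp:
    hooks_dir = Path(tmp) / ".git" / "hooks"
    hooks_dir.mkdir(parents=True)
    # Stand-in for a shipped sample; exiting non-zero would block the commit.
    (hooks_dir / "pre-commit.sample").write_text("#!/bin/sh\nexit 0\n")

    hook = activate_sample_hook(hooks_dir, "pre-commit")
    hook_name = hook.name
    hook_text = hook.read_text()

print(hook_name)
```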

51:08 Okay.

51:09 Nice.

51:10 Now the cool part about these is that I have all my automation set up.

51:13 I'm running the formatters.

51:15 I've got my unit test passing and it's great.

51:18 But this file is inside my .git folder.

51:21 So I can't commit these.

51:23 It's not one of the files that is available for me to add to the staging area.

51:27 Right.

51:27 It would be Inception if you tried to commit stuff in the .git folder.

51:32 Right.

51:33 So often we'll create shell scripts outside the .git folder and commit them and then have something here inside the .git folder that calls into that other shell script.

51:44 Yeah.

51:45 And you mentioned some kind of node-based tool that you can use, right, that will manage that stuff, right?

51:50 Right.

51:50 There's lots of packages.

51:51 The one that I show is git hooks that is an npm package.

51:56 And once you install git hooks, it will actually create all those aliases from the folder where you actually build the scripts that you can commit into this hooks directory so that then they'll run.

52:08 Just installing this package installs those hooks into place.

52:11 I see.

52:11 So basically, if you just install the package once, it will find those other external scripts and make those be the ones that git sees with the advantage that you can commit them into GitHub.

52:24 And if somebody makes a change, that change will propagate to everyone else.

52:26 Yes.

52:27 You can commit them into git, push them up to GitHub, and they will run.

52:31 Okay.

52:31 Yeah.

52:32 Fantastic.

52:32 Yeah.

52:32 Yeah.

52:33 Very neat.

52:33 Very neat.

52:33 Okay.

52:34 What else have we got here?

52:35 What else have we got?

52:36 I think maybe index, maybe?

52:38 That's an interesting one.

52:39 Yeah.

52:39 Index is really interesting.

52:41 As we look through index, if we just pop it open in an editor, it's just a bunch of gobbledygook.

52:48 And we're like, what is this?

52:49 It's a file, right?

52:50 Yeah.

52:50 Yeah.

52:51 This isn't the only index, but this is one of the really cool indexes where git keeps track of interesting stuff.

52:58 That's nice.

52:59 Yeah.

52:59 Check out this blown up.

53:00 If I try to look at it, it's like a binary blob exploded and died on my terminal.

53:06 But there are file names in there somewhere, so it must be something to do with that.

53:09 Yeah.

53:09 I think it's git ls-files, where you can go look through this index.

53:14 And if we pass in flags to that, then it'll be able to show the status of those files.

53:19 But this is looking through that index.

53:22 And the cool part about looking through that index is that git, if it wants to do a quick thing,

53:29 like which files have changed, needs to know the blob that is checked out in my working directory.

53:36 You know, which blob did I start with?

53:39 Right.

53:39 As we look through those objects, we saw a big tree of things.

53:43 And so opening up each commit node, finding all the tree nodes, opening up each tree node,

53:48 finding all the blob nodes, that takes a while.

53:51 And so this is a cache, an index, of all the files that I checked out in my working directory.

53:58 This allows git to move really fast as it looks through my folder and identifies any files that have changed or new files or things like that.

54:08 So that's what this index file is for.
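
The index looks like gobbledygook in an editor because it is a binary format, but its first 12 bytes are simple: a 4-byte signature DIRC, a version number, and an entry count, all big-endian. The Python sketch below parses only that header from bytes built in memory. The version and count are made-up values, and parsing the entries themselves is out of scope here.

```python
import struct


def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the start of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Stand-in for open(".git/index", "rb").read(): version 2 with 3 entries.
fake_index = struct.pack(">4sII", b"DIRC", 2, 3)
version, entry_count = parse_index_header(fake_index)
print(version, entry_count)
```

For a readable view of a real index, git ls-files --stage lists each tracked path with its mode and object ID.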

54:10 Yeah.

54:11 And my git incantations are not pulling it up here, but I think you can get it to show the SHA of each file as well, right?

54:17 Right.

54:17 In which case, then instead of traversing the whole history and actually looking at the file on the hard drive and saying,

54:24 well, what is its hash?

54:26 Do I have an update for this file?

54:28 I could just look in this binary file and get that answer, right?

54:30 Exactly.

54:31 Nice.

54:32 Yeah.

54:32 The next section of files that we want to look at are logs.

54:36 And the cool thing about git's logs is they keep track of where all of our branches have been.

54:45 So if we cat .git/logs/HEAD, then we get a thing that kind of looks really weird.

54:52 We've got really long lines in this.

54:54 And in our first line, it says a whole bunch of zeros.

54:58 And then we've got the Git SHA of the commit that it went to and a little bit about that commit.

55:04 This is a log of where our branches have been.

55:10 And so we'll have a file for each of our branches.

55:13 In this case, we're looking at the head file.

55:16 So we see that head started out nowhere and ended up at ed13fc.

55:20 And then it has my username, my email, and then some other stuff.
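
Each of those long lines has a regular shape: the old SHA, the new SHA, who made the change (with a timestamp), then a tab and a message. A rough Python sketch of a parser follows. The sample line is invented for illustration (name, email, timestamp, and SHA are all made up), with the row of zeros standing for "started out nowhere".

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old_sha: str
    new_sha: str
    who: str      # name, email, timestamp, and timezone as one string
    message: str


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split one line of .git/logs/HEAD into its parts."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old_sha, new_sha, who = meta.split(" ", 2)
    return ReflogEntry(old_sha, new_sha, who, message)


sample_line = (
    "0" * 40
    + " 0123456789abcdef0123456789abcdef01234567"
    + " Rob Example <rob@example.com> 1617235200 -0700"
    + "\tcommit (initial): first commit\n"
)
entry = parse_reflog_line(sample_line)
print(entry.new_sha, entry.message)
```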

55:24 Yeah.

55:24 The really interesting thing is this log can be really useful if, for example, I switch branches

55:32 and forgot where I was.

55:33 Or I commit something and then I uncommit it.

55:37 That's a thing.

55:38 And I want to get back to it.

55:39 Or I delete a branch before I merged it in.

55:42 Or, you know, those types of things.

55:44 If I do that quickly enough, you know, remember git's going to do that garbage collect and go prune nodes that aren't used anymore.

55:51 If I get there quickly enough, I can use this log to go back through my refs and go find that commit.

56:00 The objects are still there.

56:02 I just don't have any refs pointing to them anymore.

56:04 And so the command that we can use on the command line is called git reflog.

56:08 And we can pass git reflog a particular branch we want to look at.

56:12 But by default, if we just say git reflog, all one word, then it will show the history of HEAD.

56:19 Now, in this case, we didn't move it very far.

56:22 But we can see there, oh, and here's the branch that I just deleted.

56:26 And here's the SHA for this one.

56:29 And so at that point, then we can git checkout that commit and get back to the content that we had created and then lost the reference to.

56:38 Right. Okay. Nice.

56:39 There's a little bit of recovery, kind of an undelete if you had to in there.

56:43 Yeah.

56:44 Nice.

56:44 The funny thing about this, the command is git reflog, as in ref log.

56:47 But I've also heard it pronounced git re-flog.

56:50 And I'm like, so I've got this cat o' nine tails.

56:53 And I'm like, no, you can't.

56:54 Git ref-log.

56:56 Exactly.

56:56 Do it again.

56:57 But once you understand how the refs folder works, then git reflog makes a whole lot of sense.

57:03 We're looking at what those ref files have said in the past.

57:08 Here's what it was before we changed it.

57:10 Here's what it became after we changed it.

57:12 And a little bit more context around it.

57:14 Where you're currently working is where the head is pointing.

57:17 Often that's some point in a branch.

57:18 And this is like, where has that been throughout the branch that it's on?

57:24 Yes.

57:24 Yeah.

57:25 Very cool.

57:25 Very cool.

57:26 So we're getting sort of short on time here.

57:28 What else should we be talking about?

57:31 What else should we close this out with in terms of content of our .git folder?

57:37 The only other section in here is temp files.

57:41 So if we've committed stuff, we might see a commit underscore MSG file.

57:47 Or maybe it's called commit underscore message.

57:50 We might see other temp files.

57:52 We have a temp folder sometimes baked into things.

57:56 And so that's the last group of files here in the .git folder is temp files.

58:01 Temp files, configuration, objects, refs, hooks.

58:06 These are all the pieces that come together to make this git database.

58:10 And once again, you really can't break git.

58:13 You know, it's like, well, I did this incantation and it's broken.

58:16 Well, no.

58:17 You can use reflog to get back to a particular commit.

58:20 Or you can use various commands, checkout to get back to where you need to.

58:25 Maybe you'll use reset to, you know, kind of get your working directory back in shape.

58:30 But that structure of git, the double entry bookkeeping inside this repository, is really

58:37 good at keeping track of the things.

58:39 And so you really can't break git.

58:42 Yeah.

58:42 And back this up.

58:44 You back it up, right?

58:45 You back up this folder.

58:46 You back up basically everything, right?

58:48 All right.

58:49 Now, it might be easier to back it up, not by just backing up this folder, but by publishing

58:53 your changes to another repository.

58:55 And that's where we have great workflows.

58:57 Like, I will push all of these changes to another server.

59:01 Maybe I'll call that server origin.

59:03 Yeah, absolutely.

59:03 And that is automatic if you check out from somewhere like, clone it from somewhere like

59:07 GitHub.

59:07 Right.

59:08 GitHub.

59:08 So there's just a couple other things maybe I want to touch on really quickly while we

59:12 have a moment.

59:13 When you talked about breaking git, there's an interesting little design thing called Dangit,

59:19 Git, or even better.

59:20 Maybe I'll link to the better version, the not safe for work version where you're frustrated

59:24 and it's like, oh no, I just did something terribly wrong.

59:27 Please tell me how to do it.

59:28 And reflog is right at the top of these things.

59:31 I committed and immediately realized I need to make a change or I need to change my commit

59:36 message.

59:37 And yeah, anyway, that's a pretty interesting one.

59:39 Another thing we've talked a lot about GitHub.

59:41 And what we haven't really talked about is gitignore, right?

59:46 As much as you want to track stuff, you don't want to automatically track a bunch of things

59:49 that are working files, you know, build stuff from C++ or maybe node underscore modules

59:56 or PyCharm working files or all sorts of things should not go into your project, right?

01:00:03 Your venv directory.

01:00:04 Yeah.

01:00:05 Yes, exactly.

01:00:06 Your venv directory.

01:00:07 Absolutely.

01:00:07 So there's gitignores.

01:00:10 Any content that you download, any content that you compile, any of that content shouldn't

01:00:16 be in your repository because it changes too infrequently and it's usually easier to either

01:00:21 rebuild it or redownload it.

01:00:23 All those things should be ignored.

01:00:25 Yeah.

01:00:25 It's a huge merge nightmare as well.

01:00:28 Even if you could keep it, right?

01:00:30 Suppose I check in my venv directory and you go on Windows.

01:00:34 Well, you can't have the same contents as mine because mine is the macOS version.

01:00:38 So you change it, put your Windows version in there and I get it back out and it breaks

01:00:42 my Mac versions.

01:00:43 I got right.

01:00:43 So there's stuff you should ignore.

01:00:44 Absolutely.

01:00:45 And when you create a new project on GitHub, it very handily says, hey, what kind of project

01:00:52 is this?

01:00:52 We can get you far down the road with your gitignore.

01:00:55 Is this a Python project?

01:00:56 Is it a node project or whatever, right?

01:00:58 What I wanted to point out is that dropdown list.

01:01:02 There's actually a GitHub project called gitignore that has the ignore for all of these different

01:01:09 languages.

01:01:09 So if you want to make a change to say Python's gitignore, you can go there and pull it up

01:01:15 and see it.

01:01:16 And you could technically do a PR against it to say there's this new thing that's common

01:01:20 in the community now.

01:01:21 Please fix it.

01:01:22 That's pretty cool.

01:01:23 And these things aren't perfect.

01:01:25 You know, most of them will exclude everything that starts with or ends with or contains

01:01:30 log.

01:01:30 But your ILogger or your log handler might get excluded by that as well.

01:01:37 So you may need to adjust this to get it the way you want.
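The over-matching described here is easy to reproduce. The sketch below uses Python's fnmatch module as a stand-in for gitignore matching, which is only an approximation: real gitignore rules add handling for slashes, `**`, and negation. The pattern and the file names are hypothetical, chosen just to show the effect.

```python
from fnmatch import fnmatchcase

# Hypothetical over-broad pattern: anything containing "log" or "Log".
BROAD_PATTERN = "*[Ll]og*"
# A narrower alternative that only targets files ending in .log.
NARROW_PATTERN = "*.log"


def ignored(names, pattern):
    """Return the names the pattern matches, compared case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = ["app.log", "Logs", "ILogger.cs", "log_handler.py", "main.py"]

print(ignored(names, BROAD_PATTERN))
# ['app.log', 'Logs', 'ILogger.cs', 'log_handler.py']
print(ignored(names, NARROW_PATTERN))
# ['app.log']
```

The broad pattern catches the source files ILogger.cs and log_handler.py along with the real log output, while the narrow one leaves them alone. To check what Git itself does with a given path, `git check-ignore -v <path>` reports which rule matched.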

01:01:41 Yeah.

01:01:41 But it is nice to know that at least it'll give you a bit of a start and that it's a thing

01:01:46 that you can contribute back to.

01:01:47 It's not just magic inside of GitHub, but it's its own GitHub open source repository.

01:01:52 Right.

01:01:52 Yeah.

01:01:52 Quite neat.

01:01:53 Quite neat.

01:01:54 Let's see.

01:01:55 What else should we cover really quick?

01:01:57 I think maybe just one other thing I think that's maybe worth throwing out there that was

01:02:01 interesting, but it's pretty specific.

01:02:03 But you've mentioned Windows a couple of times.

01:02:05 Maybe two things, actually.

01:02:07 One is on the shell that you saw on my screen just a minute ago, when I was inside of a git

01:02:12 repository, it would actually put what branch it was on and the git state and so on.

01:02:18 And I have that because I have Oh My Zsh installed.

01:02:22 Which is a really nice shell for Mac and Linux that gives you things like branch awareness

01:02:28 and number of changes and so on.

01:02:30 I saw your talk.

01:02:31 You had something like that for PowerShell, the new Microsoft terminal.

01:02:36 What were you using for that one?

01:02:36 It's called Oh My Posh.

01:02:38 And Scott Hanselman has a really cool video about Oh My Posh where he walks us through how

01:02:44 to get it installed.

01:02:45 There are various themes in Oh My Posh, but the theme that I really enjoy actually puts the

01:02:50 cursor on the next line.

01:02:52 One of the things that I frequently do in command prompt is I have all of the path to get to

01:02:57 this folder.

01:02:57 And so the command that I'm trying to teach ends up getting wrapped to the next line.

01:03:01 And so Oh My Posh or Oh My Zsh gives you that additional context of how's your git repository

01:03:09 doing?

01:03:09 You could also show your remote.

01:03:11 It's basically just running a shell script behind the scenes.

01:03:15 And so you can modify that shell script.
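The branch name a prompt displays can come straight from the .git folder: the HEAD file holds either a line like `ref: refs/heads/main` or, when HEAD is detached, a raw commit hash. The Python sketch below reads it directly. The current_branch function is a hypothetical helper for illustration and is not how Oh My Posh or Oh My Zsh are actually implemented.

```python
from pathlib import Path


def current_branch(repo_dir="."):
    """Return the branch HEAD points at, a short hash if detached, or None."""
    head = Path(repo_dir) / ".git" / "HEAD"
    if not head.is_file():
        # Not a repository root, or .git is a file (as in worktrees and
        # submodules), which this sketch does not follow.
        return None
    content = head.read_text().strip()
    if content.startswith("ref: "):
        ref = content[len("ref: "):]
        prefix = "refs/heads/"
        # Strip the refs/heads/ prefix so "refs/heads/main" becomes "main".
        return ref[len(prefix):] if ref.startswith(prefix) else ref
    # Detached HEAD: the file contains a commit hash; show a short form.
    return content[:7]
```

A prompt script could call current_branch() each time it draws and add the result to the prompt text whenever it is not None.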

01:03:17 Scott Hanselman is diabetic and so needs to check his blood sugar a lot.

01:03:21 And so he actually has built into his Oh My Posh script, his blood sugar number, because it's

01:03:28 really easy to miss.

01:03:28 And it's one of those things that's really important not to miss.

01:03:31 So it's in his terminal all the time.

01:03:33 Probably even color code it, right?

01:03:35 If it's out of range, make it red.

01:03:37 If it's not out of range, you make it green, something like that.

01:03:39 Yes.

01:03:40 Wow.

01:03:41 How interesting.

01:03:42 Yeah, this looks fantastic.

01:03:43 I've never played with this before, but yeah, it looks really nice.

01:03:47 You recommend it?

01:03:48 Yeah, I do.

01:03:49 Cool, cool.

01:03:49 All right.

01:03:51 Well, I guess the one other thing that I was going to throw out there is I heard of this

01:03:55 thing called VSF for Git.

01:03:57 We talked about large files and this sounds like it's very much a Windows only thing, but

01:04:04 it's a neat idea.

01:04:05 This virtual file system for Git that if you have a really large repository, it's kind of

01:04:10 like the smart sync for Dropbox or something.

01:04:13 It only pulls the files and interacts with the files that you actually touch, but it does

01:04:18 that behind the scenes without you knowing it.

01:04:20 Have you seen this?

01:04:21 Yeah.

01:04:21 And we actually said VSF for Git, but it's actually VFS for Git, virtual file system.

01:04:28 It's great when your repository is just massively huge.

01:04:32 And 98% of our repositories are not.

01:04:35 But when you have the code base of, I don't know, Windows, then you need something like

01:04:41 this because you can't Git clone the entire thing.

01:04:43 GitHub, not GitHub.

01:04:45 Google is famous for having their corporate monorepo.

01:04:49 And I suspect that's bigger than you could Git clone onto each machine as well.

01:04:54 And so the cool part is one of the benefits of subversion that we lost as we moved to Git

01:04:59 was I could clone only part of a repository.

01:05:03 And VFS kind of gives us that ability back.

01:05:07 Most of the time we don't need it.

01:05:09 But if you've been really bad and you've committed a whole bunch of binary files to your repository,

01:05:14 it's interesting.

01:05:16 It might be worth kicking the tires.

01:05:17 It isn't necessarily Windows only.

01:05:19 It is a plug-in to Git itself.

01:05:21 But it allows you to put that checkout directory somewhere else.

01:05:26 So, for example, on a shared network drive.

01:05:30 Now I have all of those objects, all of those blobs in one place, and I don't need to copy each of those to my machine.

01:05:38 Yeah.

01:05:38 Interesting.

01:05:39 The Windows people that were switching to Git said it was really a nightmare.

01:05:44 So, for example, the source code for Linux repo is something like 600 megs or 0.6 gigs.

01:05:50 Windows is like 270 gigs.

01:05:52 So it's really ginormous.

01:05:54 And they said to do a clone took 12 hours.

01:05:57 To do a checkout took three hours.

01:05:59 To do a Git status took eight minutes.

01:06:01 And to do an add and commit took 30 minutes before they made this change.

01:06:05 So they were suffering some hard pains to go down this path for sure.

01:06:10 I guess it probably is worth it for them.

01:06:12 Right.

01:06:12 All right.

01:06:13 Well, I guess we probably should put a bow on it.

01:06:15 We're more or less out of time there, Rob.

01:06:17 But I'll ask you the two questions I always ask at the end of the show.

01:06:20 If you're going to write some code, what editor do you use?

01:06:23 It depends on the code that I'm trying to write.

01:06:26 In most cases, I'll reach for VS Code.

01:06:28 But I'll also reach for Visual Studio.

01:06:31 Right.

01:06:31 If you're going to be doing ASP.NET stuff, like you said.

01:06:33 Sometimes I'm also known to reach for...

01:06:36 If you're doing like ASP.NET or something you were talking about like that or something you could...

01:06:40 Maybe something like WPF where the tools are built in.

01:06:43 You have to basically...

01:06:44 Not have to.

01:06:44 Almost have to use them.

01:06:45 But sometimes I also reach for Sublime Text or TextEdit.

01:06:48 Cool.

01:06:49 Cool.

01:06:49 And then often ask for a Python package library recommendation.

01:06:53 Maybe we could make it your Git script.

01:06:57 The one that runs the pre-commit stuff.

01:06:59 The one that moves that outside the .git folder.

01:07:01 What was that called again?

01:07:02 It's called Git Hooks.

01:07:03 Git Hooks.

01:07:04 And let me grab a link to it.

01:07:06 It's actually a Node package, but exactly.

01:07:08 Yeah.

01:07:08 You just install it wherever.

01:07:09 And it's good to go, right?

01:07:11 Yes.

01:07:11 And so if you have maybe a Flask server and you want to...

01:07:16 As part of your Flask server, maybe you have a React or a Vue app or you need to pull down

01:07:21 jQuery as part of your client-side dependencies, then you may have enough Node stuff to be able

01:07:27 to leverage this as well.

01:07:29 Yeah.

01:07:29 Yeah.

01:07:29 That makes a lot of sense.

01:07:30 If you're already using NPM because you're doing front-end stuff, then you might as well,

01:07:35 right?

01:07:35 Yes.

01:07:35 Yeah.

01:07:36 Very cool.

01:07:36 One of the things that we didn't talk about, and it's really cool how this happened,

01:07:39 Git workflows.

01:07:40 What's beautiful about Git is it's really unopinionated about how you do your workflow.

01:07:46 Are you going to do GitFlow?

01:07:47 Are you going to do GitHub Flow?

01:07:49 Are you going to do something else?

01:07:51 Git can work for all of those scenarios because it is just a mechanism of committing and sharing

01:07:56 files.

01:07:57 It doesn't impose a specific branching or naming convention.

01:08:02 You can choose to put those on top, but Git's workflow is really open to whatever you need

01:08:09 it to do.

01:08:09 Yeah.

01:08:09 Well, when I was first getting familiar with this whole PRs and merging and those kinds

01:08:15 of things, I felt like, oh, that's a Git thing.

01:08:17 That's a GitHub thing.

01:08:18 It has nothing to do with Git, right?

01:08:20 It just Git facilitates that on top of it.

01:08:22 So you can choose however you want to work, right?

01:08:24 Right.

01:08:24 Quite cool.

01:08:25 All right.

01:08:25 Well, I don't normally close out this show with a joke, but Robinson had a good one here

01:08:31 in the live stream.

01:08:31 So I'm going to put this up here for us as our parting thought.

01:08:34 And then I'll ask you for one more as well, maybe.

01:08:36 Yeah.

01:08:36 So he said, there's a programmer who once told him, couldn't use Git.

01:08:41 He was afraid to commit.

01:08:43 He was afraid of the Git commitment.

01:08:47 Oh, that's awesome.

01:08:49 Yeah.

01:08:49 Yeah.

01:08:50 Yeah.

01:08:50 Thank you for that.

01:08:50 Thanks for making us laugh.

01:08:51 All right.

01:08:52 Final call to action.

01:08:52 People want to go a little bit deeper with Git.

01:08:54 Maybe they just do the three commands, Git clone, I don't know, Git add, Git commit,

01:08:59 Git push.

01:09:00 Like that's four commands.

01:09:01 Like beyond that, like how do you get more into this world?

01:09:04 What's really interesting is as we're coming off of those other systems, we want to kind

01:09:09 of build up that tribal knowledge that we had.

01:09:12 And so we're going to go grab those three or five commands and we're going to stick them

01:09:15 to the post-it under our keyboard.

01:09:17 Take the next step to go figure out, you know, what is the next command that I want to do?

01:09:23 Or how does this command work?

01:09:25 What we did today was we explored through that .git folder so that we could take that next level

01:09:31 to see how it works.

01:09:32 Git isn't a black box.

01:09:34 It's not magic.

01:09:35 It just works a little bit differently than the source control system you might have been

01:09:39 familiar with.

01:09:40 So definitely get familiar with it.

01:09:42 Google the terms that you're looking for and really start to embrace that mechanism and

01:09:48 get really powerful with Git.

01:09:49 I'm confident that you can get past just those few commands and you can make it just an inherent

01:09:55 process in your workflow and use it to be really, really powerful.

01:09:59 Specifically, separating the save points from the publish points.

01:10:03 That's the thing you couldn't do before that you can now do with Git.

01:10:06 Yeah, well said.

01:10:07 Definitely agree with all of that.

01:10:09 I think getting really good with source control and source control these days really means Git

01:10:14 almost.

01:10:14 It allows you to be fearless with your code, right?

01:10:18 So often people are like, oh, I would like to try this, but what if I break it?

01:10:21 What if it doesn't go right?

01:10:22 Well, if you know how to create your branches, work locally, do all sorts of stuff, roll back,

01:10:28 you can just go crazy and just explore things.

01:10:31 And if it doesn't work, throw it away.

01:10:32 No harm, no foul.

01:10:33 It's lovely.

01:10:34 And if you get really stuck, hit me up on Twitter at Rob underscore Rich and show me the code where

01:10:40 you got stuck and let's get you unstuck because I would love to continue this conversation and

01:10:45 really help you be successful.

01:10:46 All right.

01:10:47 Well, thank you for taking the time and being here.

01:10:49 It's been great to chat Git with you.

01:10:51 Most definitely.

01:10:52 Thanks for having me on.

01:10:53 Yeah.

01:10:53 See you later.

01:10:54 This has been another episode of Talk Python to Me.

01:10:58 Our guest on this episode was Rob Richardson.

01:11:00 It's been brought to you by our courses over at Talk Python Training.

01:11:04 Want to level up your Python?

01:11:06 We have one of the largest catalogs of Python video courses over at Talk Python.

01:11:10 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:11:15 And best of all, there's not a subscription in sight.

01:11:18 Check it out for yourself at training.talkpython.fm.

01:11:21 Be sure to subscribe to the show.

01:11:22 Open your favorite podcast app and search for Python.

01:11:25 We should be right at the top.

01:11:27 You can also find the iTunes feed at /itunes, the Google Play feed at /play and the direct

01:11:33 RSS feed at /rss on talkpython.fm.

01:11:37 We're live streaming most of our recordings these days.

01:11:39 If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:11:47 This is your host, Michael Kennedy.

01:11:49 Thanks so much for listening.

01:11:50 I really appreciate it.

01:11:52 Now get out there and write some Python code.

01:11:53 I'll see you next time.

01:12:14 Bye.
