Episode 40
Git
February 28th, 2024
47 mins 5 secs
About this Episode
The podcast begins with Mike sharing his fondness for smoothies and drawing an analogy to Git, a distributed database. He compares the irreversible process of making a smoothie to the irreversible nature of operations in Git. Mike explains that Git, emerging from the same project that gave us Linux, is a distributed database that allows information sharing and verification across geographic regions. He highlights Git's significance in facilitating collaborative development in large-scale projects like Linux.
The discussion then shifts to the technical aspects of Git, such as dealing with conflicts, the concept of a distributed repository, and the uniqueness of Git in managing changes and time. Mike elaborates on the cryptographic operation called hashing used in Git, emphasizing that changes in Git represent patches or differences rather than versions of code. The panel discusses Git's ability to handle vast numbers of contributors and the significance of timestamps and commit hashes in tracking changes.
Finally, the conversation delves into the practicalities and best practices of using Git. The team discusses branching strategies, the importance of not rebasing shared branches, and the impact of different deployment strategies on Git usage. They also touch on the challenges of merge conflicts and the importance of understanding Git's limitations and strengths. The episode concludes with an acknowledgment of Git's ubiquity in various fields beyond software development and encourages listeners to familiarize themselves with Git for collaborative projects.
Transcript:
MIKE: Hello, and welcome to another episode of the Acima Development podcast. I'm Mike. I'm going to be hosting today. With us, we've got Eddy, JP, and Kyle. Great to have you all here today.
And, as usual, I'd like to start with a bit of a story. And I'm actually going to talk about my breakfast this morning. I have become a great lover of smoothies [laughs]. Years ago...I enjoy them. I've always liked a smoothie, but they were never really enough. I have a big appetite. I was a huge eater as a teen. I was late getting my height, so I grew a foot in a year [laughs]. That left me skinny and hungry [laughs], and I also ran long distance. So, huge appetite in my teens. You know, I don't have the same appetite I used to have, but I still like to eat enough. And something like a smoothie for breakfast was never enough.
Well, I kind of by accident stumbled on a combination that scratches the itch [laughs]. I put a lot of seeds in it, and that gives the fat and protein it needs to keep things good, and then lots of whole fruit. I don't even throw juice in anymore. So, I just use water to add enough liquid. That means that I am getting all kinds of fiber, and I get the fat and protein from the seeds. It means that I stay really stable with my blood sugar. It's fantastic. I love it.
So, I throw together all these ingredients in a blender, you know, I add the water and blend it. One thing that I have never tried to do is put the fruit back together afterward. And there's a reason for that, and it is that it's effectively impossible. I mean, technically, you could pull apart all the molecules, right? And reconstitute them. But there is no way in a meaningful amount of human time. And I think it would all have rotted by that point anyway, right [laughs]?
There are some things that only go in one direction, and a smoothie is one of those [chuckles]. You press that button to get it going, and that's the end of that fruit. It still tastes good [chuckles]. It's still valuable. But it only goes in one direction. That actually applies to several aspects of what we're going to be talking about today regarding Git.
Git is a distributed database, and if you're not familiar with that, you might be thinking, well, why would I need a distributed database? It's like listening to your friend who is a huge Bitcoin enthusiast tell you, "We all need distributed databases." Unless you're an enthusiast, your eyes probably glaze over [laughs], and you forget about it until the next time your friend comes over. But it's a similar sort of thing. Most of these cryptocurrencies are built as a distributed database so that information can be shared across geographic regions and everyone can verify that the data has not been corrupted. That's what Git is, too. There are a few reasons that it is what it is.
Now, Git was spawned by the same project that gave us Linux, the operating system that powers, like, most of the internet [chuckles]. If you've got an Android phone, it's running Linux. Some cars run it; I think Tesla runs Linux. You know, Linux runs kind of all the things: most of the servers, not all of them. You have some Windows servers out there, some old Unix servers, but a lot of the [inaudible 03:19] are on Linux.
But a long time ago, it was just some guy, Linus Torvalds, who wrote an operating system as a college student and released it open source, putting his code on the internet. And people actually started using it, and they started contributing to it. And pretty soon, it was a worldwide project with potentially thousands of people contributing. And sometimes they're going to make changes in the same files. There were systems for managing that sort of thing in the past, but they all had a few problems.
First, dealing with conflicts was a pain [laughs]. There are a few ways to deal with that, but you need to recognize that conflicts are a pain. And, to go back to my smoothie, they are especially a pain if you have to have some sort of shared state. If you say this is the perfect state of the repository, this is how my code is, and everybody has to work with that version of the code, well, if you have 1,000 people working on the same thing at once, it's pretty much impossible to actually have a shared vision of what that thing is. So, that is a serious problem.
If you have a whole bunch of people trying to make the same smoothie at once, it's even worse than having it happen in one place. Well, Linus Torvalds had an idea to change the way things were working. He said, you know what? We're not going to have a central version. There is no such thing as a, you know, single, canonical version of what our code is. You can all code separately, and it's fine. And we're just going to let time move forward. And the pure, true version is just whichever copy of the code the person in charge says it is.
And here's the other place where the smoothie comes in. This idea of time only moving forward matters. There's an operation, a cryptographic operation, called a hash, where you take some data, do some mangling on it, and get a number out the other end. And you can't go back the other way. There are a few different ways of doing it. A popular family of algorithms, the SHA algorithms, is used to do this hashing. And that's really all the name says: hash means to scramble something up. It's basically what I'm doing with my smoothie. You can do it with vegetables [chuckles]. Once you've shredded something up, you can't go back the other direction.
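To make the hashing idea concrete, here is a minimal sketch of how Git derives the ID for a file's contents (what Git calls a "blob"), using Python's standard hashlib. In a default repository Git uses SHA-1 over a short header (the word "blob", the size in bytes, and a NUL byte) followed by the raw contents; newer Git versions can also be configured to use SHA-256. The function name `git_blob_id` is our own, not part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a file's contents (a "blob").

    Git hashes a header made of b"blob", a space, the size in bytes, and a
    NUL byte, followed by the raw contents, using SHA-1 in a default repository.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The same input always gives the same ID; any change gives an unrelated one.
print(git_blob_id(b""))            # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"banana\n"))
print(git_blob_id(b"bananas\n"))   # one extra letter, a completely different ID
```

The empty-file ID above is the well-known one that `git hash-object` reports for an empty file. Commit IDs are computed the same way over a small text record that names the project snapshot, the parent commit(s), the author and committer with their timestamps, and the message.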
So, the other key thing Git does is that anytime somebody makes a change, it makes a hash of that change. That is, it runs this mathematical operation, takes all your changes, and comes out with just a number on the other end, a long number. And that number is pretty much going to be unique. The chance of it matching a hash somebody else produced is essentially zero.
So, you can have people all over the world making their changes and then creating these hashes, and the hashes are all going to be different. Together with that hash, you have a timestamp. And each of those hashes is associated with a change, and the change is not a version of the code. And here's another key thing to note: the changes represent a change, that is, a difference. They represent a patch or a this is what we're going to do to change the code.
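Worth knowing alongside Mike's description: internally, Git's object store saves each commit as a snapshot of the whole project tree (compressing similar content behind the scenes), and commands like `git diff` and `git show` compute the patch between a commit and its parent on demand. The patch view is how commits behave when you merge, rebase, or revert them. This sketch uses Python's difflib to turn two snapshots of a hypothetical file, `smoothie.txt`, into a unified diff, similar in spirit to what `git diff` prints.

```python
import difflib

# Two snapshots of a hypothetical file, before and after one commit.
before = ["banana\n", "spinach\n", "water\n"]
after = ["banana\n", "chia seeds\n", "water\n"]

# unified_diff yields the lines of a patch: "-" removed, "+" added, " " unchanged context.
patch = list(
    difflib.unified_diff(before, after, fromfile="a/smoothie.txt", tofile="b/smoothie.txt")
)
print("".join(patch))
```

The patch records only what differs (one line removed, one added) plus a little surrounding context, which is what makes it possible to apply the same change on top of someone else's work later.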
So, what you end up with is this database distributed across many machines. You can have thousands of people. They could be working separately for months, come back, and they have this list, you know, this set of changes. Each of these changes has a commit hash associated with it, a unique number you can identify it by, and a timestamp. And each of these also represents a set of changes that's going to be applied. As long as you can apply those changes, it really doesn't matter that you don't have a central version, just [inaudible 06:47] the changes.
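Here is a deliberately simplified model (not Git's real object format) of the chain Mike describes: each commit records its parent's hash, a timestamp, an author, and its change, and its own hash covers all of those. Because the parent's hash is part of the input, nobody can quietly alter an old commit; every hash after it would change. The `Commit` class and `build_history` helper are hypothetical names for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # hash of the previous commit, None for the very first one
    timestamp: int         # seconds since the epoch
    author: str
    change: str            # the patch this commit applies (simplified to text)

    @property
    def commit_id(self) -> str:
        # The hash covers the parent's hash, so each commit "seals" all history before it.
        record = f"{self.parent}|{self.timestamp}|{self.author}|{self.change}"
        return hashlib.sha1(record.encode("utf-8")).hexdigest()


def build_history(changes):
    """Chain (timestamp, author, change) tuples into commits, oldest first."""
    history, parent = [], None
    for timestamp, author, change in changes:
        commit = Commit(parent, timestamp, author, change)
        history.append(commit)
        parent = commit.commit_id
    return history


original = build_history([
    (1_700_000_000, "mike", "+ add banana"),
    (1_700_000_100, "eddy", "+ add chia seeds"),
    (1_700_000_200, "kyle", "+ add water"),
])
# Editing only the first change alters every hash that follows it.
edited = build_history([
    (1_700_000_000, "mike", "+ add mango"),
    (1_700_000_100, "eddy", "+ add chia seeds"),
    (1_700_000_200, "kyle", "+ add water"),
])
for old, new in zip(original, edited):
    print(old.commit_id[:7], new.commit_id[:7], old.commit_id == new.commit_id)
```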
And you can imagine sometimes it doesn't work. And that's what we're going to get to a little bit later today is dealing with some of those times when it doesn't work. But the fact that the time only moves forward, that is, you always have to apply changes; you're not going back to some central vision of what the code is. It means that everybody can work separately, and it's just fine.
With Linux, what gets deployed is whatever the person in charge of the project says: "Hey, this is the version that we're going to go with." And you think, well, how do you know? Well, you know, you probably have a human problem there [chuckles]. That is, we have human structures to deal with governance, so use those human structures. Let the people decide and designate somebody who is going to say, "Okay, this is the version that we're going to go with." And if you have that person, she or he can say, "You know what? This is good. I give it my stamp. We can deploy this."
And then you can have people working separately, contributing on their own time. By removing that need for a central version, by giving things, you know, these unique handles so that everybody can verify that each change is unique, and by allowing time to only go forward (you can only make changes), you can solve a lot of the problems we had with the systems that came before. Did any of you use some of the systems we used before Git, and how did that go for you?
EDDY: Well, honestly, I'm pretty new [laughs] to development. So, the only version control that I know is Git. I'm aware of the systems that were used in the past, though. Out of curiosity, I did look up what Git was, and I looked at the timeline and what other systems there were. I read about RCS and SVN, et cetera, which pale in comparison to Git.
MIKE: Kyle, I think you mentioned you used CVS back in the day.
KYLE: Yeah, I was quite junior then. I don't remember a lot about it. The one I know the most about was SVN, at a C# shop specifically, and what I remember is that it always seemed to be down. It was great when it was working, but any major error seemed to stop the ship, and a conflict would stop it pretty dead center, which was always a hassle to get through.
MIKE: And you hit on something. It was really hard back in the day to try to manage this using the tools that were there. But those tools were a big step forward from what you had before, because if you don't have some sort of tool for [laughs] managing it, then you've just got your code on your machine, and you email it to somebody, right? And then they try to figure out the differences and hope they get it right. That's not a great way to do things. Really, that's what you had before we had some of these tools.
So, while it may sound like we're casting some shade on Subversion or CVS, that doesn't mean those tools were not incredibly useful and a huge step forward from what we had before. But once you have to maintain a central repository, you have a single point of failure. And a single point of failure is not so fun to deal with.
JP: I also want to call out that Git is not the last iteration. A lot of people will talk about Git, but we have all these beautiful tools that have emerged in the technology industry to work on top of it. I mean, the foundation is still Git rather than older tools like Subversion and CVS, but really, Git just laid a foundation, like you said, by distributing that data and using, you know, the mathematical computation of hashes to empower this extremely large distributed network of code and make it publicly accessible, which was something that really wasn't there before.
EDDY: Wait, Jackson, you said that it's not the end all be all. So, are you saying that there's other version controls currently that are newer --
JP: Not other version controls, but systems leveraging Git. I know at Acima and at a lot of [laughs] organizations, the world depends on GitHub, for example. GitHub itself is a hosting platform that relies on Git but adds a ton of additional features on top of it.
MIKE: Right. It's another layer on top of it. When you think about a database, releasing a database to the world is awesome, but not very many people are going to say, "Hey, I'm going to go hit PostgreSQL today." But they might say [laughs], "I'm going to go visit Facebook," right? [laughs] And that layer on top matters with GitHub, or GitLab, or Bitbucket, or, you know, there's any number of different versions out there. I'm not trying to say that one is the best. They add another layer on top, so you actually have an application that runs on top of that distributed database down beneath.
JP: The big call out kind of, Eddy, to respond to that, is, like, what does GitHub give that Git doesn't by itself? Right off the bat, the biggest feature that I guarantee that most people would agree on is pull requests or merge requests, the concept of that. That is not something that comes with Git by itself.
EDDY: But that is available with other version controls, right? So, I'm sure GitLab does something similar to that.
JP: Well, absolutely. GitHub, GitLab, all of those systems leveraging Git provide that functionality. But Git by itself, if you're just hosting your own Git server, doesn't come with that functionality.
MIKE: And Git is the database.
JP: Yes.
MIKE: Git is the database, and the database is not going to give you all the social interaction tools that these applications provide, this idea of a pull request or a merge request with reviewers, what all that you have in there, your descriptions, all of your workflows.
JP: Status checks, yep.
MIKE: Yeah, that isn't provided by the database. The database stores the changes to the code, but it doesn't manage any of that social interaction, and that's by design. You know, it's a tool that's designed to be the database, not to manage that social interaction. That's an important thing to recognize.
EDDY: That's interesting. I never thought of Git as a database. But putting it into perspective, in hindsight, it actually makes sense.
JP: And I guess to layer on top of that, Mike, if you don't mind, we have Git as the database. And like we talked about, GitHub and GitLab leverage that database. But day to day, even when developers are working with Git on their own systems, they're not working with the database directly. They work with the Git CLI. And there are other alternatives to that as well. So, a lot of people think that the Git CLI is...that's it. That's the thing doing my version control. No, it's that combined with the database technology underneath that really gives you the full functionality of Git.
MIKE: That's a really good point. We interact a lot with that command line interface, right? You know, all of the command line tools that Git provides for us, or maybe a graphical tool to interact with it.
JP: Yeah [laughs]. Yes.
MIKE: And a lot of people do. But that itself isn't the database [chuckles]. Having those tools for writing to the database is important. You [chuckles] don't want to corrupt your database, and then you've got an even bigger problem [laughs]. It's probably been done, but I strongly doubt that very many people have ever written their own data to that database by hand. They use an automated tool to do it.
EDDY: I actually have a question. And, again, I've had seven months of full-time development, so I'm still pretty new to the concept. Git, since it's a database, does it store all the repositories for a vast majority of the companies that use --
JP: Oh, good question, though. Good question.
MIKE: Let me answer that question with a question. Do the databases you use store data from Facebook, and why or why not? So, for your application, say you're using PostgreSQL, do you and your PostgreSQL database store data from another company?
EDDY: Probably not.
MIKE: Probably not. But they may also be using PostgreSQL, right?
EDDY: Right.
MIKE: So, PostgreSQL is a technology that can be used to create a repository for storing data, but it isn't itself that repository. It's a technology for creating a repository. So, there are, you know, millions of entities out there that have a repository, but that doesn't mean they all share the same one, in Git or in any other database. There are many repositories out there, many of which have nothing to do with each other at all.
And thinking about this being distributed, we can all have separate repositories, and that's fine. But many of us have repositories that we connect to each other so that we can have some sense of shared direction and code and data, maybe, I should say. Because it doesn't necessarily have to be code. I've used Git for things other than pure code.
JP: [laughs] And I guess to kind of follow up on that as well to kind of answer it, just using GitHub as an example, and I guess most major version control hosted solutions, each project or set of projects that you're working on usually gets written in its own Git database, its own...what they call a Git repository. That is the terminology for it. So, a lot of organizations, or bigger companies, or even just large projects will have many Git repositories that are all tracked in separate Git repository databases, not in the same one together.
EDDY: So, where does the code get saved then when you push up? --
JP: Oooh, another good question. In that case and, Mike, please, yeah, please bounce off me. Using a provider like GitHub, they're actually hosting a Git server. That's another piece of the Git technology stack, Git repository server that will receive the requests of the commit hashes that you're changing and the associated data, and they will store it on their side.
MIKE: Remember when I said earlier that you designate somebody to be the person to say, "Hey, my repository is the one that matters?" We've elected to let GitHub...I say we, you know, anybody who chooses to use GitHub generally elects to use GitHub as that gatekeeper. And we've chosen their copy of the repository to be the one that's in charge, but everybody else has a full copy. Anybody who works with that data has the full set of data.
GitHub is not special or GitLab, you know, you name it. We'll use GitHub today. They're not special. They could go down, and the world would be okay [chuckles]. You wouldn't because [laughter] [inaudible 16:48]. But we wouldn't lose everything because everybody else who's ever used that code would also have all of that data. You're pushing data. You're just syncing your data to that other repository, but you still have everything else. You're just pushing up a few little changes. Really, you're just pushing up those few commits, those few little chunks of changes with their hashes, and not everything. You're not pushing up the whole repository. You're just pushing up the changes that you made to your own branch.
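Mike's point that a push sends only your new commits, not the whole repository, can be modeled in a few lines. In real Git, the client and server compare which commits each side already has, and the client sends a pack containing just the missing objects (commits plus the file snapshots they reference). This sketch models only the commit-selection step, assumes a linear history, and uses made-up short IDs; `commits_to_push` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Set


def commits_to_push(parents: Dict[str, Optional[str]], local_tip: str, remote_has: Set[str]) -> List[str]:
    """Walk back from the local branch tip, collecting commits the remote lacks.

    Simplified: assumes a linear history (exactly one parent per commit).
    """
    missing = []
    current = local_tip
    while current is not None and current not in remote_has:
        missing.append(current)
        current = parents[current]
    return list(reversed(missing))  # oldest first, the order they were made


# Hypothetical short IDs standing in for full 40-character hashes.
parents = {"a1": None, "b2": "a1", "c3": "b2", "d4": "c3", "e5": "d4"}
remote_has = {"a1", "b2", "c3"}
print(commits_to_push(parents, "e5", remote_has))  # ['d4', 'e5']
```

Everything older than `d4` is already on the server, so only two commits travel; the full history stays on every machine that has cloned the repository.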
We didn't mention branches before, but it does allow you to have...it has this idea of a branch in Git. So, you can say, "Oh, you know what? I'm going to make a copy over here." And it's really lightweight. You're not actually copying over all the code because the beauty of this distributed database is you're just making a set of changes. So, naming a set of changes is no big deal. It's just, what, one name [chuckles]? The overhead for that is extremely small.
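Why branches are so cheap: Git stores a branch as a ref, a tiny record (for example the file `.git/refs/heads/main`, or a line in `.git/packed-refs`) holding a single commit hash. Creating a branch with `git branch` or `git switch -c` writes one new ref, and committing moves that ref forward. This sketch models refs as a plain Python dictionary; `create_branch`, `commit_on`, and the commit IDs are hypothetical.

```python
from typing import Dict

# A branch is only a name that points at a commit ID. These IDs are made up.
branches: Dict[str, str] = {"main": "c3"}


def create_branch(name: str, start_point: str) -> None:
    """Create a branch by copying one pointer; no files or commits are copied."""
    branches[name] = branches[start_point]


def commit_on(name: str, new_commit_id: str) -> None:
    """Committing on a branch just moves that branch's pointer forward."""
    branches[name] = new_commit_id


create_branch("feature/smoothie", "main")
commit_on("feature/smoothie", "d4")
print(branches)  # {'main': 'c3', 'feature/smoothie': 'd4'}
```

Throwing the branch away is equally cheap in this model: deleting the name removes one dictionary entry.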
So, it's really easy to create a new set of changes [inaudible 17:43] go off in your own direction, and you can throw it away. Or you can bring it back here and say, "Hey, you know, I'm going to bring these changes in," which brings us to something we said we'd talk about: what does it mean to bring changes in? What does it mean to merge branches? And there are a couple of different schools of thought on how to do this.
Let's say you've got a branch of code. It's just the history, right? It's just a series of changes. You can call them patches, commits. Here's the change that's going to be applied. Here's the change that's going to be applied next. Here's the change that's going to be applied next, from zero up to infinity, right? It's just an endless set of changes that are applied. And if you have two people who've been working separately for a while, you can merge those. And as long as you haven't been changing the same files, then it'll work, right? They'll be interleaved.
You'll have one person's changes, and that person made another change. Then the other person made a couple, and then the first person made a couple, and the other person made one, and the other person made one. You'll see it coming in from one person and the other person and one person and the other person. If you look at the history, you'll just see that history of when they made those changes. Each of those changes represents a change to what happened before.
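This toy model, with made-up timestamps and a hypothetical `merged_log` helper, shows why a merged history looks interleaved: after a plain merge, every commit keeps its original timestamp, so listing commits by time (roughly what `git log` does by default, though it shows newest first; this sketch sorts oldest first for readability) weaves the two people's work together.

```python
from typing import List, Tuple

# (timestamp, author, message) for two people working separately; values are made up.
alice = [(100, "alice", "feature A: step 1"), (300, "alice", "feature A: step 2"), (500, "alice", "feature A: step 3")]
bob = [(200, "bob", "feature B: step 1"), (400, "bob", "feature B: step 2")]


def merged_log(*branches: List[Tuple[int, str, str]]) -> List[str]:
    """After a merge, each commit keeps its original time, so a time-ordered log interleaves them."""
    every_commit = [commit for branch in branches for commit in branch]
    return [f"{ts} {author}: {msg}" for ts, author, msg in sorted(every_commit)]


for line in merged_log(alice, bob):
    print(line)
```

Reading feature A's story means skipping over feature B's commits in between, which is the readability problem Mike raises next.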
And if you're reading the history and you just want to know when they made those changes, well, that's a reasonable way to think about it. But I will say that that's not usually what you care about. We usually think in terms of something more like a feature. The way people usually use Git is they have a branch where they build out a feature, and then they want that feature to come in. And if you wanted to go through the history and see how that feature was built, you'd have to go through and find all the little changes interleaved with everybody else's changes.
So, there's another school of thought that says when you're merging, you shouldn't actually do this [inaudible 19:18] what we call a merge, where you just throw all the changes in where they were made. Instead, you first go through a process of rebasing, and rebasing actually means more than one thing. It's a bigger tool, and, hopefully, we have time to talk a little bit more about it. Think about what that word means: base, and rebasing means you're redoing the base [chuckles].
Since our changes are just a series of patches, I'll make this change, and this change, and this change. Usually, you're making those changes thinking about starting back whenever you branched off of the code originally. But you can change that. Git will allow you to do so and say, you know what? Instead, I'm going to branch off of where the code is now on my local version, you know, grab the changes or, you know, or even off a remote version.
I'm going to branch off of this specific commit. Remember, they're named. They're named with this big number, this hash. Say, "I'm going to branch off of this hash instead." Then you can do a couple of things. You could either squash all your changes down to one big change, and then you've only got one big change [inaudible 20:11] for your feature. Or you can just put them all in together, but they're all sequential. Then, if you go back through your history, you see a series of the sequential changes for one feature, and then a series of sequential changes for another feature, and then a series of sequential changes for another feature. And it makes it much easier to read.
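A simplified model of rebasing and squashing (not Git's real object format): if a commit's ID covers its parent's ID, then replaying the same changes on a new base necessarily produces new IDs, which is why rebasing counts as rewriting history. In real Git this replay is `git rebase`, and squashing can be done with the `squash` command in an interactive rebase (`git rebase -i`) or with `git merge --squash`. The helpers `commit_id`, `replay`, and `squash`, and the base IDs, are hypothetical.

```python
import hashlib
from typing import List, Tuple


def commit_id(parent: str, change: str) -> str:
    """Simplified commit ID: a hash over the parent ID and the change (not Git's real format)."""
    return hashlib.sha1(f"{parent}|{change}".encode("utf-8")).hexdigest()[:7]


def replay(base: str, changes: List[str]) -> List[Tuple[str, str]]:
    """Rebase-style: re-apply each change, in order, on top of a new base commit."""
    history, parent = [], base
    for change in changes:
        new_id = commit_id(parent, change)
        history.append((new_id, change))
        parent = new_id
    return history


def squash(base: str, changes: List[str]) -> List[Tuple[str, str]]:
    """Squash: fold the whole feature into a single commit on top of the base."""
    combined = "\n".join(changes)
    return [(commit_id(base, combined), combined)]


feature = ["add seeds", "add fruit", "add water"]
old_base, new_base = "c3", "e5"  # hypothetical IDs: where we branched, and where main is now

before = replay(old_base, feature)
after = replay(new_base, feature)
print([cid for cid, _ in before])
print([cid for cid, _ in after])  # same changes, different IDs: the history was rewritten
print(squash(new_base, feature))
```

Anyone who pulled the old IDs now holds commits that no longer appear on the rebased branch, which is the collaboration hazard described below.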
So, here are the two schools of thought. First, hey, just merge it in. It's the easier way. I say the easier way; it's the more simplistic way. It's kind of the default "Hey, I'm just going to throw my stuff in there. I hope it works" way of doing things. And then there's the rebasing approach, which says, you know what? I'm going to try to be very polite and put my changes all in one nice, little tight bundle so that people can read it later. It does make the history a lot easier to read. But it also means you've got to get everybody else on board. And it also means that you're rewriting history, because you're changing your local branch to start from somewhere else.
And if you have anybody else collaborating on that local branch, you're going to do them a world of hurt when you do that if they try to pull those changes down because it's going to say, "This doesn't match my local version. The history is rewritten." What happened in the past? I don't know anymore. The world makes no sense. And [chuckles] they're going to have to decide what the true history is. There's some downside to collaborating within a single branch when you follow that approach if you're ever going to make changes after you've done a rebase.
JP: I guess to step backwards and simplify what all that really comes down to in Git terminology, because these words get thrown around a lot: you either have a merge commit, which happens during the git merge process, and that's the default mode we talked about a second ago, just merging in the history. Your changes get globbed on top of each other rather than kept in one linear history of changes.
Or there's rewriting the history, which actually replays your commits as new commits with new hashes, potentially resolving conflicts along the way (which is maybe something we can talk about as well), in order to create a new linear history, even though those changes were originally made on separate branches at different times.
There are definitely, like Mike mentioned, benefits and drawbacks to both approaches. And, you know, it can be a very opinionated thing. But it really depends on whether you care about how your code was put together, and whether you want to see the changes in a clear way that can be audited according to certain standards. So, hopefully, that's another good way to frame it.
EDDY: Let me ask you this, then, when you roll back, where do you reach out to? Do you use Git as a tool to roll back, or do you use the [inaudible 22:37] to roll back?
JP: That's a really complicated question, because it depends on what rollback means. Yes, that can be part of what they call the GitOps process, you know, triggering events based on changes to a Git repository. But that's going to vary from organization to organization [chuckles]. It's not necessarily related to Git. It can be, and a lot of organizations choose to do that. But yeah, that's a large discussion [laughs].
MIKE: Well, you hit on something really key that I tried to talk about with my smoothie [laughter] [inaudible 23:08]. Git doesn't have a rollback. It has no such concept, and that is by design. Time can only go forward. Once the smoothie is made, you can't put it back together. Git only has the idea of a revert, and a revert does not undo anything. And this is critical. This is, like, a really important thing to understand: A revert does not undo anything. It does the changes in reverse because time might have gone on, right?
Let's say that ten people have put new code into the repository that you've decided to centralize on after you made your changes. What would rolling back mean? Getting rid of all those changes? Getting rid of some of them? Doing some sort of rebase where you put other people's stuff on top of yours? It's ill-defined. It's not a well-defined problem. And Git has decided that there is no answer to that problem.
Instead, when you're undoing your changes in Git, it just applies your changes in reverse. And then, when you go to try to merge them in, they might not work. You might not be able to go backwards anymore because things have changed. And you're going to have to deal with the conflicts because, again, time only goes forward.
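A minimal sketch of revert as "the same change, pointed the other way." In real Git, `git revert <commit>` records a new commit that applies the inverse of the named commit's changes; it does not delete the original. This toy model uses one-line patches and hypothetical names (`LinePatch`, `apply`, `Conflict`); real Git performs the revert as a three-way merge over whole files, but the failure mode is the same: if the lines have moved on, the inverse no longer applies cleanly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinePatch:
    """A one-line change: at `index`, replace `old` with `new` (simplified model)."""
    index: int
    old: str
    new: str

    def inverse(self) -> "LinePatch":
        # A revert is just the same change pointed the other way.
        return LinePatch(self.index, self.new, self.old)


class Conflict(Exception):
    pass


def apply(lines: List[str], patch: LinePatch) -> List[str]:
    """Apply a patch forward; refuse if the line no longer looks the way the patch expects."""
    if lines[patch.index] != patch.old:
        raise Conflict(f"line {patch.index}: expected {patch.old!r}, found {lines[patch.index]!r}")
    updated = list(lines)
    updated[patch.index] = patch.new
    return updated


code = ["fruit = 'banana'", "liquid = 'juice'"]
my_change = LinePatch(1, "liquid = 'juice'", "liquid = 'water'")
code = apply(code, my_change)

# Reverting right away works: the inverse patch applies cleanly, as a new change going forward.
print(apply(code, my_change.inverse()))

# But if someone else edited that line in the meantime, the revert conflicts.
code = apply(code, LinePatch(1, "liquid = 'water'", "liquid = 'oat milk'"))
try:
    apply(code, my_change.inverse())
except Conflict as err:
    print("revert conflict:", err)
```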
JP: This even happens, let's say, if you wanted to delete history. The same concept applies, because Git only goes forward. Yes, you removed, or rebased, which is really the terminology, your history of commits. But the rebase produces new commits, and those new commits only move history forward. You can never really return to the location you were at before.
EDDY: Wait, but if you're going back, right? And it does it in reverse, like, how does it have context of all the code that's been merged since?
MIKE: It doesn't. It doesn't at all. It has zero.
JP: It only has context of what files changed, and what lines, and where. So, it just tries to reapply the previous commit, or two, or however many you've chosen, in reverse.
MIKE: And it might conflict with those changes that come afterward. It lets that set of changes be no different than any other set of changes. It's just a set of changes going forward. That's a key distinction that makes Git special. It actually makes it easier to use. It's weird to think about because we like to think about, oh no, there's this pure, true version, and I need to go back to that version.
JP: [laughs]
MIKE: But no, there is no such thing in Git. There's only time moving forward.
JP: And it's also kind of on that same vein; a lot of people usually take a moment in time in Git and turn that into some type of artifact that's this code in this point of time, but it is not Git where you're making that decision. It's where you package it up and put it somewhere else.
KYLE: So, basically, you can...the idea of rollbacks, or whatever, could only be possible with an actual binary.
JP: A binary, package, artifact, whatever you call it, but that comes after the Git process, exactly.
MIKE: But you can copy out a version, right?
JP: [laughs]
MIKE: And then later, you can go back to it. That's not Git doing so. That's a choice outside of your database, you know, outside of your version control.
EDDY: I feel like having an analogy of what Git is would really help. While we were talking, I looked up a more simplistic way to think about what Git is. The repository is like a storybook, and the pages in that book are the files. The writers are the developers. Changes and additions are the commits. Book's memory is the history. Going back in time is the version control that we were talking about. Working together is collaboration. And then sharing the book is the remote repository. Would we agree, in essence, that that's a fair way to do a comparison?
JP: Yes and no.
MIKE: Digital book? Yeah.
JP: [laughs]
KYLE: I was sitting there thinking about Eddy's comparison, and I'm thinking back to all of my college books, right? It actually seems to work pretty well there. The revisions of your college books don't change a lot. They might fix some wording, and they might fix a few quote, unquote, "bugs" here and there. But they don't revert back to anything. They have to release a new version, regardless of what they're doing.
MIKE: Yeah, that's true. You can only do revisions forward. Once it's out there, you can't get it back.
JP: Great point.
MIKE: One thing we haven't talked about...we've talked about this idea of rebasing. We've talked about merging. We haven't talked about what happens when it doesn't work.
JP: [laughs]
MIKE: And yes, that will happen to you, too, if you [laughs] are just starting to use Git, because it does happen. There are responsible ways to go about this, though, you know, if you have a very careful process...and I would strongly advocate for continuous delivery. Thank you, DevOps. We've got some DevOps people here. Thank you so much for making our lives better [laughter].
If you make lots of small changes, you're less likely to have a problem. If you make a few huge changes, you're almost certainly going to conflict, and it's going to be painful. Sometimes, you're going to make a change in the same place somebody else did, and there's no way to get them in alignment automatically. That's what we call a merge conflict, and Git doesn't solve those for you. It instead gives you both versions side by side and lets you decide.
And sometimes, the answer is to pick one. Sometimes, the answer is you pick the other, and sometimes, the answer is you have to somehow make them work together [laughs] and combine the two. And this is inescapable because, yeah, you both change the same place. You're going to have to deal with that. And no, there's no magical system that can address that. So, instead, it just makes it easy. It says, here are your two versions. Go ahead.
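To make the "here are your two versions" idea concrete, here is a toy three-way merge. It is a minimal sketch under a loud simplifying assumption: all three versions have the same number of lines, so edits only replace lines (no insertions or deletions). Git's real merge works on diff hunks, and its conflict markers carry branch names rather than the plain `ours`/`theirs` labels used here.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge (assumption: every version has the same line count,
    so edits only replace lines). Returns (merged_lines, conflict_count)."""
    merged, conflicts = [], 0
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # both sides agree, or neither side changed the line
            merged.append(o)
        elif o == b:      # only "theirs" changed this line: take their edit
            merged.append(t)
        elif t == b:      # only "ours" changed this line: keep our edit
            merged.append(o)
        else:             # both changed the same line differently: a human decides
            conflicts += 1
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged, conflicts

base   = ["def greet():", "    return 'hi'", "# end"]
ours   = ["def greet():", "    return 'hello'", "# end"]
theirs = ["def greet():", "    return 'hey'", "# the end"]

merged, conflicts = merge_lines(base, ours, theirs)
print("\n".join(merged))
print("conflicts:", conflicts)
```

The last line merges cleanly because only one side touched it; the return line becomes a conflict because both sides did, which is exactly the case where you pick one, pick the other, or combine them by hand.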
EDDY: It's harder to resolve, or at least more common to run into, when you have a larger team working on the same codebase.
MIKE: It's really about the number of people working in the same files. You could have just two people in the codebase working on the same file and have it happen a lot. It's people working in the same part of the code that really gets you.
JP: I guess to call out that distinction as well, what is Git actually recording in its database? Now, without going too deep, it's usually recording line changes within a file, and that's why merge conflicts come up. It's not because multiple files change. It's going to be multiple lines changed by different people in the same file, and that's why we have to resolve that.
Take a JSON file, for example. A lot of people are familiar with that data structure. If I'm adding a new item to a list in a JSON file while you add a whole new object in that same location, you can't just merge those automatically. Those are two different data types, and combining them will break the JSON format. That won't be a valid file anymore to use within my code. That's why Git checks the lines you're changing, and why you have to decide which one makes sense. And sometimes, like Mike said, it's neither [laughs], and you have to fix it yourself.
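JP's JSON case can be sketched with Python's standard `difflib`: find which lines of the common base each person changed, and flag a conflict when those line ranges overlap. This is an illustration of line-based conflict detection, not Git's actual algorithm; the helper name `touched_base_lines` and the rule of charging a pure insertion to the line it lands before are assumptions made for the sketch.

```python
import json
from difflib import SequenceMatcher

def touched_base_lines(base_lines, other_lines):
    """Indices of base lines the other version replaced or deleted.
    Simplification (assumption): a pure insertion is charged to the base
    line it is inserted before."""
    touched = set()
    matcher = SequenceMatcher(None, base_lines, other_lines)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            touched.update(range(i1, i2) if i2 > i1 else [i1])
    return touched

base = '{\n  "items": [\n    "a"\n  ]\n}'
ours = '{\n  "items": [\n    "a",\n    "b"\n  ]\n}'   # add an item to the list
theirs = '{\n  "items": {\n    "a": 1\n  }\n}'       # replace the list with an object

# Each version is valid JSON on its own (json.loads raises if it is not)...
json.loads(ours)
json.loads(theirs)

# ...but both edits touch the same base lines, so a line-based merge cannot
# pick a result for you; it has to report a conflict instead.
base_lines = base.splitlines()
overlap = (touched_base_lines(base_lines, ours.splitlines())
           & touched_base_lines(base_lines, theirs.splitlines()))
print("conflicting base lines:", sorted(overlap))
```

Neither side is wrong in isolation; the problem only appears when both edits land on the same lines, which is why the decision is handed back to a person.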
EDDY: 100%. And it's kind of cool that Git just sits back and says, you tell me what the right version is, and then I'll go [laughs] ahead and push it up. So, it gives the responsibility to the writer.
MIKE: And, interestingly, you're probably doing this in a branch. And you're probably using some of these tools on top of Git that will then let you do a review process. And so, other people can look at it and say, "No, that's not how it's supposed to go," or "Oh, yeah."
EDDY: So, I've heard that, before, we used local databases as a form of version control. Is that true, prior to cloud-based version control?
JP: I mean, that's still what we use databases for [laughs], just not for source code, if you want to think about it that way. We've chosen to add an additional set of tools and technology in the Git database to manage source code and files. But we have customer data, potential [inaudible 30:47], mathematical computations, whatever you're storing in a database. It's still versioned there and sometimes has its own schema.
EDDY: I got to imagine, right? Like, if you were storing that in a local database somewhere, what happens if that data gets corrupted, right?
JP: [laughs]
EDDY: So, it's like one point of failure, essentially.
JP: Absolutely.
KYLE: You work later.
EDDY: [laughs] You work later.
KYLE: You work a lot later.
JP: [laughs] And there's the whole concept as well of what Git is, right? I mean, Git is a simple database technology that runs off files on your system. If you're storing something in a large database technology, like Postgres, which we've been using as an example, you have to run a whole set of technology as well. With Git, you don't have to run anything. You just use tools that apply the Git database technology in order to work with it at runtime. You don't have to keep something running 24/7. That's another big benefit.
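The "database that is just files" point can be illustrated with how Git names file contents. In Git's default (SHA-1) object format, a file's contents are stored as a blob whose ID is the SHA-1 of the header `blob <size>`, a NUL byte, and the raw bytes; loose objects then live under `.git/objects/` as zlib-compressed files. The sketch below only computes the ID and the path; it does not write anything to disk, and the helper names are illustrative.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to file contents in the default SHA-1 format:
    SHA-1 over the header b"blob <size>", a NUL byte, then the raw bytes."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(object_id: str) -> str:
    """Where a loose (zlib-compressed) object is kept: the first two hex
    digits name a directory and the remaining 38 name the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"

oid = git_blob_id(b"test content\n")
print(oid)
print(loose_object_path(oid))
```

In a real repository, `echo 'test content' | git hash-object --stdin` should print the same ID. Because the name is derived from the content, any machine can verify what it received without a central server running.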
MIKE: It relies on the file system you already have, you know, and there are distributed databases that people use for customer data. Couchbase is one I've [inaudible 31:53], which is designed for cases where sometimes somebody's not going to be connected for a while. It might be good for a mobile app where, you know, somebody's going to be offline a lot [chuckles]. And they're going to be making some changes, and you want to merge those back in later. So, these kinds of ideas are not exclusive to Git and can be used for other kinds of data. But we're talking about Git, which is designed around files.
JP: I guess to also talk about what you mentioned, Eddy, because I really like it: there are other stores. One extremely popular data store that versions things, not for source code specifically but for files in general, is S3 or similar blob storage. It can also version your files. It doesn't do it as a whole like Git does, where everything that changed in a specific directory structure is recorded together. S3 takes individual files and manages their versions over time.
So, just like Mike said, there are other version control technologies out there. You could store your code in S3 and version it that way. It probably won't be very beneficial. But some of the ideas that have come from Git and earlier VCS technologies are definitely used in a lot of other places.
KYLE: Slap Athena on top of it, and you're good to go.
[laughter]
MIKE: We've talked a lot about how Git works, what it does, this idea of merge conflicts. We haven't necessarily talked much about best practices.
EDDY: Don't force push if you're collaborating.
JP: [laughs]
MIKE: That goes back to the rebasing thing. You change history, and now the other person has a divergent history. And what do they do? That's the gotcha of the rebase approach. I'm actually partial to the rebase approach, but that doesn't mean it solves all problems. I like to have a Git history that's easy to follow [chuckles]. I think it's a wonderful thing. [chuckles] I mean, when you have to go back and try to figure out what happened in the codebase, it's great.
By the way, this is just a database. You can look at the history. If you've got the Git command line, you can run git log, and it will show you all of those commit hashes plus the description of each change, all the way back through history, and you can search it. You can pass -p, the patch flag, and it'll show you the differences too. Then you can search for, say, the name of the function you're looking for and go back through your history. It's just like any other file. You can search through it, which is fantastic [chuckles]. It's a searchable database.
But, you know, I really like to be able to search for things, to be able to understand things. But yeah, if you're ever in the same branch as somebody else, and you change your history using rebase and then push it up to a shared repository (there's a force push that allows you to do that), anybody else who pulls it is going to have to deal with the consequences: their local version is no longer in sync with the remote version. They have a different history, and they're going to have to figure out what to do.
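Searching history for a function name is what Git's "pickaxe" option, `git log -S<string>`, is for: it lists commits whose changes alter how many times the string occurs. The sketch below is a toy, in-memory model of that idea, not Git's implementation; the commit IDs, messages, and patches are hypothetical data invented for illustration, listed oldest first.

```python
def pickaxe(history, term):
    """Toy model of `git log -S<term>`: report commits whose patch changes
    how many times `term` occurs (count in added lines != count in removed lines)."""
    hits = []
    for commit in history:
        added = removed = 0
        for line in commit["patch"].splitlines():
            if line.startswith("+++") or line.startswith("---"):
                continue  # file headers, not content
            if line.startswith("+"):
                added += line.count(term)
            elif line.startswith("-"):
                removed += line.count(term)
        if added != removed:
            hits.append((commit["id"], commit["message"]))
    return hits

# Hypothetical history (oldest first); IDs and file names are made up.
history = [
    {"id": "c1", "message": "Add calculate_total",
     "patch": "--- a/cart.py\n+++ b/cart.py\n"
              "+def calculate_total(items):\n+    return sum(items)\n"},
    {"id": "c2", "message": "Round totals",
     "patch": "--- a/cart.py\n+++ b/cart.py\n"
              "-    return sum(items)\n+    return round(sum(items), 2)\n"},
    {"id": "c3", "message": "Rename to compute_total",
     "patch": "--- a/cart.py\n+++ b/cart.py\n"
              "-def calculate_total(items):\n+def compute_total(items):\n"},
]

for commit_id, message in pickaxe(history, "calculate_total"):
    print(commit_id, message)
```

Searching for `calculate_total` finds the commit that introduced it and the one that renamed it away, while the rounding change in between is skipped because it never touched that name.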
They can throw out their local changes. Hopefully, they haven't done anything important, right? That rebase type flow really only works if you have individual contributors working in a branch, and then a shared branch you treat as something that you don't change, right? You don't ever rewrite the history on anything that's shared.
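Why does rewriting history cause so much trouble? Each commit records the ID of its parent, and its own ID is a hash over that record, so changing one commit changes the ID of every commit after it. The sketch below assumes a heavily simplified commit that hashes only the parent ID and the message; real Git commits also cover the tree, author, committer, and timestamps.

```python
import hashlib

def commit_id(parent_id: str, message: str) -> str:
    # Simplified commit (assumption): hash only the parent ID and the message.
    record = f"parent {parent_id}\n\n{message}".encode()
    return hashlib.sha1(record).hexdigest()[:12]

def build_history(messages):
    """Chain commits so each one records the ID of the commit before it."""
    ids, parent = [], "none"
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids

shared = build_history(["init", "add cart", "fix rounding"])
rebased = build_history(["init", "add cart (reworded)", "fix rounding"])

for old, new in zip(shared, rebased):
    print(old, new, "same" if old == new else "DIFFERENT")
```

Only the second message was reworded, yet the third commit gets a new ID too. Anyone who already pulled the shared IDs now holds a history the remote no longer agrees with.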
JP: You know, you mentioned best practices. I'd also include as a subcategory, and kind of really [inaudible 35:03] on that, branching strategy. A lot of popular projects out there, especially ones that are libraries for simple projects, use something called trunk-based development, which is a branching strategy where you have a default branch, typically named master or main.
You branch off of that branch, and at some point you return your code to it, usually through a pull request or a merge request that's held to the standards you want to follow. But there are many other strategies out there, and your team or project should decide which branching strategy to use.
A lot of mobile projects have long used Git Flow as a branching strategy, which I won't attempt to explain right now. But it's a whole other branching strategy with a whole other set of standards. So, you really have a lot of flexibility to define these on your own, but each one comes with its own caveats.
If you're using trunk-based development, where you merge back to that master or main branch, then, just as Mike mentioned, let's say I'm working directly on the main branch. You should never do this, but let's say that you are. Let's say I force-push a rebase I had just done on main up to the remote main, and Eddy comes along to put up a pull request with changes he had branched off main. Now he's going to have even more issues, because his branch's history has diverged from the branch he's trying to merge back into, which can be a nightmare.
What I'm trying to say, in a nutshell, is know what branching strategy you want, and have rules for each branch about what is and isn't allowed to happen in its history.
MIKE: A simple rule, based on what you said, Eddy, is never rebase shared branches. It's pretty straightforward, but you have to come up with a way to do that. How are you going to do this? You can work on your own stuff, and you can rewrite your own history because it hasn't been shared with anybody else yet. But once you put it out into the world with other people, then you have to only go forward, right? It's past the time when it's responsible to go back and try to rewrite history and change things.
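The "only go forward" rule corresponds to what Git calls a fast-forward: a plain push succeeds only when the remote branch's current tip is already an ancestor of what you push; otherwise you need `git push --force`. (Git can answer the ancestry question itself with `git merge-base --is-ancestor`.) The sketch below models JP's scenario with a hypothetical commit graph stored as a child-to-parents dictionary; the commit names are made up.

```python
def is_ancestor(parents, ancestor, tip):
    """Walk parent links back from `tip`; True if `ancestor` is reachable."""
    stack, seen = [tip], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return False

def can_fast_forward(parents, remote_tip, local_tip):
    # Moving a branch "forward only" means the old tip must already be
    # part of the new history.
    return is_ancestor(parents, remote_tip, local_tip)

# Hypothetical graph, child -> parents.
parents = {
    "B": ["A"], "C": ["B"],      # shared main: A <- B <- C
    "D": ["C"],                  # Eddy's branch, built on top of C
    "B2": ["A"], "C2": ["B2"],   # JP's local rebase rewrote B and C
}

print(can_fast_forward(parents, "C", "D"))   # True: D builds on the shared history
print(can_fast_forward(parents, "C", "C2"))  # False: replacing C with C2 needs a force push
print(can_fast_forward(parents, "C2", "D"))  # False: after that push, Eddy's branch has diverged
```

Rewriting your own unshared commits is harmless because no one else's history points at them; once a branch is shared, every tip someone else has seen has to stay an ancestor of what comes next.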
EDDY: Just so I can get some more context, what's the branching strategy that we use on our team?
MIKE: We used to use something closer to Git Flow, but we have moved back to a trunk-based strategy because, again, thank you, DevOps [laughter]. We've got continuous delivery. And if you're deploying 20 times a day, you're making lots of little changes, lots of little low-risk changes. That doesn't mean no risk, right? You need to be responsible in your changes.
JP: [laughs]
MIKE: But if you're making many small, concise changes, they're unlikely to step on each other, and you can just go back into your default branch. One reason the mobile teams use something like Git Flow is that they maintain a parallel branch for development and periodically come back to the default, primary branch. The reason is that with something like mobile, you can't deploy many times a day because you're working with an app store. You know, you're working with a third party that doesn't allow the kind of continuous deployment you can get in a web application environment.
So, everybody works in a separate shared branch that you periodically merge into the one that ends up going to your app store. That way, you still have the idea of a shared branch similar to main or master, but it's different. You have this development branch, and that's what you treat as the shared, special branch. And then the main one you only use for those builds.
JP: So, in a nutshell, when you're defining your standards and the branching strategy you use with Git, it's really influenced by how you're deploying something. Git doesn't do the deploying, but how you use Git and its database will be influenced by that deployment type. A mobile app, like Mike said, is your source code packaged up in some form. But a web application or web server is something that's ingested on every single network call your client makes, so it can be updated constantly. So, those just have very different deployment strategies.
If you look at something like Slack, or another tool built on Electron, that's even more complicated, because it's a web application running as a desktop application. So, you have to do both types of deployment strategies. So, it's really cool. And you definitely have to pay attention to both sides of the coin: how do I want to develop, and how is it going out after I'm done?
MIKE: There's no one right answer.
JP: No, no [laughs].
EDDY: Yeah, the takeaway, for me, so far, has been Git is much more vast than I initially thought it was, which is really cool.
JP: Git is simple in what it's trying to record, and that's what makes it a powerful technology.
MIKE: It follows the long history of Unix tools...
JP: [laughs]
MIKE: That are really good at one very specific thing so that they can be chained and used together. Even the fact that it uses the filesystem as its storage means that it lives within an ecosystem. You can swap out the file system; no big deal, right? In fact, you would likely have different developers on different file systems, and it's all okay [chuckles]. That is not a problem at all. It's intended to do a good job within its own little niche and never step outside that. If you need to step outside that, use another tool.
KYLE: I mean, that's ignoring the carriage return, right?
JP: [laughs] Hey, they fixed that too, Kyle [laughs].
MIKE: If any of our listeners don't know, different operating systems have historically created files with different line endings. Trying to collaborate across operating systems where [chuckles] lines end differently means some people are going to have a carriage return, some people are going to have a line feed, and some people are going to have both. Merging those is potentially painful. They've had to improve that cross-operating-system handling over time, partially by using some heuristics, like, you know what? I think you're on a Mac. I think I'm going to go with, let's say, carriage returns only, [chuckles], you know, and they just make some guesses.
EDDY: If you were to make a new utility to replace Git, what would be your approach to getting it widely adopted?
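The line-ending problem Mike describes is easy to see in a few lines. Git handles it through settings such as `core.autocrlf` and the `text`/`eol` attributes in `.gitattributes`, which normalize line endings between the repository and the working tree. The sketch below only shows the underlying idea of normalizing before comparing; it is not Git's implementation, and the function names are illustrative.

```python
def to_lf(text: str) -> str:
    """Normalize Windows (CRLF) and classic Mac OS (lone CR) endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def to_crlf(text: str) -> str:
    """Produce Windows-style endings: normalize first so no CR is doubled."""
    return to_lf(text).replace("\n", "\r\n")

windows = "total = 1\r\nprint(total)\r\n"
unix = "total = 1\nprint(total)\n"

print(windows == unix)                # the raw text differs on every line
print(to_lf(windows) == to_lf(unix))  # identical once normalized
```

Without normalization, a line-based tool would see every line of the file as changed just because it was saved on a different operating system, which is the painful merge Mike is describing.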
JP: I mean, all jokes aside, my first thing would be, what am I trying to fix? Because that's how you find not only your target audience but how you build the right technology. What does Git not solve for you right now? And I guess people have actually asked that question. And there are other tools out there. That's why GitHub exists. That's why GitLab exists and adds those additional feature sets.
MIKE: There is a modern version control system called Mercurial that has a niche following and, I think, has a lot of these same features. But it's partially proprietary, and people are cheap. And one thing people like about open source is that, not always, but often, you can get it free as in beer as well as free as in speech. And they like that free stuff.
EDDY: Is Mercurial, the one you are mentioning, open source?
MIKE: My understanding is that it was not, at least at times in the past. Is it today?
JP: I don't know. I'm going to find out.
MIKE: It looks like it is today [inaudible 42:12]
JP: I keep finding people saying Mercurial is free. It does not say open source but free.
KYLE: GPL V2.
JP: That would be open source.
EDDY: For the listeners, open source and free are completely different.
JP: [laughs]
EDDY: And it did take me a bit to [laughter] [inaudible 42:30] when I first started to find that distinction.
MIKE: According to Wikipedia, which could be wrong [laughs], Facebook, Mozilla, and the W3C use Mercurial.
EDDY: What is Mercurial trying to solve here? Like, what's its selling point?
MIKE: Another thing on the Wikipedia page says that Bitbucket dropped Mercurial support back in 2020 because less than 1% of new projects use it. So, it's one of those things that it may be solving lots of problems, but I don't have enough familiarity because I don't use it.
JP: Eddy, I'll ask you the same question in a different way: why do people still write in COBOL?
KYLE: Because of our banking software.
[laughter]
EDDY: Because if you have a repository that is written in COBOL, trying to [inaudible 43:18] would be a [inaudible 43:21].
JP: Yes. I would say there's not only the cheap factor of software, but also familiarity and the habits around software.
EDDY: Well, familiarity isn't really always a good thing because with technology always changing every single day --
JP: Then we end up with COBOL.
[laughter]
EDDY: It took me a few months to really get familiarized with Git. And to have to adopt a new technology just because someone has an itch [laughs] is not always fun.
MIKE: You know, Git is a tool that has met a lot of people's needs, and I think it became popular because it worked [chuckles]. That doesn't mean there aren't other tools that work and may even be a little better. But it does a really good job, and any differences just aren't enough for people to care about. And momentum is hard to beat. If you've got a tool that does the job for almost everybody, it's going to get the attention. It's going to get the fixes. Eventually, it's going to get the features.
EDDY: So, I'm assuming Git is something that's widely used predominantly in the software industry, right?
MIKE: I honestly can't think of the last time I talked to somebody who wasn't using Git [laughs].
EDDY: So yes, listeners, I would say if you're wanting to become a developer, I would think familiarizing yourself with Git [chuckles] is crucial.
MIKE: My 11-year-old was asking me about Git the other day. It's that ubiquitous [laughs]. He ran into it independently, looking into Minecraft modding, I think. But the first answer he got was, "The first thing you need to do is set up a Git repository." [chuckles] And he's like, "Oh, can you tell me how to get set up with GitHub? What's GitHub?" [laughs]
JP: And then I would also say, not even just as a developer, if you're even a data engineer and you're looking to leverage tooling and other things out there, it can be very useful to know at least some of the fundamentals of Git in order to get your way around those tools and see how they work. Documentation has also made its way, like you were kind of hinting at, Mike, into the Git world. So, learning about that if you're doing any kind of technical documentation is also a really good place to be.
KYLE: Even more, like, layman tasks. My wife ran into it a while back. She has a blog website, and she was looking up how to modify one of her themes, and she needed to get something off of a Git repo. You know, it's trending toward a common tool that everybody's going to have to know about, even if you're just a writer, or in social work, or something, right? You don't have to be an engineer. It's out there. There are some applications you're going to find in Git.
MIKE: And I wouldn't say just a writer or just a social worker.
KYLE: Well, those -- [laughter]
MIKE: We provide value in different ways [laughter]. That does not --
KYLE: Yes. Yes. You're right. You're right. My bad.
MIKE: [laughs] Git is an amazing tool. Those of us who've been around for a while have used other tools. Most of us have deeply fallen in love with Git and how well it works for us. It's one of those things that just works, and anything that just works is wonderful. Whether you're maintaining your social work documentation, writing code, or doing something exotic, I can't think of a case [laughs] where Git isn't amazing.
It's a tool that generally just works. It helps me to understand what's going on, to understand what it can do and what it can't do, and to go back in time. That actually is a feature, not a bug. And it allows you to work with other people well. If you haven't used it yet, give it a try. Collaborate [laughs]. We'll talk to you next time on the Acima Development podcast.