I hate squash merges
What are squash merges
When proposing changes to a software project that is hosted on a forge like GitHub, GitLab or Gitea, the author of that changeset opens a pull request (GitLab calls it a “merge request”, but I'll stick to the former here). When the maintainer of the project approves that pull request, it is merged. This normally happens via a click on the “Merge” button in the web interface of the forge (although it does not have to).
GitHub offers different methods for merging pull requests. The “normal” way of merging a pull request is by creating a merge commit between the base branch (for example “master”) and the pull-request branch. This is equivalent to git merge <branch> on the command line.
Another method is the so-called “rebase and merge” method, which rebases the pull-request branch onto the target branch and merges it after that. The rationale here is that if the pull request gets rebased before it gets merged, it is “up to date” with the target branch when it is merged. There are also two variants of that method: one where a merge commit is created after the rebase and one where the target branch is just fast-forwarded (git merge --ff-only) to the pull-request branch. I find these two methods problematic as well, but that's not what we're here for.
The third method, and the one I want to talk about here, is the “squash merge”. When a pull request is squash-merged by the maintainer of a project, all commits in the pull-request branch are collapsed into a single commit and all commit messages are joined together. This commit is then applied directly to the target branch. The (approximate) git command(s) for doing this would be
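git checkout pr-branch
# Collect subject and body of every commit on the branch
git log master..pr-branch --format="%s%n%b" > /tmp/message
# Replay the branch onto master, then collapse it into staged changes
git rebase master
git reset --soft master
# Commit the combined diff with the joined message
git commit -a --file /tmp/message
# Fast-forward master to the single squashed commit
git checkout master
git merge --ff-only pr-branch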
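For comparison, git also ships a built-in shortcut that produces the same kind of result locally: git merge --squash. The following is a minimal sketch in a throwaway repository. The file name, author identity and commit messages are made up for illustration, and this is only a local approximation, not necessarily what a forge runs internally.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all names and messages below are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"

# A pull-request branch with three separate commits.
git checkout -q -b pr-branch
for step in one two three; do
    echo "$step" >> file.txt
    git commit -q -a -m "Add step $step"
done

# Squash: stage the combined diff on master, then commit it once.
git checkout -q master
git merge -q --squash pr-branch
git commit -q -m "Squashed pr-branch"

# master gained a single commit; the three original commits are
# not part of master's history, although the resulting tree is identical.
echo "commits on master: $(git rev-list --count master)"
echo "pr-branch commits missing from master: $(git rev-list --count master..pr-branch)"
```

The last two lines make the loss visible: the content arrives on master, but none of the individual commits do.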
Implications of squash merges
What I want to highlight here is what squash merging implies.
First of all, squash merging implies that the diff a pull-request branch introduces is put into a single commit. It does not matter whether the pull-request branch contained one commit or a hundred commits: the end result is always one commit with one diff and one message.
That's also the second thing that a squash merge implies: there is only one message (even if it is crafted by simply concatenating multiple messages) for the whole diff the pull request introduced.
Signatures created with GPG or some other method are destroyed in that process, because the squashed commit is a new commit object that the original author never signed.
Why I hate this
You can probably already smell why I loathe this. By combining the individual changes a pull request introduced, one loses so much information! Consider a pull request that took 10 commits to refactor something. Carefully crafted commit messages, why things were changed the way they were changed. Very detailed analysis in the commit message, why a certain change is needed to further refactor a piece of code somewhere else in the next commit. Maybe even performance characteristics written down in the commit message!
All this is basically lost as soon as the pull request is squashed. The end result is a huge diff with a huge message, where the individual parts of the commit message could potentially be associated with the right parts of the diff. Could be. But the effort to take apart the huge commit is just lost time, and maybe a huge undertaking, that would be completely unnecessary if the changes had not been introduced to the “master” branch via squash merge in the first place.
One might argue that the commits are still there, in the web interface of the forge. Yes, they might be. But git is an offline tool; I should be able to see these things without having to use a browser. I should be able to tell my editor “give me the commit message for this line here, because I want to see why it is written the way it is” and my editor should then give me that information. If it opens an enormous squashed commit, I'll just rage-quit! Because now I have to review a commit that might contain thousands of lines of changes, and I have to search its message for the reason why the one line I care about was changed.
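The offline lookup described above can be sketched with plain git: git blame resolves a line to the commit that last touched it, and git log prints that commit's message. This is a minimal sketch in a throwaway repository. The file settings.conf, its contents and the commit messages are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository; file name and messages are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'timeout = 30\nretries = 3\n' > settings.conf
git add settings.conf
git commit -q -m "Add default settings"

# Change only the second line in a separate, well-explained commit.
printf 'timeout = 30\nretries = 5\n' > settings.conf
git commit -q -a -m "Raise retries to 5" \
    -m "Three attempts were not enough on flaky networks."

# Which commit last touched line 2? The first porcelain line starts
# with the commit hash.
commit=$(git blame -L 2,2 --porcelain -- settings.conf | head -n 1 | cut -d ' ' -f 1)

# Print the full message of exactly that commit.
message=$(git log -n 1 --format=%B "$commit")
echo "$message"
```

With a clean history, the answer is a short message about that one line. After a squash merge, the same lookup lands on the single squashed commit and its joined message.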
I hesitate to link an example here, mostly because blaming someone who doesn't know better does not yield anything valuable and is just destructive. But let me assure you: I've seen projects that do this and it is just ridiculous! If you come across a change that touched 2 KLOC and has a commit message that is 500 lines of “Change”, “Fix things” and “refactor code”, you could also go back to the old SVN days where we had things like “Check-In #1234 from 2022-03-04”. We can do better than that!
How to do better
So, you might think that the above is all valid and sane. But now you want to know how things could be improved. And, to be honest, it is totally trivial!
First of all, let me briefly talk about responsibilities, because I feel like the idea of squashing all changes in a pull request comes from the maintainers' attitude of “I have to clean things up before I merge”. The idea here is that they take the pull request and squash it, so that things are “clean” on the master branch. But that premise is totally wrong. The maintainer of a project (especially in open source, but in my opinion also in “not open source”) is never responsible for cleaning up a contributor's work. After all, it is a pull request. The contributor asks the maintainer to take changes. The contributor is the person that wants something to be changed in the project. Therefore it is the duty of the contributor to bring the changes into a form where the maintainer accepts them. And that obviously includes a clean commit history!
I reckon, though, that some contributors just do not care about committing their changes cleanly and with decent commit messages. In my opinion, a maintainer should just not take these patches – I certainly did reject patches because of badly written commit history. There's always the option for the maintainer to take the patches to a new branch and rewrite the commit messages. For example I once did this with nice changes that were just committed badly. It is, though, not the responsibility of the maintainer to do this.
Another option, which I quite like, is for a project to introduce commit linting (but not conventional commits, of course). Commit linting can be used (for example by implementing a CI job with gitlint) to ensure that commit messages have Signed-off-by lines, do not contain swear words, have a decent length and more. It is a nice and easy way of automating this and working towards decent commits.
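To give an idea of what such a check does, here is a hand-rolled sketch in plain shell. It is not gitlint and does not use gitlint's configuration. The function name lint_message and the 72-character subject limit are assumptions chosen for illustration, and a real project would pick its own rules.

```shell
#!/bin/sh
# Hand-rolled sketch of a commit message check; this is not gitlint.
# The 72-character subject limit is an assumed project rule.

lint_message() {
    # $1: full commit message text.
    # Prints one line per problem; returns non-zero if any was found.
    msg=$1
    status=0
    subject=$(printf '%s\n' "$msg" | head -n 1)

    if [ -z "$subject" ]; then
        echo "subject line is empty"
        status=1
    fi
    if [ "${#subject}" -gt 72 ]; then
        echo "subject line is longer than 72 characters"
        status=1
    fi
    if ! printf '%s\n' "$msg" | grep -q '^Signed-off-by: '; then
        echo "missing Signed-off-by line"
        status=1
    fi
    return "$status"
}

good='Fix off-by-one in pagination

The last page was dropped when the item count was a multiple of
the page size.

Signed-off-by: Example Author <author@example.com>'

bad='Fix things'

lint_message "$good" && echo "good message passes"
lint_message "$bad" || echo "bad message rejected"
```

In a CI job, each message of the pull request (for example from git log --format=%B) could be fed through such a function, failing the job on the first rejected commit.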
All of this helps with improving the commit messages and therefore the change history of pull requests. But of course, squash merging must still be disabled/forbidden!
In my opinion, reviewing commit messages should be part of every normal code review. The GitHub web interface does not particularly support that, because one has to click through several pages until the actual commit is viewed. That's why I like to fetch pull requests from GitHub (git fetch REMOTE pull/PR_NUMBER/head) and review them commit-by-commit on my local machine (git log $(git merge-base master FETCH_HEAD)..FETCH_HEAD).
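The local review workflow above can be tried without a forge. In the following sketch, a local repository named upstream stands in for GitHub and exposes a pull request under refs/pull/1/head, which is the ref layout that the fetch command above relies on. The PR number 1, the branch name feature and all commit contents are made up for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for the forge: a local repository exposing a pull request
# ref under refs/pull/1/head. PR number 1 is made up.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git checkout -q -b feature
echo one >> file.txt
git commit -q -a -m "Add one"
echo two >> file.txt
git commit -q -a -m "Add two"
git update-ref refs/pull/1/head feature
git checkout -q master
cd ..

# Reviewer side: clone, fetch the pull request, find where it forked.
git clone -q upstream review
cd review
git fetch -q origin pull/1/head
base=$(git merge-base master FETCH_HEAD)

# Overview of the pull request, oldest commit first.
subjects=$(git log --reverse --format=%s "$base..FETCH_HEAD")
echo "$subjects"

# Full review view: each commit's message followed by its own diff.
git --no-pager log --reverse --patch "$base..FETCH_HEAD"
```

Reading the commits oldest first with --reverse follows the order in which the contributor built up the change, which is usually the order in which the messages make sense.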
To sum up...
To sum up, don't enable squash merging in your repository configuration! Disable it, in fact! It hurts your project more than it provides value (because it doesn't provide any value)! It is a disrespectful and destructive operation that minimizes the value your project receives via pull requests.
I, for one, stop contributing to projects if they squash merge.