christoph ender's

blog

sunday the 28th of july, 2024

git repository cleanup using rebase

Recently I had a few git repositories to clean up. Over time, a lot of issues had accumulated: there were a few hundred commits spread over the last fifteen years, many of which were auto-migrated from an ancient cvs repository and so didn't conform in any way to git's commit message recommendations; I had committed using at least five different, partially invalid, e-mail addresses; there were some empty commits resulting from the cvs2git conversion; and only about half of the commits were signed.

I put off cleaning up these repositories for a long time since it's quite a lot of work: the trunk along with all relevant branches had to be rebased, many commit messages needed to be rewritten so that at least the second line was empty, and all the tags and submodule references had to be restored afterwards.

migrating trunks

I used a rebase command from 101b147ch's reddit post, which keeps all commit dates intact and replaces the author's name and e-mail with the current settings of user.name and user.email:

git -c rebase.instructionFormat='%s%nexec GIT_COMMITTER_DATE="%aD"'\
' git commit -S --amend --no-edit --reset-author --allow-empty --date="%aD"' \
 rebase -i --root

This starts an interactive rebase. I had to remove the --reset-author manually from all commits which weren't mine, but since there were just a handful this wasn't too hard. All commits which should have their commit messages adapted had to have their pick prefix replaced by edit before saving the file. After that, it was a lot of git commit --amend and git rebase --continue to get all the old cvs messages adjusted.
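
With many commits to touch up, switching pick to edit by hand gets tedious. The following is only a sketch and was not part of the workflow described here: mark_for_edit is a hypothetical helper that reads a todo list on stdin and rewrites pick to edit for the abbreviated hashes passed as arguments. It assumes the hashes are written exactly as they appear in the todo list.

```shell
# Hypothetical helper (not from the original post): switch selected todo
# entries from "pick" to "edit". Reads the rebase todo list on stdin and
# writes the modified list to stdout; "exec" lines pass through unchanged.
# Arguments: abbreviated hashes exactly as they appear in the todo list.
mark_for_edit() {
    awk -v hashes="$*" '
        BEGIN {
            n = split(hashes, list, " ")
            for (i = 1; i <= n; i++) wanted[list[i]] = 1
        }
        $1 == "pick" && ($2 in wanted) { sub(/^pick/, "edit") }
        { print }
    '
}
```

Because the helper only filters text, its output can be checked against a saved copy of the todo list before the real list is touched.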

The -S parameter means that all rebased commits get signed. Since we're rebasing, all existing commit signatures are lost anyway, which, in my case, isn't bad at all: new signatures are created automatically during rebasing, and I don't care whether a signature's date matches the commit's date, since the signatures are only used to prove the authenticity of the commit in question.
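
To check afterwards that no commit was left unsigned, the signature status can be listed per commit. This is a sketch under assumptions rather than a step from the original workflow: unsigned_commits is a hypothetical helper that filters the output of git log when the %G? placeholder is used, where a status of N means the commit carries no signature.

```shell
# Hypothetical helper (not from the original post): print the hashes of
# commits without a signature. Expects "<hash> <status> <subject>" lines
# on stdin, where <status> is what git's %G? placeholder prints
# ("N" means no signature).
#
# Intended use inside a repository:
#   git log --format='%h %G? %s' | unsigned_commits
unsigned_commits() {
    awk '$2 == "N" { print $1 }'
}
```

No output means that every listed commit carries some signature. Whether those signatures are valid is a separate question that this filter does not answer.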

migrating branches

After the trunk's rebase was finished I wanted to clean up the most recent branches, too. Since the first step only resulted in a new set of commits without any branches attached, I first pushed the new, rebased commits into a fresh repository. Then, in the new repository, I created a new branch from the head of the main branch and cherry-picked all commits from the original branch into the target branch:

cd new_repository
git checkout -b target_branch
git cherry-pick 87ah231..aee2105

Once the commits were imported into the target branch in the new repository, I could start rebasing the target branch. The command was almost the same as for the trunk; instead of starting from --root, the rebase started at the newest common commit of the trunk and the target branch and ran up to the target branch head:

git -c rebase.instructionFormat='%s%nexec GIT_COMMITTER_DATE="%aD"'\
' git commit -S --amend --no-edit --reset-author --allow-empty --date="%aD"' \
 rebase -i bb70841
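
The newest common commit does not have to be looked up by hand, since git merge-base prints it. The wrapper below is a small sketch; newest_common_commit is a hypothetical name, and the branch names in the usage comment are placeholders.

```shell
# Hypothetical helper (not from the original post): print the newest
# commit that two branches have in common, using git merge-base.
# Arguments: the two branch names (or any other commit references).
#
# Example with placeholder branch names:
#   newest_common_commit main target_branch
newest_common_commit() {
    git merge-base "$1" "$2"
}
```

The printed hash can then be used as the starting point of the rebase in place of a manually copied one.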

migrating merge commits

When I compared the results of my rebase work to the original commits, I noticed I had fewer commits than before. That's because interactive rebasing skips merge commits by default, as the documentation states:

By default, a rebase will simply drop merge commits from the todo list, and put the rebased commits into a single, linear branch.

This behavior can be changed using the --rebase-merges parameter:

With --rebase-merges, the rebase will instead try to preserve the branching structure within the commits that are to be rebased, by recreating the merge commits. Any resolved merge conflicts or manual amendments in these merge commits will have to be resolved/re-applied manually.
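
A quick way to see whether a rebase kept the merge commits is to compare commit counts between the original and the rebased history. The two helpers below are a sketch with hypothetical names; they rely only on git rev-list with its --count and --merges options.

```shell
# Hypothetical helpers (not from the original post): count the commits
# reachable from a reference, in total and merge commits only.
# Argument: a branch name or any other commit reference.
count_commits() {
    git rev-list --count "$1"
}

count_merges() {
    git rev-list --count --merges "$1"
}
```

If count_merges reports fewer merges for the rebased branch than for the original one, the merge commits were dropped and the history was linearized.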