git repository cleanup using rebase
Recently I had a few git repositories to clean up. Over time, a lot of issues had accumulated: There were a few hundred commits spread over the last fifteen years, many of which were auto-migrated from an ancient cvs repository and therefore didn't conform in any way to git's commit message recommendations. I had committed using at least five different (partially invalid) e-mail addresses. There were some empty commits resulting from the cvs2git conversion, and only about half of the commits were signed.
I resisted cleaning up these repositories for a long time since it's quite a lot of work. The trunk along with all relevant branches had to be rebased. In my case, a lot of commit messages needed to be rewritten so that at least the second line was empty. And all the tags and submodule references had to be restored afterwards.
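The "second line must be empty" rule is easy to check mechanically before starting the rebase. The following is only a minimal sketch, assuming a POSIX shell with awk; the function names `second_line_is_empty` and `list_nonconforming` are made up for this example and are not part of git.

```shell
# Hypothetical helper: reads one commit message on stdin and succeeds
# if the message has no second line or the second line is empty.
second_line_is_empty() {
    awk 'NR == 2 && $0 != "" { bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Hypothetical helper: prints "<short hash> <subject>" for every commit
# reachable from HEAD whose message fails the check above.
list_nonconforming() {
    git rev-list HEAD | while read -r commit; do
        if ! git log -1 --format=%B "$commit" | second_line_is_empty; then
            git log -1 --format='%h %s' "$commit"
        fi
    done
}
```

Running `list_nonconforming` inside a repository should give a rough to-do list of the commits whose messages need manual attention.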
I used a rebase command from 101b147ch's reddit post, which kept all commit dates intact and replaced the author's name and e-mail with the current settings of user.name and user.email:
git -c rebase.instructionFormat='%s%nexec GIT_COMMITTER_DATE="%aD"'\
' git commit -S --amend --no-edit --reset-author --allow-empty --date="%aD"' \
rebase -i --root
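The instruction format places an exec line after every pick line, and each exec line re-commits the commit that was just picked with both dates pinned to its original author date. Here is a sketch of what one such step amounts to for a single commit. The function name `reamend_head_keeping_date` is made up, and `-S` and `--reset-author` are left out on purpose so the sketch runs without a signing key and without touching the author.

```shell
# Hypothetical helper: amend HEAD the way one generated "exec" line does,
# minus -S and --reset-author (left out for this sketch).
reamend_head_keeping_date() {
    # %aD is the author date of HEAD in RFC 2822 format
    author_date=$(git log -1 --format=%aD) || return 1
    # Pin the committer date and the author date to that same value
    GIT_COMMITTER_DATE="$author_date" \
        git commit --quiet --amend --no-edit --allow-empty --date="$author_date"
}
```

After calling it, `git log -1 --format='%aD / %cD'` should print the same date twice.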
This starts an interactive rebase. I had to remove the --reset-author manually from all commits which weren't mine, but since there were just a handful this wasn't too hard. All commits whose messages needed to be adapted had to have their pick prefix replaced by edit before saving the file. After that, it was a lot of git commit --amend and git rebase --continue to get all the old cvs messages adjusted.
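With many commits to reword, replacing pick by edit by hand gets tedious. A small sketch of how that could be scripted for a saved todo file, assuming a sed that accepts `-i` with a backup suffix (GNU and BSD sed both do); the function name `mark_for_edit` is made up for this example.

```shell
# Hypothetical helper: in the given todo file, turn "pick <hash>" into
# "edit <hash>" for every abbreviated hash passed as an argument.
# Other lines, including the generated exec lines, are left alone.
mark_for_edit() {
    todo_file=$1
    shift
    for hash in "$@"; do
        sed -i.bak "s/^pick $hash/edit $hash/" "$todo_file"
    done
    rm -f "$todo_file.bak"
}
```

To run it automatically, it would have to be wrapped in a small script and passed to git via the GIT_SEQUENCE_EDITOR environment variable.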
The -S parameter means that all the rebased commits get signed. Since we're rebasing, all existing commit signatures are lost anyway, which, in my case, isn't bad at all: new signatures were created automatically during rebasing, and I don't care whether the signature's date matches the commit's date, since the signatures are only used to prove the authenticity of the commit in question.
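A quick way to confirm that no commit slipped through unsigned is the `%G?` placeholder of git log, which prints N for a commit without a signature. A minimal sketch; the function name `count_unsigned` is made up for this example.

```shell
# Hypothetical helper: prints how many commits reachable from HEAD
# carry no signature at all (%G? prints "N" for those).
count_unsigned() {
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    git log --format='%G?' | grep -c '^N$' || true
}
```

After a rebase with -S, the count should be 0. For the commits that are signed, `git log --show-signature` shows the details.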
After the trunk's rebase was finished I wanted to clean up the most recent branches, too. Since the first step only resulted in a new set of commits without any branches attached, I first pushed the new, rebased commits into a fresh repository. Then, in the new repository, I created a new branch from the head of the main branch and cherry-picked all commits from the original branch into the target branch:
cd new_repository
git checkout -b target_branch
git cherry-pick 87ah231..aee2105
Once the commits were imported into the target branch in the new repository, I could start rebasing the target branch. The command was almost the same as for the trunk, except that instead of starting from --root, rebasing started at the newest common commit of trunk and the target branch and ran up to the target branch head:
git -c rebase.instructionFormat='%s%nexec GIT_COMMITTER_DATE="%aD"'\
' git commit -S --amend --no-edit --reset-author --allow-empty --date="%aD"' \
rebase -i bb70841
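The newest common commit doesn't have to be looked up by hand, since `git merge-base` prints it. A small sketch; the function name `newest_common_commit` and the branch names in the usage line are placeholders for this example.

```shell
# Hypothetical helper: prints the newest commit that two branches share.
newest_common_commit() {
    git merge-base "$1" "$2"
}
```

The result can then be fed to the rebase, for example `rebase -i "$(newest_common_commit trunk target_branch)"` in place of the hard-coded hash.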
When I compared the results of my rebase work to the original commits, I noticed I had fewer commits than before. That's because by default interactive rebasing drops merge commits, as the documentation states:
By default, a rebase will simply drop merge commits from the todo list, and put the rebased commits into a single, linear branch.
This behavior can be changed using the --rebase-merges parameter:
With --rebase-merges, the rebase will instead try to preserve the branching structure within the commits that are to be rebased, by recreating the merge commits. Any resolved merge conflicts or manual amendments in these merge commits will have to be resolved/re-applied manually.
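To spot dropped merges early, it helps to compare commit counts in the original and the rebased repository. A minimal sketch using `git rev-list --count`; the function name `commit_and_merge_count` is made up for this example.

```shell
# Hypothetical helper: prints "<total commits> <merge commits>" for the
# given revision (defaults to HEAD).
commit_and_merge_count() {
    rev=${1:-HEAD}
    echo "$(git rev-list --count "$rev") $(git rev-list --count --merges "$rev")"
}
```

If the second number is lower in the rebased repository than in the original, merge commits were flattened away, and rerunning the rebase with --rebase-merges is worth a try.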