I am trying to squash many commits into a single one; the problem is that I need to do that by author (name or email).
The case:
Let's say I have a branch called fe
Be careful rewriting history
The end result you want might be possible if you create branches for each author, cherry-pick the commits from each author into the right branch, then squash those changes. However, I don't think that will work if these commits meaningfully depend on each other.
If you have a series of commits:
            Author1                Author2                Author1
version1 ---commit---> version2 ---commit---> version3 ---commit--->...
If you were to try to extract the changes from Author2 and apply them to version1, there's a good chance they won't make sense (for example, if Author2 modifies code that Author1 created).
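As a rough sketch of the branch-per-author idea (not a tested recipe): the helper name squash_author and its four arguments are hypothetical. It assumes the range contains no merges that matter, and it simply stops if a cherry-pick conflicts, which is exactly the failure mode described above.

```shell
# squash_author <base> <source> <author-pattern> <new-branch>   (hypothetical helper)
# Builds <new-branch> on top of <base> holding one commit with all of the
# author's changes from <base>..<source>.
squash_author() {
    base=$1 src=$2 author=$3 new=$4
    # The author's commits in the range, oldest first, skipping merges.
    commits=$(git rev-list --reverse --no-merges --author="$author" "$base..$src")
    [ -n "$commits" ] || return 0
    # Reuse the identity recorded on the author's most recent commit.
    ident=$(git log -1 --format='%an <%ae>' --author="$author" "$base..$src")
    git checkout -q -b "$new" "$base" || return 1
    # Stops at the first conflict; interdependent commits will fail here.
    git cherry-pick $commits || return 1
    # Collapse everything since <base> into a single commit.
    git reset -q --soft "$base"
    git commit -q --author="$ident" -m "Squashed commits by $ident"
}
```

For example, squash_author master feature_a foo@example.com feature_a-foo would leave a branch feature_a-foo with a single commit on top of master, provided every cherry-pick applies cleanly.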
With Kenkron's caveats in mind, you could do something like:
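# List feature_a commits oldest first as "author<TAB>hash", then group them by author.
# sort -s is stable, so each author's commits stay in chronological order.
SORTED_GIT_LOGS=$(git log --reverse --pretty="format:%an%x09%H" master..feature_a | sort -s -t "$(printf '\t')" -k1,1 | cut -f2)
# Run this from the branch to replay onto (e.g. master); stop at the first conflict.
for LOG in $SORTED_GIT_LOGS; do
    git cherry-pick "$LOG" || break
done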
Piping git log for master..feature_a through sort lists the commits of feature_a grouped by author (and not the ones from master, because of the master..feature_a range syntax).
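One detail worth checking in a pipeline like this is the order of the commits within each author's group, since they will be cherry-picked in that order. Here is a small sketch on made-up data, no repository needed. The group_by_author name and the sample lines are only for illustration, and it assumes a sort that supports -s (stable), as GNU and BSD sort do.

```shell
# Reads "author<TAB>hash" lines in chronological order and prints the hashes
# grouped by author. Because the sort is stable and keyed on the author field
# only, each author's hashes keep their original relative order.
group_by_author() {
    sort -s -t "$(printf '\t')" -k1,1 | cut -f2
}

# Hypothetical sample: bob and alice commit alternately.
printf 'bob\tc1\nalice\tc2\nbob\tc3\nalice\tc4\n' | group_by_author
# prints c2, c4, c1, c3 (one per line)
```

Without a stable, author-only key, ties between lines of the same author are broken by comparing the rest of the line, i.e. by hash, which has nothing to do with commit order.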
You would still need to do an interactive rebase to squash the (now ordered by author) commits on master.
I needed to do a similar rewrite on an unnecessarily large repository while the repo was offline. The approach I took was to try an automated 'interactive' rebase using GIT_SEQUENCE_EDITOR, which is covered in this answer by @james-foucar & @pfalcon.
For this to work well, I found it better to first remove the merges from the section of the history being rewritten. For my own case, this was done using lots of git rebase --onto, which is covered amply in other questions on StackOverflow.
I created a small script, generate-similiar-commit-squashes.sh, to generate the pick & squash commands so that consecutive similar commits would be squashed. I used author-date-and-shortlog to match similar commits, but you only need author (my gist has a comment about how to make it match only on author).
$ generate-similiar-commit-squashes.sh > /tmp/git-rebase-todo-list
The output looks like
...
pick aaff1c556004539a54a7a33ce2fb859af0c4238c foo@example.com-2015-01-01-Update-head.html
squash aa190ea2323ece42f1cd212041bf61b94d751d5c foo@example.com-2015-01-01-Update-head.html
pick aab8c98981a8d824d2bc0d5278d59bc1a22cc7b0 foo2@example.com-2015-01-28-Update-_config.yml
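The gist itself is not reproduced here. As a minimal sketch of the author-only variant (this is not the gist's code; todo_by_author is a hypothetical name, and the input format of one "hash author-email" pair per line is an assumption), consecutive commits by the same author could be turned into pick/squash lines like this:

```shell
# Reads "<hash> <author-email>" lines, oldest first, on stdin and prints a
# rebase todo list: the first commit of each run of consecutive commits by
# the same author is picked, the rest of the run is squashed into it.
todo_by_author() {
    awk '{
        verb = (NR > 1 && $2 == prev) ? "squash" : "pick"
        print verb, $1, $2
        prev = $2
    }'
}

# Possible use inside a repository with linear history (assumption):
#   git log --reverse --format='%H %ae' "$(git rev-list --max-parents=0 HEAD)..HEAD" \
#       | todo_by_author > /tmp/git-rebase-todo-list
```

Only adjacent commits are merged, so an author who appears again after someone else's commit starts a new pick. The text after the hash is just a label as far as git rebase is concerned.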
The repository was also full of self-reverts with the same style 'Update xyz' commit messages. When squashed, they resulted in empty commits.
The commits I was merging had identical commit messages. git rebase -i offers a revised commit message with all squashed commit messages appended, which would have been repetitive. To address that, I used a small Perl script from this answer to remove duplicate lines from the commit message offered by git rebase. It is better kept in a file, as it will be used in a shell variable.
$ echo 'print if ! $x{$_}++' > /tmp/strip-seen-lines.pl
Now for the final step:
$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' \
GIT_SEQUENCE_EDITOR='cat /tmp/git-rebase-todo-list >' \
git rebase --keep-empty -i $(git rev-list --max-parents=0 HEAD)
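The trailing > in GIT_SEQUENCE_EDITOR works because git hands the editor command to the shell with the path of its todo file appended, so the prepared list is written over git's own. A small simulation of that, without running a rebase (the temporary files and their contents are made up, and the sh -c line only approximates how git invokes the editor):

```shell
todo=$(mktemp)   # stands in for git's own rebase todo file
mine=$(mktemp)   # stands in for /tmp/git-rebase-todo-list
printf 'pick 1111111 first\npick 2222222 second\n' > "$todo"
printf 'pick 1111111 first\nsquash 2222222 second\n' > "$mine"

GIT_SEQUENCE_EDITOR="cat $mine >"
# Roughly what git does: run the command through sh with the file path as "$@".
sh -c "$GIT_SEQUENCE_EDITOR \"\$@\"" editor "$todo"

cat "$todo"
# prints:
# pick 1111111 first
# squash 2222222 second
```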
Despite using --keep-empty, git complained a few times through this process about empty commits. It would dump me out to the console with an incomplete git rebase. To skip the empty commit and resume processing, the following two commands were needed (rather frequently in my case).
$ git reset HEAD^
$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' git rebase --continue
Again, despite --keep-empty, I found I had no empty commits in the final git history, so the resets above had removed them all. I assume something is wrong with my git, version 2.14.1. Processing ~10000 commits like this took just over 10 minutes on a crappy laptop.