How does `git rebase` skip a commit whose change is already in upstream?

The git-rebase documentation says:

If the upstream branch already contains a change you have made (e.g., because you mailed a patch which was applied upstream), then that commit will be skipped.

But how does Git do that?

Suppose commit X is the parent of commit Y, and diffXY is the output of git diff X Y. I have the following commits:

o---o---o        <- master
 \
  o---o---o---o  <- test <- HEAD

If I run git rebase master, my guess is that Git skips any commit Y in test whose diffXY is already present in master.

I've run some examples, and they behaved as I guessed.

This is just my guess. Am I right?

Also, does Git do this skipping before it reapplies test's commits onto master?

The first versions of git rebase (1.4.4, Oct. 2006) used git format-patch --ignore-if-in-upstream:

This will examine all patches reachable from <since> but not from <until> and compare them with the patches being generated, and any patch that matches is ignored.

So it compared patch IDs. See commit 9c6efa3 for the implementation:

    if (ignore_if_in_upstream &&
        !get_patch_id(commit, &patch_id_opts, sha1) &&
        lookup_object(sha1))
            continue;

A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a patch, with whitespace and line numbers ignored.
As such, it's "reasonably stable", but at the same time also reasonably unique, i.e., two patches that have the same "patch ID" are almost guaranteed to be the same thing.
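To make the idea concrete, here is a minimal sketch of a patch-ID-like hash. It is a simplified model, not Git's actual algorithm: the function name `simplified_patch_id`, the sample diffs, and the decision to drop `index` lines and hunk headers are all assumptions made for illustration. What it shows is that the same change at different line offsets hashes to the same value, while a different change does not.

```python
import hashlib
import re


def simplified_patch_id(diff_text: str) -> str:
    """Hash a unified diff while ignoring line numbers and whitespace.

    Illustrative model only; Git's real patch-id computation differs in detail.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("index "):
            continue  # blob hashes vary with surrounding content (sketch assumption)
        if line.startswith("@@"):
            continue  # hunk headers carry the line numbers we want to ignore
        normalized = re.sub(r"\s+", "", line)  # drop all whitespace
        if normalized:
            digest.update(normalized.encode("utf-8"))
    return digest.hexdigest()


# The same change as it might look on two branches: only offsets and blob hashes differ.
diff_on_test = """\
diff --git a/hello.txt b/hello.txt
index 1111111..2222222 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1,3 +1,4 @@
 alpha
 beta
+gamma
 delta
"""

diff_on_master = """\
diff --git a/hello.txt b/hello.txt
index 3333333..4444444 100644
--- a/hello.txt
+++ b/hello.txt
@@ -10,3 +10,4 @@
 alpha
 beta
+gamma
 delta
"""

# A genuinely different change to the same place.
different_change = diff_on_test.replace("+gamma", "+epsilon")

print(simplified_patch_id(diff_on_test) == simplified_patch_id(diff_on_master))
print(simplified_patch_id(diff_on_test) == simplified_patch_id(different_change))
```

For real commits, the git patch-id command is the thing to use; this sketch only mirrors the "ignore whitespace and line numbers, then hash" description quoted above.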

That was later delegated to git-rebase--am (Git 1.7.6, Feb. 2011).

Commit b6266dc (Git 2.1.0, Jul. 2014) then switched to --cherry-pick instead of --ignore-if-in-upstream:

When using git format-patch --ignore-if-in-upstream we are only allowed to give a single revision range.
In the next commit we will want to add an additional exclusion revision in order to handle fork points correctly, so convert git-rebase--am to use a symmetric difference with --cherry-pick --right-only.

(Further improved in Git 2.18)

That does not change the "skip identical commit" mechanism.
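As a rough illustration of what the symmetric difference master...test selects, the sketch below walks a toy history shaped like the diagram in the question. The commit names (`A`, `M1`, `T1`, ...) and the `reachable` helper are hypothetical. The left side is what --cherry-pick compares against; the right side is what --right-only keeps as candidates for copying.

```python
def reachable(parents, tip):
    """Return every commit reachable from `tip` by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


# Toy history: A is the fork point, M* are on master, T* are on test.
parents = {
    "A": [],
    "M1": ["A"],
    "M2": ["M1"],
    "T1": ["A"],
    "T2": ["T1"],
    "T3": ["T2"],
    "T4": ["T3"],
}

from_master = reachable(parents, "M2")
from_test = reachable(parents, "T4")

left_only = from_master - from_test    # the master side of master...test
right_only = from_test - from_master   # the test side of master...test

print(sorted(left_only))
print(sorted(right_only))
```

The fork point `A` is reachable from both tips, so it lands on neither side.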

VonC's answer gives the history. The mechanism is what Git calls a patch ID. Git's patch ID concept is documented (albeit a bit lightly) in the git patch-id manual page, which summarizes it this way:

... you can use this thing to look for likely duplicate commits.

This is what git rev-list --cherry-mark (with the symmetric difference ... notation) and git format-patch --ignore-if-in-upstream (with a simple exclusion .. operation) use to detect duplicate commits. If some other commit, whose hash is by definition different from that of the commit that might be copied, has the same patch ID as that commit, Git assumes the change has already been copied, so there is no need to copy it again.

You also asked:

Also, does Git do this skipping before it reapplies test's commits onto master?

Yes: the list of commits to be copied is generated first, and only then does the rebase process begin. While the list is being generated, the patch-ID-equivalent commits are discarded, along with all merge commits unless you are using the -p (--preserve-merges) or -r (--rebase-merges) options.

(If you use a non-interactive git rebase that uses git am, the rebase process still uses git format-patch output as input to git am. Otherwise the commit hashes to be copied are stored in a file, or in the sequencer which may or may not store them in a file, and then the commits are cherry-picked, either by running git cherry-pick or directly by the sequencer. The details depend on your particular Git vintage.)
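The ordering described above (filter first, replay second) can be sketched as two separate phases. This is a toy model, not Git's code: the `Commit` class, the `build_todo` and `replay` functions, and the patch-ID strings are all made up for illustration, and a primed name such as `T1'` just stands for the copy of `T1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str              # stand-in for the commit hash
    patch_id: str          # stand-in for the commit's patch ID
    parent_count: int = 1  # more than one parent means a merge commit


def build_todo(upstream_only, branch_only):
    """Phase 1: decide which commits to copy, before anything is replayed."""
    upstream_ids = {c.patch_id for c in upstream_only}
    todo = []
    for commit in branch_only:
        if commit.parent_count > 1:
            continue  # merge commits are dropped by default
        if commit.patch_id in upstream_ids:
            continue  # an equivalent change is already upstream
        todo.append(commit)
    return todo


def replay(todo, new_base):
    """Phase 2: copy each remaining commit onto the new base, in order."""
    history = [new_base]
    for commit in todo:
        history.append(commit.name + "'")
    return history


# Commits on each side of the fork point, oldest first.
master_only = [Commit("M1", "pid-typo-fix"), Commit("M2", "pid-logging")]
test_only = [
    Commit("T1", "pid-refactor"),
    Commit("T2", "pid-typo-fix"),  # same change as M1, different hash
    Commit("T3", "pid-feature"),
    Commit("T4", "pid-docs"),
]

todo = build_todo(master_only, test_only)
result = replay(todo, new_base="M2")

print([c.name for c in todo])
print(result)
```

Because `T2` is removed while the list is built, the replay phase never sees it.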

Source: https://stackoverflow.com/questions/52789519/how-does-git-rebase-skip-the-commit-which-its-change-already-has-in-upstream
