How to find the commit responsible for adding a file index (blob)

Question

When we run git diff Version1..Version2 -- file, the output includes something like:

diff --git a/wp-includes/version.php b/wp-includes/version.php
index 5d034bb9d8..617021e8d9 100644

Here Git compares two versions of a file and shows the difference between them. I need to find the commit responsible for adding the file in question, given the index values 5d034bb9d8 and 617021e8d9.

Answer 1:

TL;DR

This (untested) script may do what you want. Read the rest for how it works, if and when it works, and caveats.

#! /bin/sh
case $# in
2);;
*) echo "usage: script left-specifier right-specifier" 1>&2; exit 1;;
esac
# turn arguments into hashes, then ensure they are commits
L=$(git rev-parse "$1") || exit
R=$(git rev-parse "$2") || exit
L=$(git rev-parse $L^{commit}) || exit
R=$(git rev-parse $R^{commit}) || exit

haveblob=$(git rev-parse $L:wp-includes/version.php) || exit
wantblob=$(git rev-parse $R:wp-includes/version.php) || exit
git rev-list --reverse --topo-order $R ^$L^@ | {
    while read hash; do
        thisblob=$(git rev-parse $hash:wp-includes/version.php)
        test "$thisblob" = "$haveblob" && continue
        if [ "$thisblob" = "$wantblob" ]; then
            echo "target file appears in commit $hash"
            exit 0 # we've found it - succeed and quit
        fi
        echo "note: commit $hash contains a different version than either end"
    done
    # still inside the pipeline's subshell, so the "exit 0" above skips this
    echo "error: got to the bottom of the loop" 1>&2
    exit 1
}

Long

Let's clarify this a bit more: you've run:

$ git diff <commit1> <commit2> -- wp-includes/version.php

and its output reads, in part:

index 5d034bb9d8..617021e8d9 100644

Let's call <commit1>—which you specified by hash or tag or branch name or whatever—L, where L stands for left side of git diff. Let's call the second commit R, for the right side.

You want to find some commit that comes at or after L, and before or at R, where file wp-includes/version.php matches the version in R, i.e., the one whose abbreviated hash is 617021e8d9. But you don't want just any commit: you want the first such commit—the one closest to L.
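For context, the two values on the index line are abbreviated blob hashes: the left one names the file's content as stored in L, the right one as stored in R. The following is a minimal sketch for confirming that in your own repository. The helper name blob_has_prefix is made up for illustration and is not a Git command.

```shell
#!/bin/sh
# blob_has_prefix is a hypothetical helper, not part of Git.
# Succeeds if the blob stored at <path> in <commit> has a hash that
# starts with <prefix> (e.g. the abbreviated hash from the index line).
blob_has_prefix() {
    commit=$1 path=$2 prefix=$3
    # <commit>:<path> names the blob for that path in that commit's snapshot
    full=$(git rev-parse --verify --quiet "$commit:$path" 2>/dev/null) || return 1
    case $full in
        "$prefix"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With R standing for your second commit, blob_has_prefix R wp-includes/version.php 617021e8d9 should succeed if R really holds that version of the file.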

It's worth noting, first, that there may be no sensible relationship at all between the two commits. That is, if we were to draw a graph of the commit history, it might be simple:

...--o--o--L--M--N--...--Q--R--o--o--o   <-- branch

But it might not be so simple. For the moment, let's assume that it is simple.

The simple case: L and R are as drawn above, with a straight line of commits in between

In this case, there's some direct causal relationship in getting from L to R. The answer to your question will make a lot of sense. Specifically, it answers the question: where did this version come from? There's a direct line of commits starting at L and ending at R and the version that's in R might be in an earlier commit too. Let's see how to find the earliest commit, in the L-to-R sequence, that has the same version that's in R.

First, note that each commit represents a complete snapshot of all the files that are in that snapshot. That is, if we look at commit N above, it has all the files, in some form or another. The copy of wp-includes/version.php in N might match the one in L or might match the one in R. (It clearly cannot match both: if it did, the one in L would match the one in R and there would be no index line and no diff output.)

It's possible that the file is in L and R but is not in any of the commits in between, but in that case, the answer is: The file first appears in R.

It's also possible that the file is in L and R and in some, but not all, of the intermediate commits: say L has it, then it's removed in M, then it appears again in N in the form it has in R, then it's removed again in O, and so on. So it's present in L, N, P, and R; it's missing in M, O, and Q. Now the question is more difficult: do you want to see it in N, even though it's gone again in O? Or do you want to see it only in R since it's missing in Q?

In any case, what we need to do is enumerate all the commits in the range L through R. So we'll start with:

git rev-list L..R

(which will omit L, which is kind of annoying). Git will enumerate these in a reverse-ish order; since we know the chain is linear, this is in fact straight reverse order. (We'll see how to enforce a sensible order for more complex cases later.) To check L itself as well, we can just add it explicitly:

(git rev-list L..R; git rev-parse L)

or we can use the rather complicated trick of:

lhash=$(git rev-parse L); git rev-list R ^${lhash}^@

(for details see the gitrevisions documentation). The simpler:

git rev-list L^..R

usually works as well: it fails only when L is a root commit.

In any case, the output of git rev-list is a bunch of commit hash IDs: the hash ID of commit R, then that of commit Q, then that of commit P, and so on, all the way back to L. So we'll pipe the output of this git rev-list through commands to figure out where our particular blob came from. But we want to visit the commits in the other order: L first, then M, then N, all the way up to R. So we add --reverse to the git rev-list arguments.
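The enumeration described so far can be wrapped in a small function. This is only a sketch: the name commits_between is made up here, and it assumes L is an ancestor of R.

```shell
#!/bin/sh
# commits_between is a hypothetical helper name.
# Prints the hash of L, then each commit after L, ending with R,
# oldest first. Assumes L is an ancestor of R.
commits_between() {
    lhash=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
    rhash=$(git rev-parse --verify --quiet "$2^{commit}") || return 1
    # ^<hash>^@ excludes all parents of L but not L itself, so this
    # also works when L is a root commit and L^ does not exist.
    git rev-list --reverse "$rhash" "^$lhash^@"
}
```

Calling commits_between L R then produces the list that the rest of the script reads one hash at a time.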

The rest of this assumes we're writing this script in sh or bash or similar. Before we run git rev-list, let's get the full blob-hash of each version of the file. Then we'll have them in the loop:

#! /bin/sh
case $# in
2);;
*) echo "usage: script left-specifier right-specifier" 1>&2; exit 1;;
esac
# turn arguments into hashes, then ensure they are commits
L=$(git rev-parse "$1") || exit
R=$(git rev-parse "$2") || exit
L=$(git rev-parse $L^{commit}) || exit
R=$(git rev-parse $R^{commit}) || exit

# get the blob hashes, exit if they don't exist
haveblob=$(git rev-parse $L:wp-includes/version.php) || exit
wantblob=$(git rev-parse $R:wp-includes/version.php) || exit
git rev-list --reverse $R ^$L^@ | while read hash; do
    ...
done

Inside the loop, let's get the blob hash for this commit:

    thisblob=$(git rev-parse $hash:wp-includes/version.php)

If this fails, that means the file is removed. We can choose to ignore that and skip this commit, by adding || continue, or stop with || break, or we can simply ignore the possibility entirely on the assumption that the file will exist in each commit. Since the last is the simplest, I will do that here.
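If you would rather take the "skip this commit" option, one possible shape is sketched below. Both function names (blob_at and first_with_blob) are invented for this example, and the loop reports only the first match rather than printing the notes the script prints.

```shell
#!/bin/sh
# blob_at is a hypothetical helper: prints the blob hash of <path> in
# <commit>, or prints nothing and returns non-zero if the path is absent.
blob_at() {
    git rev-parse --verify --quiet "$1:$2" 2>/dev/null
}

# first_with_blob is also hypothetical: reads commit hashes on stdin and
# prints the first one whose copy of <path> has blob hash <wantblob>.
first_with_blob() {
    path=$1 wantblob=$2
    while read -r hash; do
        # commits in which the file does not exist are skipped
        thisblob=$(blob_at "$hash" "$path") || continue
        if [ "$thisblob" = "$wantblob" ]; then
            echo "$hash"
            return 0
        fi
    done
    return 1
}
```

It would be fed from the commit list, for example: git rev-list --reverse $R ^$L^@ | first_with_blob wp-includes/version.php $wantblob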

If this hash matches $haveblob, it's not very interesting. If it matches $wantblob, it's very interesting. If it's something else entirely, well, let's call that out. So the remainder of the loop is:

    test "$thisblob" = "$haveblob" && continue
    if [ "$thisblob" = "$wantblob" ]; then
        echo "target file appears in commit $hash"
        exit 0 # we've found it - succeed and quit
    fi
    echo "note: commit $hash contains a different version than either end"

and that's the script in the top section (well, mostly: the version at the top also passes --topo-order, discussed below).

More complex cases introduce more caveats

The graph could be rather branch-y internally; R could even be a merge commit:

       M-----N
      /       \
...--L         R   <-- branch
      \       /
       O--P--Q

or come after one:

       M--N
      /    \
...--L      Q--R   <-- branch
      \    /
       O--P

Or, the graph could be such that L and R are wildly different:

...--o--o--o--L--o--o   <-- branch1
      \
       o--...--o--R--o   <-- branch2

or (if there are multiple root commits) they could even be completely unrelated, graph-wise:

A--B--L   <-- br1

C--D--R   <-- br2

Or, they might be related, whether or not it's a simple linear relationship, but backwards:

...--o--R--E--F--G--L--o--...--o   <-- branch

If the two commits are backwards like this, you should simply swap them. (The script could do this: git merge-base --is-ancestor A B tests whether commit A is an ancestor of commit B.)
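A rough sketch of that swap, using only git merge-base --is-ancestor as mentioned above. The function name order_pair is invented for this example.

```shell
#!/bin/sh
# order_pair is a hypothetical helper name.
# Prints "<older> <newer>" when one commit is an ancestor of the other,
# swapping the arguments if they were given newest-first.
order_pair() {
    if git merge-base --is-ancestor "$1" "$2"; then
        echo "$1 $2"
    elif git merge-base --is-ancestor "$2" "$1"; then
        echo "$2 $1"
    else
        return 1 # neither commit is an ancestor of the other
    fi
}
```

A script could then run set -- $(order_pair "$1" "$2") before the rest of its work, and bail out if order_pair fails.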

If they're not directly related, the L..R syntax will exclude commits reachable from L while listing commits reachable from R. If they're completely unrelated, commits reachable from R are unreachable from L, so this is just "all commits in the history up to R". In either case, you may or may not find an answer, and it may or may not make any sense.

You can test for these cases with the git merge-base check above: if neither is an ancestor of the other, they may be related through a common third ancestor (the actual merge base of the two commits), or they may be completely unrelated.
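Putting those checks together, a classification might look like the sketch below. The function name relation and its four output words are made up for illustration. The sketch relies on git merge-base exiting non-zero when the two commits have no common ancestor.

```shell
#!/bin/sh
# relation is a hypothetical helper name; it prints one of:
#   ancestor    A is an ancestor of B (or the same commit)
#   descendant  B is an ancestor of A
#   diverged    they share a merge base, but neither contains the other
#   unrelated   no common ancestor at all
relation() {
    if git merge-base --is-ancestor "$1" "$2"; then
        echo ancestor
    elif git merge-base --is-ancestor "$2" "$1"; then
        echo descendant
    elif git merge-base "$1" "$2" >/dev/null 2>&1; then
        echo diverged
    else
        echo unrelated
    fi
}
```

Only the "ancestor" result matches the simple case. For "diverged" or "unrelated", the search may still print something, but the caveats above apply.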

If there are branches "between" L and R so that there is a merge at or before R, the traversal may occur in some difficult-to-predict order. To force Git to enumerate the commits in a topologically-sorted order, I use --topo-order in the actual script. This forces Git to traverse each "leg" of a merge one at a time. That's not necessarily critical here, but it makes reasoning about the script's output easier.

Source: https://stackoverflow.com/questions/49098004/how-to-find-commit-responsible-by-adding-a-file-index-blob

Tags

git