Finding where source has branched from git

天命终不由人 2021-01-05 05:22

I have a git repository (covering more or less the project's history) and a separate set of sources (just a tarball with a few files) that forked some time ago (actually somewhere in 2

6 answers
  •  無奈伤痛
    2021-01-05 05:58

    In the general case, you'd actually have to examine every single commit, because you have no way of knowing whether you'll get a huge diff at one commit, a small diff at the next, then another huge diff, then a medium one...

    Your best bet is probably going to be to limit yourself to specific files. If you consider just a single file, it should not take long to iterate through all the versions of that file (give git rev-list the path so it lists only the commits that touched the file, and you don't have to test every commit). For each commit that modified the file, you can check the size of the diff and fairly quickly find a minimum. Do this for a handful of files; hopefully they'll agree!
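To make the per-file idea concrete, here is a minimal Python sketch that stands in for the git plumbing. It assumes you have already pulled each version of a file into memory (for example with git show <commit>:path/to/file) as a hypothetical dict mapping commit IDs to file text. It sizes each diff as added plus removed lines using difflib, picks the closest version for each file, and tallies how many files agree. difflib will not always count lines exactly the way git diff does, so treat the numbers as approximate.

```python
import difflib
from collections import Counter


def diff_size(old: str, new: str) -> int:
    """Added plus removed lines between two texts (approximates git's numstat)."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    size = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace' counts both sides; 'delete'/'insert' leave one side empty.
            size += (i2 - i1) + (j2 - j1)
    return size


def closest_commit(versions: dict[str, str], tarball_text: str) -> str:
    """Commit whose copy of one file has the smallest diff against the tarball."""
    return min(versions, key=lambda commit: diff_size(versions[commit], tarball_text))


def vote(
    per_file_versions: dict[str, dict[str, str]],
    tarball_files: dict[str, str],
) -> list[tuple[str, int]]:
    """Closest commit for each file, tallied so you can see whether they agree."""
    votes = Counter()
    for path, versions in per_file_versions.items():
        votes[closest_commit(versions, tarball_files[path])] += 1
    return votes.most_common()
```

Because only commits that touched a given file are considered, each file's best match is really the last commit that changed it before the fork. Files that changed rarely will vote for older commits, which is one reason the votes may not line up perfectly.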

    The best way to set yourself up for the diffing is to make a temporary commit by simply copying the contents of your tarball into the working tree, on a branch called tarball, so you have something to compare against. That way, you could do this:

    # Run on the branch with the project history (HEAD), not on tarball itself.
    git rev-list HEAD -- path/to/file | while read -r hash; do echo "$hash $(git diff --numstat tarball "$hash" -- path/to/file)"; done
    

    to get a nice list of all the commits with their diff sizes (the first three columns are the SHA-1, the number of lines added, and the number of lines removed). Then you could pipe that into awk '{print $1,$2+$3}' | sort -n -k 2, and you'd have the commits sorted by diff size, smallest first!
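If you would rather post-process that output in a script than with awk and sort, the sketch below does the same ranking in Python. It assumes each commit ends up on its own line as the SHA-1 followed by the git diff --numstat fields. A line containing only a SHA-1 is treated as an identical file (size 0), and binary files, for which numstat prints - instead of counts, are skipped. The name rank_commits is just illustrative.

```python
def rank_commits(pipeline_output: str) -> list[tuple[str, int]]:
    """Rank '<sha> <added> <removed> <path>' lines by added + removed, smallest first."""
    ranked = []
    for line in pipeline_output.splitlines():
        fields = line.split()  # handles both the space after the SHA and numstat's tabs
        if not fields:
            continue  # blank line
        sha = fields[0]
        if len(fields) < 3:
            size = 0  # no numstat output: the file is identical in this commit
        elif fields[1] == "-" or fields[2] == "-":
            continue  # binary file: numstat gives no line counts to compare
        else:
            size = int(fields[1]) + int(fields[2])
        ranked.append((sha, size))
    # list.sort is stable, so commits with equal sizes keep their input order.
    ranked.sort(key=lambda item: item[1])
    return ranked
```

This mirrors the awk and sort pipeline except for binary files: awk would quietly treat - as 0 and rank such a commit at the top.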

    If you can't limit yourself to a small handful of files to test, I might be tempted to hand-implement something similar to git bisect: just try to narrow your way down to a small diff, on the assumption that in all likelihood commits near your best case will also have smaller diffs, and commits far from it will have larger diffs. (Somewhere between Newton's method and a full-on binary/grid search, probably?)
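One way to make the "narrow your way down" idea precise is a ternary search, swapped in here as an illustration rather than anything git provides. It assumes a linear list of commits (for example from git rev-list --reverse HEAD, oldest first) and a hypothetical cost(i) callback that returns the total diff size between commit i and the tarball branch, perhaps by parsing git diff --shortstat. Ternary search is only guaranteed to find the minimum if the sizes strictly fall and then strictly rise along the list; real histories are noisier, so it is safer to treat the result as a starting point and check the neighbouring commits too.

```python
from typing import Callable


def find_closest_commit(n: int, cost: Callable[[int], int]) -> int:
    """Index of the commit with the smallest diff, assuming cost() is unimodal.

    cost(i) is a hypothetical callback giving the diff size for commit i;
    results are cached because each call may run a slow git diff.
    """
    if n < 1:
        raise ValueError("need at least one commit")
    cache: dict[int, int] = {}

    def f(i: int) -> int:
        if i not in cache:
            cache[i] = cost(i)
        return cache[i]

    lo, hi = 0, n - 1
    # While 4+ candidates remain, probe two interior points and drop a third.
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            hi = m2 - 1  # the minimum lies before m2
        else:
            lo = m1 + 1  # the minimum lies after m1
    # At most three candidates remain; check them directly.
    return min(range(lo, hi + 1), key=f)
```

With the cache, a history of a thousand commits needs only a few dozen diffs instead of a thousand.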

    Edit: Another possibility, suggested in Douglas' answer: if you think some files might be identical to those in some commit, hash them with git hash-object, then see which commits in your history contain that blob. There's a question with some excellent answers about how to do that. If you do this with a handful of files (preferably ones that have changed frequently), you might be able to narrow down the target commit pretty quickly.
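For the hashing approach, git names a blob by the SHA-1 of a short header (the word blob, the content length in bytes, and a NUL byte) followed by the raw content, so the ID can be reproduced without git as a quick check. The sketch below assumes a SHA-1 repository and files that need no end-of-line conversion or clean filters (git hash-object applies those to files on disk, which would change the ID). It also assumes a hypothetical blobs_by_commit mapping from each commit to the set of blob IDs in its tree, which you might build from git ls-tree -r output, and ranks commits by how many of the tarball's blobs they contain.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID git gives a blob with these bytes (SHA-1 object format)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def rank_by_shared_blobs(
    tarball_files: dict[str, bytes],
    blobs_by_commit: dict[str, set[str]],
) -> list[tuple[str, int]]:
    """Rank commits by how many tarball files appear verbatim in their tree."""
    wanted = {git_blob_id(data) for data in tarball_files.values()}
    scores = [(commit, len(wanted & blobs)) for commit, blobs in blobs_by_commit.items()]
    # Stable sort: commits with equal scores keep their input order.
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, git_blob_id(b"") gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known ID of the empty blob.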
