I have a git repository (covering more or less the project history) and separate sources (just a tarball with a few files) which forked from it some time ago (actually somewhere in 2
Import the files from the tarball into a git revision, on a separate branch or a completely new one: its position in the revision graph isn't important; we just want it available as a tree.
Now, for each revision in master, diff against that tree/revision ('imported') and output how big the diff is. Something like:
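One way to do that import is with an orphan branch, which has no parent commits. The following is only a sketch under some assumptions: it is run from the top of a clean working tree with no untracked files, the tarball lives outside the repository, and the function name `import_tarball` and the tarball path are made up for illustration.

```shell
# Sketch: import an unpacked tarball as a single commit on an orphan branch.
# import_tarball is a hypothetical helper name, not a git command.
import_tarball() {
    tarball=$1              # path to the tarball (hypothetical, e.g. /tmp/sources.tar.gz)
    branch=${2:-imported}   # branch that will hold the imported tree

    # Start a branch with no history; the index still holds the old files.
    git checkout --orphan "$branch" &&
    # Empty the index and the working tree of the previously tracked files.
    git rm -rf --quiet --ignore-unmatch . &&
    # Unpack the tarball into the now-empty working tree.
    tar -xf "$tarball" &&
    # Record everything that was unpacked as one commit.
    git add -A &&
    git commit --quiet -m "Import tarball sources"
}
```

Usage would be something like `import_tarball /tmp/sources.tar.gz imported`, followed by `git checkout master` to get back. If the tarball wraps everything in a top-level directory, the paths will not line up with the repository; with GNU tar or bsdtar, adding `--strip-components=1` to the `tar` call is one way around that.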
git rev-list master | while read -r rev; do patchsize=$(git diff "$rev" imported | wc -c); echo "$rev $patchsize"; done
The revision with the smallest patch size will then be the "closest", by a very rough rule of thumb. (An identical revision will produce a patch size of 0, anything else will be non-zero, and the more that has changed, the bigger the patch.)
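To avoid scanning the output by eye, the same loop can print the size first and be piped through `sort`. This is a minimal sketch, not a polished tool: the function name `closest_revisions` and its three arguments are invented here, and it assumes the imported tree is reachable under the given ref.

```shell
# Sketch: list the revisions of a branch whose diff against a target tree is smallest.
# closest_revisions is a hypothetical helper name.
closest_revisions() {
    target=${1:-imported}   # ref holding the imported tarball
    branch=${2:-master}     # history to search
    count=${3:-5}           # how many candidates to print

    git rev-list "$branch" | while read -r rev; do
        # Patch size in bytes; tr strips the padding some wc implementations add.
        size=$(git diff "$rev" "$target" | wc -c | tr -d ' ')
        echo "$size $rev"
    done | sort -n | head -n "$count"   # smallest patches first
}
```

Running `closest_revisions imported master 5` would print the five candidates with the smallest patches, one "size revision" pair per line. An exact match, if there is one, comes out on top with a size of 0.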