What's git's heuristic for assigning content modifications to file paths?

我寻月下人不归 2020-11-29 11:07

Short version:

Short of poring over Git's source code, where can I find a full description of the heuristics that Git uses?

2 answers
  •  北荒 (original poster)
    2020-11-29 12:09

    As you noticed, Git performs rename detection using a heuristic, rather than being told that a rename occurred. The git mv command, in fact, simply stages an add on the new file path and a remove of the old file path. Thus, rename detection is performed by comparing the contents of added files to the previously committed contents of deleted files.

    First, candidates are collected. Any new files are possible rename targets and any deleted files are possible rename sources. In addition, when break detection is enabled (the -B option of git diff), rewriting changes are broken up, so that a file that differs from its previous revision by more than 50% is treated as both a possible rename source and a possible rename target.
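    The candidate collection step can be sketched roughly as follows. This is an illustration, not Git's code: the function name collect_candidates is hypothetical, the snapshots are plain dicts mapping path to content, and difflib.SequenceMatcher stands in for Git's own delta-based measure of how much a file changed.

```python
from difflib import SequenceMatcher


def collect_candidates(old, new, break_threshold=0.5):
    """Return (sources, targets) path lists for rename detection.

    old and new map path -> file content for two snapshots.
    """
    sources = sorted(p for p in old if p not in new)  # deleted paths
    targets = sorted(p for p in new if p not in old)  # added paths

    # Paths present in both snapshots: a heavily rewritten file is "broken"
    # into a delete plus an add, so it can act as source and target.
    for path in sorted(old.keys() & new.keys()):
        if old[path] == new[path]:
            continue
        # Assumption: SequenceMatcher is only a stand-in dissimilarity measure.
        changed = 1.0 - SequenceMatcher(None, old[path], new[path]).ratio()
        if changed > break_threshold:
            sources.append(path)
            targets.append(path)
    return sources, targets


old = {"a.txt": "alpha\nbeta\n", "b.txt": "one\ntwo\nthree\n", "c.txt": "same\n"}
new = {"d.txt": "alpha\nbeta\n", "b.txt": "xxxxxxxxxxxxxxxx\n", "c.txt": "same\n"}
sources, targets = collect_candidates(old, new)
print(sources, targets)
```

    Here a.txt is deleted and d.txt is added, so they become a source and a target; b.txt is rewritten almost completely, so it appears in both lists; c.txt is unchanged and is ignored.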

    Next, identical renames are detected. If you rename a file without making any changes, the file will hash identically. These renames can be detected just by comparing the hashes in the index, without reading the file contents, so removing them from the candidate list reduces the number of content comparisons that remain.
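    A minimal sketch of the exact-match pass, assuming the candidates are given as dicts mapping path to blob ID. The helper names blob_id and exact_renames are hypothetical. The blob ID is computed the way Git hashes a blob with SHA-1 (a "blob <size>" header, a NUL byte, then the content); in a real repository these IDs are already stored, so nothing needs to be re-read.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Git-style blob ID: SHA-1 over a 'blob <size>' header, NUL, then content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def exact_renames(sources, targets):
    """Pair each source with a target that has an identical blob ID.

    sources and targets map path -> blob ID.
    Returns (renames, remaining_sources, remaining_targets).
    """
    by_id = {}
    for path, oid in sorted(targets.items()):
        by_id.setdefault(oid, []).append(path)

    renames = {}
    for path, oid in sorted(sources.items()):
        if by_id.get(oid):
            renames[path] = by_id[oid].pop(0)  # each target is used once

    matched = set(renames.values())
    left_sources = {p: o for p, o in sources.items() if p not in renames}
    left_targets = {p: o for p, o in targets.items() if p not in matched}
    return renames, left_sources, left_targets


sources = {
    "old/readme.txt": blob_id(b"hello\n"),
    "old/main.c": blob_id(b"int main(void) { return 0; }\n"),
}
targets = {
    "docs/readme.txt": blob_id(b"hello\n"),
    "src/main.c": blob_id(b"int main(void) { return 1; }\n"),
}
renames, left_sources, left_targets = exact_renames(sources, targets)
print(renames)
```

    Only the pairs left in left_sources and left_targets would need the more expensive similarity comparison.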

    Finally, the similarity comparison is performed. Each line in each candidate file is hashed and the hashes are collected in a sorted list. Long lines are split into chunks (Git's implementation cuts at 64 bytes). Whitespace-only lines may be stripped on the assumption that they don't contribute greatly to the similarity matching. The line hashes from each candidate source are compared to the line hashes from each candidate target. If the two lists meet the similarity threshold, which defaults to 50% in Git and can be changed with the -M option, the pair is deemed a rename.
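    The similarity pass might look something like the sketch below. It is a simplification under stated assumptions: the function names are hypothetical, zlib.crc32 stands in for whatever hash the real implementation uses, the score is simply the share of matching hashes relative to the longer list, and chunk_size and threshold are ordinary parameters with illustrative values.

```python
import zlib


def line_hashes(text, chunk_size=64):
    """Return a sorted list of hashes, one per line chunk."""
    hashes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip whitespace-only lines
        # Split long lines into fixed-size chunks before hashing.
        for start in range(0, len(line), chunk_size):
            chunk = line[start:start + chunk_size]
            hashes.append(zlib.crc32(chunk.encode("utf-8")))
    hashes.sort()
    return hashes


def similarity(a, b):
    """Share of hashes common to two sorted lists, relative to the longer one."""
    if not a and not b:
        return 1.0
    i = j = common = 0
    # Merge-style walk over both sorted lists, counting matching entries.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common / max(len(a), len(b))


def is_rename(old_text, new_text, threshold=0.5):
    """True when the two texts reach the (illustrative) similarity threshold."""
    return similarity(line_hashes(old_text), line_hashes(new_text)) >= threshold


old_text = "import os\nimport sys\n\ndef main():\n    print('hi')\n"
new_text = "import os\nimport sys\n\ndef main():\n    print('hello')\n"
score = similarity(line_hashes(old_text), line_hashes(new_text))
print(score, is_rename(old_text, new_text))
```

    Keeping the hash lists sorted is what makes the comparison a single linear pass over each pair of lists, which matters because every remaining source is compared against every remaining target.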
