Measuring “closeness” in large source trees

♀尐吖头ヾ posted on 2019-12-04 05:26:41

To improve on your measurement, why not try 'git diff --shortstat'? The output looks like this:

 1 file changed, 1 insertion(+), 2 deletions(-)

You can experiment with how much weight to give files changed, insertions, and deletions, depending on the results you get.
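As a rough sketch of that weighting idea, the snippet below parses one line of 'git diff --shortstat' output and folds the three counts into a single score. The function names and the default weights are assumptions made up for illustration, not anything git provides; lower scores are meant to indicate a closer match.

```python
import re

# Patterns for the three parts of a --shortstat line. Git omits the
# insertions or deletions part when its count is zero, and prints
# nothing at all when there is no difference.
_PATTERNS = {
    "files": re.compile(r"(\d+) files? changed"),
    "insertions": re.compile(r"(\d+) insertions?\(\+\)"),
    "deletions": re.compile(r"(\d+) deletions?\(-\)"),
}


def parse_shortstat(line):
    """Return a dict of counts; any part missing from the line counts as 0."""
    counts = {}
    for name, pattern in _PATTERNS.items():
        match = pattern.search(line)
        counts[name] = int(match.group(1)) if match else 0
    return counts


def closeness(counts, w_files=10.0, w_ins=1.0, w_del=1.0):
    """Weighted sum of the counts (hypothetical weights; tune to taste)."""
    return (
        w_files * counts["files"]
        + w_ins * counts["insertions"]
        + w_del * counts["deletions"]
    )
```

For example, closeness(parse_shortstat(line)) can be computed for each candidate commit and the weights adjusted until the ranking matches your intuition.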

Looking at your Perl, I think you're probably not going to be able to make assumptions about the ordering of "closeness" among commits. You may need to brute-force check every commit, or at least make that an option.

I'd also suggest that instead of looking for the single closest commit, you keep a sorted list of (commit, "closeness") pairs, display the top few, and review them by hand. As mentioned below, there is no silver bullet for determining whether two sets of code are close simply by looking at the number of changes. That said, the number of changes can definitely help you narrow down the list you should review.
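Keeping the top few candidates rather than one winner could look something like this. It is a minimal sketch that assumes a lower "closeness" number means a closer match; top_candidates is a hypothetical helper name and the commit names in the usage note are placeholders.

```python
import heapq


def top_candidates(scores, n=5):
    """Return the n (commit, closeness) pairs with the smallest closeness.

    `scores` is any iterable of (commit, closeness) pairs. The result is
    sorted from closest to least close, ready for manual review.
    """
    return heapq.nsmallest(n, scores, key=lambda pair: pair[1])
```

Calling top_candidates([("a", 30), ("b", 5), ("c", 12)], n=2) keeps the pairs for "b" and "c", in that order.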

UPDATE: I should also mention that another advantage of using git diff is that you don't have to run a hard reset for each commit. Simply symlink the .git/ directory into your unknown tree (the one without a git history) and use git reset [--mixed] <commit>. That moves the current head pointer (and the index) to the commit but leaves your source files unchanged. Be sure to back up the unknown source tree before using this method.
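The brute-force loop from the UPDATE might be sketched as follows. This assumes the unknown tree already has the .git/ directory linked in and has been backed up; the helper names are made up for illustration, and the `run` parameter exists only so the git calls can be swapped out. The list of commits could come from something like git rev-list --all.

```python
import subprocess


def git(args, repo="."):
    """Run a git command inside `repo` and return its stdout as text."""
    return subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    ).stdout


def shortstat_per_commit(commits, repo=".", run=git):
    """Map each commit to the --shortstat line for the tree in `repo`.

    For every commit, a mixed reset moves HEAD and the index to that
    commit without touching the files on disk, then a plain `git diff`
    compares those files against the index.
    """
    results = {}
    for commit in commits:
        run(["reset", "--mixed", "--quiet", commit], repo)
        results[commit] = run(["diff", "--shortstat"], repo).strip()
    return results
```

An empty string in the result means git reported no differences for that commit. Note that the loop leaves HEAD at the last commit checked, which is another reason to work on a copy.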
