In my repo, how long must the longest hash prefix be to prevent any overlap?

The --abbrev-commit flag can be used in conjunction with git log and git rev-list in order to show partial prefixes instead of the ful

2 Answers

悲&欢浪女

2020-12-16 19:17
The following shell script, when run in a local repo, prints the length of the longest prefix required to prevent any overlap among all prefix hashes of commit objects of that repository.
```
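# Git's minimum abbreviation length is 4, so start from there.
MAX_LENGTH=4

# List every commit reachable from any ref, each abbreviated to its shortest
# unique prefix (at least 4 characters), and track the longest one seen.
# The final printf sits inside the subshell because a variable changed on the
# receiving end of a pipeline is not visible after the pipeline ends.
git rev-list --abbrev=4 --abbrev-commit --all | \
  ( while read -r line; do
      if [ "${#line}" -gt "$MAX_LENGTH" ]; then
        MAX_LENGTH=${#line}
      fi
    done && printf '%s\n' "$MAX_LENGTH"
  )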
```
The last time I edited this answer, the script printed
- "9" when run in a clone of the Git-project repo,
- "9" when run in a clone of the OpenStack repo,
- "11" when run in a clone of the Linux-kernel repo.
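For comparison, the same maximum can be computed with `awk` instead of a shell loop. This is only a sketch of an alternative, not a replacement for the script above; the function name `max_abbrev_length` is made up for illustration, and it simply reads one abbreviated hash per line on standard input.

```shell
# Hypothetical helper: read abbreviated hashes (one per line) on stdin
# and print the length of the longest one. Prints 0 for empty input.
max_abbrev_length() {
  awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}
```

Inside a repository it would be fed the same way as the loop: `git rev-list --abbrev=4 --abbrev-commit --all | max_abbrev_length`.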

说谎

2020-12-16 19:26
Jubob's script is great, upvoted.

If you want to get an idea of the distribution of minimum-commit-hash-length, you can run this one-liner:
```
git rev-list --abbrev=4 --abbrev-commit --all | ( while read -r line; do echo ${#line}; done; ) | sort -n | uniq -c
```
For the git project itself today (git-on-git), this yields something like:
```
 1788 4
35086 5
 7881 6
  533 7
   39 8
    4 9
```
... so 1788 commits can be identified uniquely by a 4-character prefix (possibly fewer, but 4 is the minimum abbreviation Git allows), and 4 commits need 9 of the 40 hash characters to be selected uniquely.
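If you only care about one threshold, for example how many commits would be ambiguous at a given abbreviation length, a small filter over the same `git rev-list` output is enough. The sketch below is one way to do it; `count_longer_than` is a hypothetical name, and it expects one abbreviated hash per line on standard input.

```shell
# Hypothetical helper: count the abbreviated hashes on stdin that are
# longer than the length given as the first argument.
count_longer_than() {
  awk -v n="$1" 'length($0) > n { c++ } END { print c + 0 }'
}
```

Run as `git rev-list --abbrev=4 --abbrev-commit --all | count_longer_than 7`. Going by the distribution above, that would report 39 + 4 = 43 commits needing more than 7 characters.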

By comparison, a much larger project such as the Linux kernel has this distribution today:
```
6179   5
446463 6
139247 7
10018  8
655    9
41    10
3     11
```
So with a database of nearly 5 million objects and 600k commits, there are 3 commits that currently require 11 of the 40 hexadecimal digits to distinguish them from all other commits.
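As a rough sanity check on that figure, a birthday-style estimate gives the expected number of object pairs that share their first k hex digits: n(n-1)/2 divided by 16^k. The sketch below assumes hashes are uniformly distributed, which is an idealization; the helper name `expected_shared_prefix_pairs` is made up for illustration.

```shell
# Hypothetical helper: expected number of object pairs whose hashes share
# the same first k hex digits, assuming uniformly distributed hashes.
#   $1 = number of objects (n), $2 = prefix length in hex digits (k)
expected_shared_prefix_pairs() {
  awk -v n="$1" -v k="$2" 'BEGIN { printf "%.1f\n", n * (n - 1) / 2 / 16 ^ k }'
}
```

With `expected_shared_prefix_pairs 5000000 10` the estimate comes out at roughly 11 pairs, so a handful of objects sharing a 10-digit prefix, and therefore needing 11 digits, is about what one would expect for a repository of this size.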