Why does git store objects in directories with the first two characters of the hash?

混江龙づ霸主 提交于 2019-12-03 01:23:49

Git switches from "loose objects" (in files named like 01/23456789abcdef0123456789abcdef01234567) to "packs" when the number of loose objects exceeds a magic constant (6700 by default but configurable, gc.auto). Since SHA-1 values tend to be well-distributed it can approximate total loose objects by looking in a single directory. If there are more than (6700 + 255) / 256 = 27 files in one of the object directories, it's time for a pack-file.

Thus, there's no need for additional fan-out (01/23/4567...): it's unlikely that you will get that many objects in one directory. And in fact, greater fan-out would tend to make it harder to detect that it is time for an automatic packing, unless you set the threshold value higher (than 6700), because (27 + 255) / 256 is 1—so you'd want to count everything in 01/*/ instead of just 01/.

One could use 0/1234567... and allow up to ~419 objects per directory to get the same behavior, but linear directory scans (on any system that still uses those) are O(n2), and 272 is a mere 729, while 4192 is 175561. [Edit: that only applies to file creation, where you have a two stage search, once to find that it's OK to create and a second to find a slot or append. Lookups are still O(n).]

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!