Why are hash table expansions usually done by doubling the size?

小蘑菇 2020-12-02 06:21

I've done a little research on hash tables, and I keep running across the rule of thumb that when there are a certain number of entries (either max or via a load factor lik

6 Answers

旧巷少年郎 (original poster)

2020-12-02 07:16
I had read a very interesting discussion on growth strategy on this very site... just cannot find it again.

While 2 is commonly used, it has been argued that it is not the best value. One often-cited problem is that it does not cope well with allocator schemes that hand out power-of-two blocks: doubling always requires a reallocation, while a smaller factor can sometimes be satisfied within the same block (simulating in-place growth) and thus be faster.
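
To make the power-of-two point concrete, here is a minimal sketch. It assumes a hypothetical allocator that rounds every request up to the next power of two; `size_class` and `in_place_growths` are illustrative names, not part of any real allocator API. It counts how many growth steps keep the buffer inside the block it already occupies.

```python
def size_class(n: int) -> int:
    """Smallest power of two >= n (the assumed allocator block size)."""
    return 1 << (n - 1).bit_length()


def in_place_growths(factor: float, start: int = 16, steps: int = 20) -> int:
    """Count growth steps whose new capacity still fits the current block."""
    cap, count = start, 0
    for _ in range(steps):
        # Always grow by at least one element so small capacities make progress.
        new_cap = max(cap + 1, int(cap * factor))
        if size_class(new_cap) == size_class(cap):
            count += 1  # same block: the allocator could grow in place
        cap = new_cap
    return count


print("factor 2.0:", in_place_growths(2.0))
print("factor 1.5:", in_place_growths(1.5))
```

Under this assumption a factor of 2 never stays in the same block, because doubling the capacity also doubles its size class, while 1.5 does so on some of the steps.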

Thus, for example, the VC++ Standard Library uses a growth factor of 1.5 (ideally it should be the golden ratio if a first-fit memory allocation strategy is being used) after an extensive discussion on the mailing list. The reasoning is explained here:
I'd be interested if any other vector implementations use a growth factor other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I don't have that compiler here).

There is a technical reason to prefer 1.5 to 2 -- more specifically, to prefer values less than (1+sqrt(5))/2, which is about 1.618.

Suppose you are using a first-fit memory allocator, and you're progressively appending to a vector. Then each time you reallocate, you allocate new memory, copy the elements, then free the old memory. That leaves a gap, and it would be nice to be able to use that memory eventually. If the vector grows too rapidly, it will always be too big for the available memory.

It turns out that if the growth factor is >= (1+sqrt(5))/2, the new memory will always be too big for the hole that has been left so far; if it is < (1+sqrt(5))/2, the new memory will eventually fit. So 1.5 is small enough to allow the memory to be recycled.
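
The claim can be checked with a small simulation. This is only a sketch under simplifying assumptions: the vector is the only thing allocating, all freed blocks sit contiguously in front of the live block and merge into one hole, and the live block cannot be counted as free because it must survive until the copy is done. The function name `first_reuse_step` is made up for illustration.

```python
def first_reuse_step(k: float, start: float = 1.0, max_steps: int = 200):
    """Return the first reallocation step at which the new block fits into
    the freed space in front of the live block, or None if it never does
    within max_steps."""
    hole = 0.0    # contiguous freed space in front of the live block
    size = start  # size of the live block
    for step in range(1, max_steps + 1):
        new_size = size * k
        if new_size <= hole:
            return step
        hole += size  # the old block is freed and merges with the hole
        size = new_size
    return None


for k in (1.5, 1.6, 1.7, 2.0):
    print(k, first_reuse_step(k))
```

In this model a factor of 1.5 reuses the hole at the fifth reallocation (7.59375 needed against 8.125 free), while 1.7 and 2.0 never do.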

Surely, if the growth factor is >= 2 the new memory will always be too big for the hole that has been left so far; if it is < 2, the new memory will eventually fit. Presumably the reason for (1+sqrt(5))/2 is...
- Initial allocation is s.
- The first resize is k*s.
- The second resize is k*k*s, which will fit the hole iff k*k*s <= k*s+s, i.e. iff k <= (1+sqrt(5))/2
...the hole can be recycled asap.
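
For reference, the inequality in the last step above can be solved explicitly. This just spells out the algebra of that condition, taking $s > 0$ and $k > 0$:

```latex
\begin{align*}
k^2 s \le k s + s
  &\iff k^2 - k - 1 \le 0 \\
  &\iff k \le \frac{1 + \sqrt{5}}{2} \approx 1.618 \qquad (k > 0)
\end{align*}
```

For example, $k = 1.5$ satisfies it ($2.25 \le 2.5$), while $k = 2$ does not ($4 > 3$).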

It could, by storing its previous size, grow along the Fibonacci sequence (new capacity = current capacity + previous capacity).
Of course, it should be tailored to the memory allocation strategy.
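
A minimal sketch of that Fibonacci-style growth follows. `FibVector` is a toy class invented for illustration: it only tracks the capacity a real container would reserve, and the underlying Python list manages its own memory separately.

```python
class FibVector:
    """Toy growable array whose capacity follows Fibonacci-style growth."""

    def __init__(self, initial_capacity: int = 1):
        self._prev_cap = initial_capacity  # capacity before the last resize
        self._cap = initial_capacity       # current (simulated) capacity
        self._items = []

    @property
    def capacity(self) -> int:
        return self._cap

    def append(self, value) -> None:
        if len(self._items) == self._cap:
            # New capacity = current + previous, so successive capacities
            # form a Fibonacci-like sequence.
            self._prev_cap, self._cap = self._cap, self._cap + self._prev_cap
        self._items.append(value)


v = FibVector()
seen = []
for i in range(20):
    v.append(i)
    if not seen or seen[-1] != v.capacity:
        seen.append(v.capacity)
print(seen)  # [1, 2, 3, 5, 8, 13, 21]
```

The ratio between successive capacities approaches the golden ratio, which is the threshold discussed above.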