Why is std::unordered_set rehashed even if the load factor limit is not broken?

Submitted by 天涯浪子 on 2019-12-12 14:27:25

Question


According to cppreference,

Rehashing occurs only if the new number of elements is greater than max_load_factor()*bucket_count().

In addition, [unord.req]/15 has similar rules:

The insert and emplace members shall not affect the validity of iterators if (N+n) <= z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container's bucket count, and z is the container's maximum load factor.

However, consider the following example:

#include <unordered_set>
#include <iostream>

int main()
{
    std::unordered_set<int> s;
    s.emplace(1);
    s.emplace(42);
    std::cout << s.bucket_count() << ' ';
    std::cout << (3 > s.max_load_factor() * s.bucket_count()) << ' ';
    s.emplace(2);
    std::cout << s.bucket_count() << ' ';
}

With GCC 8.0.1, it outputs

3 0 7

This means that after emplacing 2, a rehash occurs even though the new number of elements (3) is not greater than max_load_factor()*bucket_count() (note that the second output is 0). Why does this happen?


Answer 1:


You're confusing a change in bucket_count() with the invalidation of iterators. Iterators are invalidated only by a rehash, and no rehash takes place if the new number of elements is less than or equal to max_load_factor()*bucket_count(). (Incidentally, if size() > max_load_factor()*bucket_count(), rehashing can occur but doesn't have to.)

Since that threshold was not exceeded in your example, no rehashing occurred and iterators remain valid. However, the bucket count had to be increased to accommodate the new element.

I experimented a bit (expanding on your code) with clang on Mac OS X, which kept the iterators valid even after rehash(size()). That call did change the elements' bucket association, which I tested directly by iterating over the buckets and printing their contents.
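
An experiment of this kind can be sketched roughly as follows. The helper names describe_buckets and element_survives_rehash are hypothetical, not taken from the answer. The sketch keeps a pointer rather than an iterator across the rehash, because the standard guarantees only that pointers and references to elements survive rehashing; whether iterators stay usable depends on the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper: lists every bucket and the elements it currently holds.
std::string describe_buckets(const std::unordered_set<int>& s) {
    std::ostringstream out;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        out << "bucket " << b << ':';
        // begin(b)/end(b) are local iterators over a single bucket.
        for (auto it = s.begin(b); it != s.end(b); ++it)
            out << ' ' << *it;
        out << '\n';
    }
    return out.str();
}

// Hypothetical helper: rehashes s to at least new_buckets buckets and reports
// whether a pointer taken beforehand still addresses the same element.
bool element_survives_rehash(std::unordered_set<int>& s, int value,
                             std::size_t new_buckets) {
    auto it = s.find(value);
    if (it == s.end()) return false;      // value is not in the set
    const int* before = &*it;             // pointers survive rehashing
    s.rehash(new_buckets);
    assert(s.bucket_count() >= new_buckets);
    auto after = s.find(value);           // fresh iterator after the rehash
    return after != s.end() && &*after == before;
}
```

Printing describe_buckets(s) before and after the call shows how elements move between buckets, while element_survives_rehash shows that the elements themselves stay where they are in memory.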




Answer 2:


From 26.2.7 Unordered associative containers

The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound.

b.load_factor()           Returns the average number of elements per bucket.

b.max_load_factor()       Returns a positive number that the container attempts 
                          to keep the load factor less than or equal to. The container
                          automatically increases the number of buckets as necessary
                          to keep the load factor below this number.

I agree that the first part of the description of max_load_factor suggests that the load factor could reach that value, but the second part and the preceding quote clearly state that the load factor will be kept below this number. So you have found a mistake in cppreference.

In your code, without rehashing, after the third insertion you would have s.load_factor() equal to s.max_load_factor().
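
The arithmetic can be checked with a small sketch. The names load_factor_for and load_factor_stays_bounded are hypothetical helpers, not library functions. The second helper only observes behavior: the standard says the container "attempts" to keep the bound, so the check is expected to hold on mainstream standard libraries rather than being strictly guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: load factor of a table holding `size` elements
// in `buckets` buckets (the average number of elements per bucket).
float load_factor_for(std::size_t size, std::size_t buckets) {
    assert(buckets > 0);
    return static_cast<float>(size) / static_cast<float>(buckets);
}

// Hypothetical helper: inserts 0..count-1 and checks after every insertion
// that load_factor() has not exceeded max_load_factor().
bool load_factor_stays_bounded(int count) {
    std::unordered_set<int> s;
    for (int i = 0; i < count; ++i) {
        s.insert(i);
        if (s.load_factor() > s.max_load_factor()) return false;
    }
    return true;
}
```

With the numbers from the question, load_factor_for(3, 3) is exactly 1, the default max_load_factor(), whereas after growing to 7 buckets load_factor_for(3, 7) is below 1.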

EDIT: To address the changes to the question, I checked the VS implementation of unordered_set available to me. It is implemented as

// hash table -- list with vector of iterators for quick access

and when you ask for an iterator, e.g. using lower_bound, you get an iterator to the list element, which is not invalidated by rehashing. So it agrees with [unord.req]/15.
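
To illustrate why such a layout leaves iterators usable, here is a minimal sketch of a list-backed hash set. ToyHashSet and all of its members are hypothetical, and the layout is a deliberate simplification (each bucket stores a vector of list iterators), not the actual VS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <vector>

// Toy hash set: elements live in a std::list, and the bucket table only
// stores iterators into that list. Hypothetical and simplified.
class ToyHashSet {
public:
    using iterator = std::list<int>::iterator;

    explicit ToyHashSet(std::size_t bucket_count = 2) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    // Inserts value if absent; returns an iterator into the element list.
    iterator insert(int value) {
        auto& bucket = buckets_[index_of(value)];
        for (iterator it : bucket)
            if (*it == value) return it;   // already present
        elems_.push_back(value);
        iterator it = std::prev(elems_.end());
        bucket.push_back(it);
        return it;
    }

    // Rebuilds the bucket table only; list nodes are neither moved nor freed,
    // so iterators handed out earlier keep pointing at their elements.
    void rehash(std::size_t new_bucket_count) {
        assert(new_bucket_count > 0);
        buckets_.assign(new_bucket_count, std::vector<iterator>{});
        for (iterator it = elems_.begin(); it != elems_.end(); ++it)
            buckets_[index_of(*it)].push_back(it);
    }

    std::size_t bucket_count() const { return buckets_.size(); }
    std::size_t size() const { return elems_.size(); }

private:
    std::size_t index_of(int value) const {
        return std::hash<int>{}(value) % buckets_.size();
    }

    std::list<int> elems_;
    std::vector<std::vector<iterator>> buckets_;
};
```

In this sketch an iterator returned by insert still dereferences to the same value after rehash, because std::list iterators are invalidated only when their own element is erased.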




Answer 3:


The rehash condition was changed by LWG Issue 2156. Before the change, a rehash occurred when the new number of elements was no less than max_load_factor()*bucket_count(); after the change, the condition became "greater than".

GCC 8.0.1 does not implement this change. There is already a bug report, and it has been fixed in GCC 9.
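
The difference between the two rules shows up exactly at the boundary, which is the case in the question. A minimal sketch, in which the function names rehashes_before_2156 and rehashes_after_2156 are hypothetical and the predicates model only the comparison described above, not any library's actual rehash policy:

```cpp
#include <cstddef>

// Older rule as described above: rehash when the new element count is
// no less than max_load_factor * bucket_count. Hypothetical helper.
constexpr bool rehashes_before_2156(std::size_t new_size, float z,
                                    std::size_t buckets) {
    return static_cast<float>(new_size) >= z * static_cast<float>(buckets);
}

// Newer rule: rehash only when the new element count is strictly greater.
constexpr bool rehashes_after_2156(std::size_t new_size, float z,
                                   std::size_t buckets) {
    return static_cast<float>(new_size) > z * static_cast<float>(buckets);
}

// Numbers from the question: 3 elements after the insertion, 3 buckets,
// and the default maximum load factor of 1.
static_assert(rehashes_before_2156(3, 1.0f, 3),
              "older rule rehashes at the boundary");
static_assert(!rehashes_after_2156(3, 1.0f, 3),
              "newer rule does not rehash at the boundary");
```

Under the older rule the third insertion triggers a rehash, which matches the jump from 3 to 7 buckets seen with GCC 8.0.1; under the newer rule it would not.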



Source: https://stackoverflow.com/questions/49333414/why-is-stdunordered-set-rehashed-even-if-the-load-factor-limit-is-not-broken
