Question
According to cppreference, "Rehashing occurs only if the new number of elements is greater than max_load_factor() * bucket_count()."
In addition, [unord.req]/15 has similar rules: "The insert and emplace members shall not affect the validity of iterators if (N+n) <= z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container's bucket count, and z is the container's maximum load factor."
However, consider the following example:
#include <unordered_set>
#include <iostream>

int main()
{
    std::unordered_set<int> s;
    s.emplace(1);
    s.emplace(42);
    std::cout << s.bucket_count() << ' ';
    std::cout << (3 > s.max_load_factor() * s.bucket_count()) << ' ';
    s.emplace(2);
    std::cout << s.bucket_count() << ' ';
}
With GCC 8.0.1, it outputs
3 0 7
This means that after emplacing 2, a rehash occurs even though the new number of elements (3) is not greater than max_load_factor() * bucket_count() (note that the second output is 0). Why does this happen?
Answer 1:
You're confusing the change in bucket_count() with the invalidation of iterators. Iterators are invalidated only by a rehash, which must not happen if the new number of elements is less than or equal to max_load_factor() * bucket_count() (conversely, if size() > max_load_factor() * bucket_count(), rehashing may occur, but doesn't have to).
As this was not the case in your example, no rehash occurred and the iterators remain valid. However, the bucket count had to be increased to accommodate the new element.
I experimented a bit (expanding on your code) with clang on Mac OS X, which kept the iterators valid even after rehash(size()) (which did change the elements' bucket associations, as I verified directly by iterating over the buckets and printing their contents).
Answer 2:
From 26.2.7 Unordered associative containers
The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound.
b.load_factor(): Returns the average number of elements per bucket.
b.max_load_factor(): Returns a positive number that the container attempts to keep the load factor less than or equal to. The container automatically increases the number of buckets as necessary to keep the load factor below this number.
I agree that the first part of the description of max_load_factor suggests the load factor could reach that value, but the second part and the foregoing quote clearly state that the load factor will be kept below this number. So you have found a mistake in cppreference.
In your code, without rehashing, after the third insertion s.load_factor() would be equal to s.max_load_factor().
EDIT: To address the changes in the question, I checked the VS implementation of unordered_set available to me; it is implemented as
// hash table -- list with vector of iterators for quick access
When you then ask for an iterator, e.g. via lower_bound, you get an iterator to the list element, which doesn't get invalidated by rehashing. So it agrees with [unord.req]/15.
Answer 3:
The rehash condition was changed by LWG Issue 2156. Before the change, a rehash occurred when the new number of elements was no less than max_load_factor() * bucket_count(); after the change, the condition is "greater than".
GCC 8.0.1 does not implement this change. There is already a bug report, and it has been fixed in GCC 9.
Source: https://stackoverflow.com/questions/49333414/why-is-stdunordered-set-rehashed-even-if-the-load-factor-limit-is-not-broken