Question
I have not read the C++ standard, but this is how I think C++'s unordered_map is supposed to work:
- Allocate a memory block in the heap.
- With every put request, hash the object and map it to a space in this memory.
- During this process, handle collisions via chaining or open addressing.
I am quite surprised that I could not find much about how unordered_map handles memory. Is there a specific initial amount of memory that unordered_map allocates? What happens if, say, we allocated room for 50 ints and ended up inserting 5000 integers?
That would cause a lot of collisions, so I believe there should be some kind of re-hashing and re-sizing algorithm that reduces the number of collisions once a certain collision threshold is reached. Since rehashing and resizing are explicitly provided as member functions of the class, I assume they are used internally as well. Is there such a mechanism?
Answer 1:
"With every put request, hash the object and map it to a space in this memory"
Unfortunately, this isn't exactly true. You are referring to an open addressing or closed hashing data structure, which is not how unordered_map is specified.
Every unordered_map implementation stores a linked list of external nodes in the array of buckets. This means that inserting an item will always allocate at least once (the new node), if not twice (resizing the array of buckets, then the new node).
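The chaining can be observed through the container's bucket interface. What follows is only a minimal sketch: ConstantHash, make_colliding_map and longest_chain are hypothetical names made up for the illustration, and the hash is deliberately terrible so that every key collides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical, deliberately bad hash: every key gets the same hash value.
struct ConstantHash {
    std::size_t operator()(int) const noexcept { return 0; }
};

using CollidingMap = std::unordered_map<int, int, ConstantHash>;

// Build a map whose n keys all collide (key i maps to i * i).
CollidingMap make_colliding_map(int n) {
    assert(n >= 0);
    CollidingMap m;
    for (int i = 0; i < n; ++i) {
        m.emplace(i, i * i);
    }
    return m;
}

// Length of the longest chain, measured with bucket_count() and bucket_size().
template <class Map>
std::size_t longest_chain(const Map& m) {
    std::size_t longest = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        longest = std::max<std::size_t>(longest, m.bucket_size(b));
    }
    return longest;
}
```

Calling longest_chain(make_colliding_map(100)) reports a single chain holding all 100 elements, and lookups such as at(7) still succeed. The elements are never moved to neighbouring slots; they simply pile up in one bucket, which is why a poor hash degrades lookups to a linear walk.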
No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small "oversight" in the specification of unordered_map all but requires this behavior. The required behavior is that pointers and references to elements must stay valid when other elements are inserted or erased (a rehash may invalidate iterators, but not references). Because inserting might cause the bucket array to grow (reallocate), it is not generally possible to store elements directly in the bucket array and still meet the stability guarantees.
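The stability guarantee is easy to check: pointers and references to an element survive a rehash. Here is a small sketch, with RehashCheck and check_pointer_stability as hypothetical names chosen for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical report type for this illustration.
struct RehashCheck {
    std::size_t buckets_before;
    std::size_t buckets_after;
    bool pointer_stable;  // did the element stay at the same address?
};

// Take a pointer to one element, then insert many more elements so that the
// bucket array has to grow, and check that the pointer is still good.
RehashCheck check_pointer_stability(int extra_insertions) {
    assert(extra_insertions >= 0);
    std::unordered_map<int, int> m;
    m[0] = 42;
    int* before = &m.at(0);
    const std::size_t buckets_before = m.bucket_count();

    for (int i = 1; i <= extra_insertions; ++i) {
        m[i] = i;  // may trigger any number of rehashes
    }

    const bool stable = (before == &m.at(0)) && (*before == 42);
    return {buckets_before, m.bucket_count(), stable};
}
```

With something like check_pointer_stability(5000), the bucket count grows while pointer_stable stays true. That only works because each element lives in its own heap node, so a rehash relinks nodes instead of moving them.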
unordered_map is a better data structure if you are storing expensive-to-copy items as your key or value, which makes sense, given that its general design was lifted from Boost's pre-move-semantics design.
Chandler Carruth (Google) mentions this problem in his CppCon '14 talk "Efficiency with Algorithms, Performance with Data Structures".
Answer 2:
std::unordered_map maintains a load factor (the number of elements divided by the number of buckets) that it uses to manage the size of its internal bucket array. When an insertion would push the load factor above max_load_factor(), which defaults to 1.0, the container allocates a larger bucket array and rehashes. This decreases the likelihood of a collision in a bucket. When a collision does happen, the colliding elements are chained within the same bucket; there is no fallback to linear probing.
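To see what happens in the question's "5000 integers" scenario, you can count how often the bucket array changes size during the insertions. This is a rough sketch; GrowthReport and insert_and_watch are hypothetical names, and the exact bucket counts and number of rehashes are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical report type for this illustration.
struct GrowthReport {
    std::size_t rehashes;       // times bucket_count() changed during inserts
    std::size_t final_buckets;
    float final_load_factor;    // size() / bucket_count()
};

// Insert n ints, optionally calling reserve(n) first, and watch bucket_count().
GrowthReport insert_and_watch(int n, bool reserve_first) {
    assert(n >= 0);
    std::unordered_map<int, int> m;
    if (reserve_first) {
        m.reserve(static_cast<std::size_t>(n));  // size the bucket array up front
    }
    std::size_t buckets = m.bucket_count();
    std::size_t rehashes = 0;
    for (int i = 0; i < n; ++i) {
        m.emplace(i, i);
        if (m.bucket_count() != buckets) {  // the container rehashed itself
            ++rehashes;
            buckets = m.bucket_count();
        }
    }
    return {rehashes, m.bucket_count(), m.load_factor()};
}
```

insert_and_watch(5000, false) should report several automatic rehashes and a final load factor no greater than 1.0, while insert_and_watch(5000, true) reports none, because reserve(n) sizes the bucket array for n elements ahead of time.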
Source: https://stackoverflow.com/questions/31098123/c-unordered-map-collision-handling-resize-and-rehash