Is the complexity of unordered_set::find predictable?

Question

While looking for a container suitable for an application I'm building, I ran across documentation for unordered_set. Given that my application typically requires only insert and find functions, this class seems rather attractive. I'm slightly put off, however, by the fact that find is O(1) amortized, but O(n) worst case - I would be using the function frequently, and it could make or break my application. What causes the spike in complexity? Is the likelihood of running into an O(n) search predictable?

Answer 1:

An unordered_set is implemented as a hash table. One common way to implement a hash table is as a container (for example, a vector) of hash buckets, where each bucket is itself a container (for example, a list) holding the elements of the unordered_set that fall into that bucket.

When you insert an element into the unordered_set, a hash function is applied to it, and the result determines the bucket in which the element is placed.

Several elements can end up in the same bucket. When you look up an element, the hash function is applied to it to find the bucket, and then the elements in that bucket have to be searched for the one you are looking for.
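To make the bucket idea concrete, here is a minimal sketch of a chained hash set for int values. ToyHashSet and its members are hypothetical names made up for illustration; this is not how any particular standard library implements unordered_set, and it never grows its bucket array.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

// Toy chained hash set (illustration only, hypothetical class).
class ToyHashSet {
public:
    // Uses at least one bucket so the modulo below is always valid.
    explicit ToyHashSet(std::size_t bucket_count = 8)
        : buckets_(bucket_count == 0 ? 1 : bucket_count) {}

    // Returns true if the value was inserted, false if it was already present.
    bool insert(int value) {
        std::list<int>& bucket = buckets_[bucket_index(value)];
        for (int stored : bucket) {
            if (stored == value) return false;
        }
        bucket.push_back(value);
        return true;
    }

    // Hashes to one bucket, then searches only that bucket linearly.
    bool contains(int value) const {
        const std::list<int>& bucket = buckets_[bucket_index(value)];
        for (int stored : bucket) {
            if (stored == value) return true;
        }
        return false;
    }

private:
    // Maps a value to the index of the bucket it belongs to.
    std::size_t bucket_index(int value) const {
        return std::hash<int>{}(value) % buckets_.size();
    }

    std::vector<std::list<int>> buckets_;
};
```

The cost of contains is one hash computation plus a linear walk over a single bucket, so it stays cheap only as long as the buckets stay short.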

The worst case is that all elements end up in the same bucket. Depending on the container used to store the elements of a bucket, a search then has to examine every element, which is where the O(n) worst-case running time comes from.
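You can provoke that worst case on purpose. The sketch below plugs a deliberately useless hash function into std::unordered_set and reports how many elements share a bucket. ConstantHash and bucket_length_with_constant_hash are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <unordered_set>

// Deliberately bad hash (illustration only): every key gets the same hash code.
struct ConstantHash {
    std::size_t operator()(int) const noexcept { return 0; }
};

// Inserts 0..n-1 and returns the size of the bucket that holds key 0.
std::size_t bucket_length_with_constant_hash(int n) {
    std::unordered_set<int, ConstantHash> set;
    for (int i = 0; i < n; ++i) {
        set.insert(i);
    }
    if (set.empty()) return 0;              // nothing inserted, nothing to inspect
    return set.bucket_size(set.bucket(0));  // elements sharing a bucket with key 0
}
```

Because keys with equal hash codes land in the same bucket, the returned length equals n, and every find on such a set degenerates into a linear search.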

Two things determine whether elements end up in the same bucket: the hash function (how good it is) and the elements themselves (which may expose a specific weakness of the hash function).
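To see how a given hash function copes with your own elements, you can inspect the bucket interface that the unordered containers expose. A minimal sketch, assuming a hypothetical helper named longest_bucket:

```cpp
#include <cstddef>
#include <unordered_set>

// Returns the number of elements in the fullest bucket of an unordered container.
// A value close to set.size() means lookups are close to the linear worst case.
template <class Set>
std::size_t longest_bucket(const Set& set) {
    std::size_t longest = 0;
    for (std::size_t i = 0; i < set.bucket_count(); ++i) {
        const std::size_t size = set.bucket_size(i);
        if (size > longest) longest = size;
    }
    return longest;
}
```

Calling it on a set filled with representative data gives a rough picture of the longest chain a find might have to walk.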

Normally you cannot predict the elements. If they are predictable enough in your case, you can select a hash function that spreads that kind of element evenly.

To speed up searches, the key is to use a good hash function, one that distributes the elements evenly across the buckets, and, if needed, to rehash to increase the number of buckets. Take care with that option: a rehash has to redistribute all elements, typically by applying the hash function to each of them again.
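If you know roughly how many elements to expect, one way to keep rehashing out of the hot path is to size the table up front. This is a sketch under that assumption; make_presized_set is a hypothetical helper, and the load factor of 1.0 is only an example value.

```cpp
#include <cstddef>
#include <unordered_set>

// Builds an empty set with enough buckets for `expected_elements`,
// so that inserting up to that many elements should not trigger a rehash.
std::unordered_set<int> make_presized_set(std::size_t expected_elements) {
    std::unordered_set<int> set;
    set.max_load_factor(1.0f);       // average elements per bucket allowed before growing
    set.reserve(expected_elements);  // allocates buckets for that many elements now
    return set;
}
```

A lower max_load_factor gives shorter buckets on average at the price of more memory; it does not help if the hash function itself clusters the elements.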

If storing these elements is that important for your application, I suggest running performance tests with data as close as possible to production data and deciding from there. That said, the STL containers, and even more so containers of the same group (for example, the associative containers), share almost the same interface, so it is easy to swap one for another with little or no change to the code that uses them.
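Because the interfaces are so similar, a small template is enough to run the same lookups against different containers. This is only a rough sketch: LookupResult and run_lookups are hypothetical names, and a serious benchmark would need repeated runs and realistic data.

```cpp
#include <chrono>
#include <cstddef>
#include <set>            // for callers that test std::set
#include <unordered_set>  // for callers that test std::unordered_set
#include <vector>

// Number of successful lookups and the time the lookup loop took.
struct LookupResult {
    std::size_t hits;
    std::chrono::nanoseconds elapsed;
};

// Fills a Container from `data`, then times find() for every query.
template <class Container>
LookupResult run_lookups(const std::vector<int>& data,
                         const std::vector<int>& queries) {
    const Container container(data.begin(), data.end());

    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (int query : queries) {
        if (container.find(query) != container.end()) ++hits;
    }
    const auto stop = std::chrono::steady_clock::now();

    return {hits,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Comparing run_lookups<std::set<int>>(data, queries) with run_lookups<std::unordered_set<int>>(data, queries) on production-like data shows which container suits the workload; only the template argument changes.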

Source: https://stackoverflow.com/questions/24846798/is-the-complexity-of-unordered-setfind-predictable

Tags

c++

c++11

data-structures

complexity-theory