Unordered_set questions

问题

Could anyone explain how an unordered set works? I am also not sure how a set works. My main question is what is the efficiency of its find function.

For example, what is the total big O run time of this?

    vector<int> theFirst;
    vector<int> theSecond;
    vector<int> theMatch;

    theFirst.push_back( -2147483648 );
    theFirst.push_back(2);
    theFirst.push_back(44);


    theSecond.push_back(2);
    theSecond.push_back( -2147483648 );
    theSecond.push_back( 33 );


    //1) Place the contents into a unordered set that is O(m). 
    //2) O(n) look up so thats O(m + n). 
    //3) Add them to third structure so that's O(t)
    //4) All together it becomes O(m + n + t)
    unordered_set<int> theUnorderedSet(theFirst.begin(), theFirst.end());

    for(int i = 0; i < theSecond.size(); i++) 
    {
        if(theUnorderedSet.find(theSecond[i]) != theUnorderedSet.end()) 
        {
        theMatch.push_back( theSecond[i] );
        cout << theSecond[i];
        }
   }

回答1:

unordered_set and all the other unordered_ data structures use hashing, as mentioned by @Sean. Hashing involves amortized constant time for insertion, and close to constant time for lookup. A hash function essentially takes some information and produces a number from it. It is a function in the sense that the same input has to produce the same output. However, different inputs can result in the same output, resulting in what is termed a collision. Lookup would be guaranteed to be constant time for an "perfect hash function", that is, one with no collisions. In practice, the input number comes from the element you store in the structure (say it's value, it is a primitive type) and maps it to a location in a data structure. Hence, for a given key, the function takes you to the place where the element is stored without need for any traversals or searches (ignoring collisions here for simplicity), hence constant time. There are different implementations of these structures (open addressing, chaining, etc.) See hash table, hash function. I also recommend section 3.7 of The Algorithm Design Manual by Skiena. Now, concerning big-O complexity, you are right that you have O(n) + O(n) + O(size of overlap). Since the overlap cannot be bigger than the smaller of m and n, the overall complexity can be expressed as O(kN), where N is the largest between m and n. So, O(N). Again, this is "best case", without collisions, and with perfect hashing.

set and multi_set on the other hand use binary trees, so insertions and look-ups are typically O(logN). The actual performance of a hashed structure vs. a binary tree one will depend on N, so it is best to try the two approaches and profile them in a realistic running scenario.

回答2:

All of the std::unordered_*() data types make use of a hash to perform lookups. Look at Boost's documentation on the subject and I think you'll gain an understanding very quickly.

http://www.boost.org/doc/libs/1_46_1/doc/html/unordered.html

来源：https://stackoverflow.com/questions/6204982/unordered-set-questions

标签

c++

set

unordered-set