unordered set intersection in C++

Here is my code, wondering any ideas to make it faster? My implementation is brute force, which is for any elements in a, try to find if it also in b, if so, put in result set c. Any smarter ideas is appreciated.

#include <iostream>
#include <unordered_set>

int main() {
    std::unordered_set<int> a = {1,2,3,4,5};
    std::unordered_set<int> b = {3,4,5,6,7};
    std::unordered_set<int> c;
    for (auto i = a.begin(); i != a.end(); i++) {
        if (b.find(*i) != b.end()) c.insert(*i);
    }
    for (int v : c) {
        std::printf("%d \n", v);
    }
}

Asymptotically, your algorithm is as good as it can get.

In practice, I'd add a check to loop over the smaller of the two sets and do lookups in the larger one. Assuming reasonably evenly distributed hashes, a lookup in a std::unoredered_set takes constant time. So this way, you'll be performing fewer such lookups.

Your algorithm is as good as it gets for a unordered set. however if you use a std::set (which uses a binary tree as storage) or even better a sorted std::vector, you can do better. The algorithm should be something like:

get iterators to a.begin() and b.begin()
if the iterators point to equal element add to intersection and increment both iterators.
Otherwise increment the iterator pointing to the smallest value
Go to 2.

Both should be O(n) time but using a normal set should save you from calculating hashes or any performance degradation that arises from hash collisions.

You can do it with std::copy_if()

std::copy_if(a.begin(), a.end(), std::inserter(c, c.begin()), [b](const int element){return b.count(element) > 0;} );

Thanks Angew, why your method is faster? Could you elaborate a bit more?

Well, let me provide you some additional info...

It should be pretty clear that, whichever data structures you use, you will have to iterate over all elements in at least one of those, so you cannot get better than O(n), n being the number of elements in the data structure selected to iterate over. Elementary now is, how fast you can look up the elements in the other structure - with a hash set, which std::unordered_set actually is, this is O(1) – at least if the number of collisions is small enough ("reasonably evenly distributed hashes"); the degenerate case would be all values having the same key...

So far, you get O(n) * O(1) = O(n). But you still the choice: O(n) or O(m), if m is the number of elements in the other set. OK, in complexity calculations, this is the same, we have a linear algorithm anyway, in practice, though, you can spare some hash calculations and look-ups if you choose the set with the lower number of elements...

来源：https://stackoverflow.com/questions/47901339/unordered-set-intersection-in-c

标签

c++

stl

unordered-set