Fast intersection of sets: C++ vs C#

野性不改 2020-12-28 10:17

On my machine (quad core, 8 GB RAM), running Vista x64 Business with Visual Studio 2008 SP1, I am trying to intersect two sets of numbers very quickly.

I've impleme

13 Answers
  • 2020-12-28 10:52

    Use this...

    vector<int> set1(10000);
    vector<int> set2(1000);
    

    ... to get vectors of non-zero initial size. Then don't use push_back, but just update the values directly.
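    A minimal sketch of that suggestion, assuming the values can be computed by index. The helper name `make_presized` and the `base` parameter are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build a vector that has its final size from the start
// and write each slot by index instead of growing it with push_back.
std::vector<int> make_presized(std::size_t count, int base) {
    std::vector<int> values(count);  // count value-initialized (zero) elements
    for (std::size_t i = 0; i < count; ++i) {
        values[i] = base + static_cast<int>(i);  // overwrite in place, no reallocation
    }
    assert(values.size() == count);  // the size never changed while filling
    return values;
}
```

    Calling `make_presized(10000, 1000000000)` would give the larger set in one allocation. Calling `reserve` first and then using `push_back` also avoids reallocation, so it is worth measuring both.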

  • 2020-12-28 10:52

    By the way, if you have large sorted sets, std::set_intersection is not the fastest algorithm. It takes up to 2*(m+n)-1 comparisons, whereas algorithms like the one from Baeza-Yates can be faster. For small m, Baeza-Yates is O(m * log(n)), while for n = alpha * m it is O(n). The basic idea is to do a kind of two-way binary search.

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.7899&rep=rep1&type=pdf

    Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences Ricardo Baeza-Yates and Alejandro Salinger

    OR

    R. Baeza-Yates. A Fast Set Intersection Algorithm for Sorted Sequences. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004), Springer LNCS 3109, pp 400-408, Istanbul, Turkey, July 2004.

    Below is an explanation and an implementation by Erik Frey where he shows significantly faster results than std::set_intersection with a binary probe. I have not tried his code yet.
    http://fawx.com/

    1. Pick the median element, A, in the smaller set.
    2. Search for its insertion-position element, B, in the larger set.
    3. If A and B are equal, append the element to the result.
    4. Repeat steps 1-3 on the non-empty subsets on either side of elements A and B.


    /*
     * baeza_intersect
     * Note: out is passed by value through the recursion, so pass an insert
     * iterator such as std::back_inserter.
     */
    template< template <class, class> class Probe,
              class RandomAccessIterator, class OutputIterator >
    void baeza_intersect(RandomAccessIterator begin1, RandomAccessIterator end1,
                         RandomAccessIterator begin2, RandomAccessIterator end2,
                         OutputIterator out)
    {
      RandomAccessIterator probe1, probe2;

      if ( (end1 - begin1) < ( end2 - begin2 ) )
      {
        if ( begin1 == end1 )
          return;
        probe1 = begin1 + ( ( end1 - begin1 ) >> 1 );
        probe2 = lower_bound< Probe >( begin2, end2, *probe1 );
        baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out); // intersect left
        if (! (probe2 == end2 || *probe1 < *probe2 ))
          *out++ = *probe2++;
        baeza_intersect< Probe >(++probe1, end1, probe2, end2, out); // intersect right
      }
      else
      {
        if ( begin2 == end2 )
          return;
        probe2 = begin2 + ( ( end2 - begin2 ) >> 1 );
        probe1 = lower_bound< Probe >( begin1, end1, *probe2 );
        baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out); // intersect left
        if (! (probe1 == end1 || *probe2 < *probe1 ))
          *out++ = *probe1++;
        baeza_intersect< Probe >(probe1, end1, ++probe2, end2, out); // intersect right
      }
    }

    /*
     * with a comparator
     */
    template< template <class, class> class Probe,
              class RandomAccessIterator, class OutputIterator, class Comparator >
    void baeza_intersect(RandomAccessIterator begin1, RandomAccessIterator end1,
                         RandomAccessIterator begin2, RandomAccessIterator end2,
                         OutputIterator out, Comparator cmp)
    {
      RandomAccessIterator probe1, probe2;

      if ( (end1 - begin1) < ( end2 - begin2 ) )
      {
        if ( begin1 == end1 )
          return;
        probe1 = begin1 + ( ( end1 - begin1 ) >> 1 );
        probe2 = lower_bound< Probe >( begin2, end2, *probe1, cmp );
        baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out, cmp); // intersect left
        if (! (probe2 == end2 || cmp( *probe1, *probe2 ) ))
          *out++ = *probe2++;
        baeza_intersect< Probe >(++probe1, end1, probe2, end2, out, cmp); // intersect right
      }
      else
      {
        if ( begin2 == end2 )
          return;
        probe2 = begin2 + ( ( end2 - begin2 ) >> 1 );
        probe1 = lower_bound< Probe >( begin1, end1, *probe2, cmp );
        baeza_intersect< Probe >(begin1, probe1, begin2, probe2, out, cmp); // intersect left
        if (! (probe1 == end1 || cmp( *probe2, *probe1 ) ))
          *out++ = *probe1++;
        baeza_intersect< Probe >(probe1, end1, ++probe2, end2, out, cmp); // intersect right
      }
    }
    

    // probe.hpp

    /**
     * binary probe: pick the next element by choosing the halfway point between low and high
     */
    template< class RandomAccessIterator, class T >
    struct binary_probe
    {
      RandomAccessIterator operator()(RandomAccessIterator begin, RandomAccessIterator end, const T & value)
      {
        return begin + ( (end - begin) >> 1);
      }
    };

    /**
     * lower_bound: like stl's lower_bound but with different kinds of probing
     * note the appearance of the rare template parameter template!
     */
    template< template <class, class> class Probe, class RandomAccessIterator, class T >
    RandomAccessIterator lower_bound(RandomAccessIterator begin, RandomAccessIterator end, const T & value)
    {
      RandomAccessIterator pit;
      Probe< RandomAccessIterator, T > pfunc; // probe-functor (wants to get func'd up)

      while ( begin < end )
      {
        pit = pfunc(begin, end, value);
        if ( *pit < value )
          begin = pit + 1;
        else
          end = pit;
      }
      return begin;
    }

    /*
     * this time with a comparator!
     */
    template< template <class, class> class Probe, class RandomAccessIterator, class T, class Comparator >
    RandomAccessIterator lower_bound(RandomAccessIterator begin, RandomAccessIterator end, const T & value, Comparator cmp)
    {
      RandomAccessIterator pit;
      Probe< RandomAccessIterator, T > pfunc;

      while ( begin < end )
      {
        pit = pfunc(begin, end, value);
        if ( cmp(*pit, value) )
          begin = pit + 1;
        else
          end = pit;
      }
      return begin;
    }
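    For comparison, here is a compact, self-contained sketch of the same divide-and-conquer idea. It is not Erik Frey's code: it swaps the Probe template for plain std::lower_bound, returns the output iterator instead of passing it by value, and assumes both inputs are sorted and free of duplicates. The names `intersect_sorted` and `intersect_sorted_vectors` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Recursive intersection of two sorted, duplicate-free ranges.
// Returns the advanced output iterator so any output iterator can be used.
template <class It, class Out>
Out intersect_sorted(It b1, It e1, It b2, It e2, Out out) {
    if (b1 == e1 || b2 == e2) return out;
    // Always take the median of the smaller range and search the larger one.
    if ((e2 - b2) < (e1 - b1)) return intersect_sorted(b2, e2, b1, e1, out);
    It mid = b1 + (e1 - b1) / 2;               // step 1: median of the smaller range
    It pos = std::lower_bound(b2, e2, *mid);   // step 2: insertion position in the larger range
    out = intersect_sorted(b1, mid, b2, pos, out);        // left halves
    if (pos != e2 && !(*mid < *pos)) {         // step 3: equal elements go to the result
        *out++ = *mid;
        ++pos;
    }
    return intersect_sorted(mid + 1, e1, pos, e2, out);   // right halves
}

// Convenience wrapper for vectors of int; checks the sortedness assumption.
std::vector<int> intersect_sorted_vectors(const std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> result;
    intersect_sorted(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(result));
    return result;
}
```

    Because the left half is emitted before the median and the right half after it, the result comes out in sorted order. Whether this beats std::set_intersection depends on how different the two sizes are, so it should be benchmarked on the real data.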

  • 2020-12-28 10:53

    Ok, after much feedback I've updated the original question a number of times:

    • The tests are now each run 1,000 times
    • The C# code now uses a higher resolution timer
    • The data structures are now populated BEFORE the tests

    The result of this so far is that C# is still ~5x faster than C++.

    Thanks everyone for your ideas/suggestions.

  • 2020-12-28 10:55

    Latest benchmark:

    Found the intersection of 504 values (using unordered_map) 1000 times, in 28827.6ms
    Found the intersection of 495 values (using set_intersection) 1000 times, in 9817.69ms
    Found the intersection of 504 values (using unordered_set) 1000 times, in 24769.1ms
    

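    I think the 504 vs. 495 difference happens because set2 contains a couple of duplicate values: the hash-based tests count every occurrence in set2, while set_intersection reports each such value only once.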
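    The effect of duplicates can be reproduced on a small input. This is only a sketch with hypothetical function names, not the benchmark code itself: one function counts hits the way the hash-based tests do, the other uses sort plus set_intersection.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <unordered_set>
#include <vector>

// Counts every element of `probe` that occurs in `universe`, repeats included.
std::size_t count_hits_hashed(const std::vector<int>& universe, const std::vector<int>& probe) {
    std::unordered_set<int> lookup(universe.begin(), universe.end());
    std::size_t hits = 0;
    for (int v : probe) hits += lookup.count(v);
    assert(hits <= probe.size());  // at most one hit per probed element
    return hits;
}

// Sorts copies of both inputs and returns the size of their set_intersection.
// A value occurring m times in one range and n times in the other is emitted min(m, n) times.
std::size_t count_hits_sorted(std::vector<int> universe, std::vector<int> probe) {
    std::sort(universe.begin(), universe.end());
    std::sort(probe.begin(), probe.end());
    std::vector<int> common;
    std::set_intersection(universe.begin(), universe.end(),
                          probe.begin(), probe.end(),
                          std::back_inserter(common));
    return common.size();
}
```

    With `universe = {1, 2, 3, 4, 5}` and `probe = {2, 2, 4, 9}`, the hashed count is 3 and the sorted count is 2, which is the same kind of gap as 504 vs. 495.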

    Code:
    
    // MapPerformance.cpp : Defines the entry point for the console application.
    //
    
    #include "stdafx.h"
    #include <hash_map>
    #include <vector>
    #include <iostream>
    #include <time.h>
    #include <algorithm>
    #include <set>
    #include <unordered_set>
    
    #include <boost/unordered/unordered_map.hpp>
    
    #include "timer.h"
    
    using namespace std;
    using namespace stdext;
    using namespace boost;
    using namespace tr1;
    
    
    int runIntersectionTest2(const vector<int>& set1, const vector<int>& set2)
    {
        // hash_map<int,int> theMap;
        // map<int,int> theMap;
        unordered_set<int> theSet;      
    
        theSet.insert( set1.begin(), set1.end() );
    
        int intersectionSize = 0;
    
        vector<int>::const_iterator set2_end = set2.end();
    
        for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
        {
            if ( theSet.find(*iterator) != theSet.end() )
            {
                intersectionSize++;
            }
        }
    
        return intersectionSize;
    }
    
    int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
    {
        // hash_map<int,int> theMap;
        // map<int,int> theMap;
        unordered_map<int,int> theMap;  
    
        vector<int>::const_iterator set1_end = set1.end();
    
        // Now intersect the two sets by populating the map
        for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
        {
            int value = *iterator;
    
            theMap[value] = 1;
        }
    
        int intersectionSize = 0;
    
        vector<int>::const_iterator set2_end = set2.end();
    
        for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
        {
            int value = *iterator;
    
            unordered_map<int,int>::iterator foundValue = theMap.find(value);
    
            if ( foundValue != theMap.end() )
            {
                theMap[value] = 2;
    
                intersectionSize++;
            }
        }
    
        return intersectionSize;
    
    }
    
    int runSetIntersection(const vector<int>& set1_unsorted, const vector<int>& set2_unsorted)
    {   
        // Create two vectors
        std::vector<int> set1(set1_unsorted.size());
        std::vector<int> set2(set2_unsorted.size());
    
        // Copy the unsorted data into them
        std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
        std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());
    
        // Sort the data
        sort(set1.begin(),set1.end());
        sort(set2.begin(),set2.end());
    
        vector<int> intersection;
        intersection.reserve(1000);
    
        set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));
    
        return intersection.size(); 
    }
    
    void createSets( vector<int>& set1, vector<int>& set2 )
    {
        srand ( time(NULL) );
    
        set1.reserve(100000);
        set2.reserve(1000);
    
        // Create 100,000 values for set1
        for ( int i = 0; i < 100000; i++ )
        {
            int value = 1000000000 + i;
            set1.push_back(value);
        }
    
        // Try to get half of our values intersecting
        float ratio = 200000.0f / RAND_MAX;
    
    
        // Create 1,000 values for set2
        for ( int i = 0; i < 1000; i++ )
        {
            int random = rand() * ratio + 1;
    
            int value = 1000000000 + random;
            set2.push_back(value);
        }
    
        // Make sure set1 is in random order (not sorted)
        random_shuffle(set1.begin(),set1.end());
    }
    
    int _tmain(int argc, _TCHAR* argv[])
    {
        int intersectionSize = 0;
    
        vector<int> set1, set2; 
        createSets( set1, set2 );
    
        Timer timer;
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runIntersectionTest(set1, set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        timer.Reset();
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runSetIntersection(set1,set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        timer.Reset();
        for ( int i = 0; i < 1000; i++ )
        {
            intersectionSize = runIntersectionTest2(set1,set2);
        }
        timer.Stop();
    
        cout << "Found the intersection of " << intersectionSize << " values (using unordered_set) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;
    
        getchar();
    
        return 0;
    }
    
  • 2020-12-28 10:57

    There are several problems with your test.

    First, you are not testing set intersection, but "create a couple of arrays, fill them with random numbers, and then perform set intersection". You should only time the portion of the code you're actually interested in. Even if you're going to want to do those things, they should not be benchmarked here. Measure one thing at a time, to reduce uncertainty. If you want your C++ implementation to perform better, you first need to know which part of it is slower than expected, which means you have to separate the setup code from the intersection test.

    Second, you should run the test a large number of times to take possible caching effects and other uncertainties into account. (And probably output one total time for, say, 1000 runs, rather than an individual time for each. That way you reduce the uncertainty from the timer, which might have limited resolution and report inaccurate results when used in the 0-20ms range.)
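    A minimal sketch of such a harness, assuming a C++11 compiler (std::chrono is not available in Visual Studio 2008, where GetTickCount or QueryPerformanceCounter would be used instead). The helper name `time_total_ms` is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `runs` times and returns the total
// elapsed wall-clock time in milliseconds for all runs together.
template <class Work>
double time_total_ms(int runs, Work work) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        work();  // only the code under test runs inside the timed region
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    The inputs are built before the call, and `work` should store or accumulate its result somewhere visible so the optimizer cannot discard the computation being measured.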

    Further, as far as I can read from the docs, the input to set_intersection should be sorted, which set2 won't be. And there seems to be no reason to use unordered_map, when unordered_set would be a far better match for what you're doing.

    About the setup code being needed, note that you probably don't need to populate vectors in order to run the intersection. Both your own implementation and set_intersection work on iterators already, so you can simply pass them a pair of iterators to the data structures your inputs are in already.

    A few more specific comments on your code:

    • Use ++iterator instead of iterator++
    • Rather than calling vector.end() at each loop iteration, call it once and cache the result
    • Experiment with using sorted vectors vs std::set vs unordered_set (not unordered_map)

    Edit:

    I haven't tried your C# version, so I can't compare the numbers properly, but here's my modified test. Each is run 1000 times, on a Core 2 Quad 2.5GHz with 4GB RAM:

    std::set_intersection on std::set: 2606ms
    std::set_intersection on tr1::unordered_set: 1014ms
    std::set_intersection on sorted vectors: 171ms
    std::set_intersection on unsorted vectors: 10140ms
    

    The last one is a bit unfair, because it has to both copy and sort the vectors. Ideally, only the sort should be part of the benchmark. I tried creating a version that used an array of 1000 unsorted vectors (so I wouldn't have to copy the unsorted data in each iteration), but the performance was about the same, or a bit worse, because this would cause constant cache misses, so I reverted back to this version.

    And my code:

    #define _SECURE_SCL 0
    
    #include <cstdlib>
    #include <ctime>
    #include <vector>
    #include <set>
    #include <iostream>
    #include <algorithm>
    #include <unordered_set>
    #include <windows.h>
    
    template <typename T, typename OutIter>
    void stl_intersect(const T& set1, const T& set2, OutIter out){
        std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
    }
    
    template <typename T, typename OutIter>
    void sort_stl_intersect(T& set1, T& set2, OutIter out){
        std::sort(set1.begin(), set1.end());
        std::sort(set2.begin(), set2.end());
        std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
    }
    
    
    template <typename T>
    void init_sorted_vec(T first, T last){
        for ( T cur = first; cur != last; ++cur)
        {
            int i = cur - first;
            int value = 1000000000 + i;
            *cur = value;
        }
    }
    
    template <typename T>
    void init_unsorted_vec(T first, T last){
        for ( T cur = first; cur != last; ++cur)
        {
            int i = rand() % 200000 + 1;
            i *= 10;
    
            int value = 1000000000 + i;
            *cur = value;
        }
    }
    
    struct resize_and_shuffle {
        resize_and_shuffle(int size) : size(size) {}
    
        void operator()(std::vector<int>& vec){
            vec.resize(size);
    
        }
        int size;
    };
    
    int main()
    {
        srand ( time(NULL) );
        std::vector<int> out(100000);
    
        std::vector<int> sortedvec1(100000);
        std::vector<int> sortedvec2(1000);
    
        init_sorted_vec(sortedvec1.begin(), sortedvec1.end());
        init_unsorted_vec(sortedvec2.begin(), sortedvec2.end());
        std::sort(sortedvec2.begin(), sortedvec2.end());
    
        std::vector<int> unsortedvec1(sortedvec1.begin(), sortedvec1.end());
        std::vector<int> unsortedvec2(sortedvec2.begin(), sortedvec2.end());
    
        std::random_shuffle(unsortedvec1.begin(), unsortedvec1.end());
        std::random_shuffle(unsortedvec2.begin(), unsortedvec2.end());
    
        std::vector<int> vecs1[1000];
        std::vector<int> vecs2[1000];
    
        std::fill(vecs1, vecs1 + 1000, unsortedvec1);
        std::fill(vecs2, vecs2 + 1000, unsortedvec2);
    
        std::set<int> set1(sortedvec1.begin(), sortedvec1.end());
        std::set<int> set2(sortedvec2.begin(), sortedvec2.end());
    
        std::tr1::unordered_set<int> uset1(sortedvec1.begin(), sortedvec1.end());
        std::tr1::unordered_set<int> uset2(sortedvec2.begin(), sortedvec2.end());
    
        DWORD start, stop;
        DWORD delta[4];
    
        start = GetTickCount();
        for (int i = 0; i < 1000; ++i){
            stl_intersect(set1, set2, out.begin());
        }
        stop = GetTickCount();
        delta[0] = stop - start;
    
        start = GetTickCount();
        for (int i = 0; i < 1000; ++i){
            stl_intersect(uset1, uset2, out.begin());
        }
        stop = GetTickCount();
        delta[1] = stop - start;
    
        start = GetTickCount();
        for (int i = 0; i < 1000; ++i){
            stl_intersect(sortedvec1, sortedvec2, out.begin());
        }
        stop = GetTickCount();
        delta[2] = stop - start;
    
        start = GetTickCount();
        for (int i = 0; i < 1000; ++i){
            sort_stl_intersect(vecs1[i], vecs2[i], out.begin());
        }
        stop = GetTickCount();
        delta[3] = stop - start;
    
        std::cout << "std::set_intersection on std::set: " << delta[0] << "ms\n";
        std::cout << "std::set_intersection on tr1::unordered_set: " << delta[1] << "ms\n";
        std::cout << "std::set_intersection on sorted vectors: " << delta[2] << "ms\n";
        std::cout << "std::set_intersection on unsorted vectors: " << delta[3] << "ms\n";
    
    
        return 0;
    }
    

    There's no reason why C++ should always be faster than C#. C# has a few key advantages that require a lot of care to compete with in C++. The primary one I can think of is that dynamic allocations are ridiculously cheap in .NET-land. Every time a C++ vector, set or unordered_set (or any other container) has to resize or expand, it is a very costly malloc operation. In .NET, a heap allocation is little more than adding an offset to a pointer.

    So if you want the C++ version to compete, you'll probably have to solve that, allowing your containers to resize without having to perform actual heap allocations, probably by using custom allocators for the containers (perhaps boost::pool might be a good bet, or you can try rolling your own).
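    As one illustration of the custom-allocator idea, the sketch below uses the C++17 std::pmr facilities rather than boost::pool, which is a swapped-in technique and not what was benchmarked above. The function name `count_common_pooled` is hypothetical, and whether this actually helps is something to measure.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_set>
#include <vector>

// Builds the lookup set from an arena so that node and bucket allocations
// are cheap pointer bumps, then counts elements of `b` found in `a`.
std::size_t count_common_pooled(const std::vector<int>& a, const std::vector<int>& b) {
    std::pmr::monotonic_buffer_resource arena;    // releases all memory at once on destruction
    std::pmr::unordered_set<int> lookup(&arena);  // every allocation of the set goes to the arena
    lookup.reserve(a.size());                     // size the bucket array once up front
    lookup.insert(a.begin(), a.end());

    std::size_t common = 0;
    for (int v : b) common += lookup.count(v);
    assert(common <= b.size());
    return common;
}
```

    The arena is declared before the set so that it outlives it. A monotonic resource never reuses freed memory, which suits a container that is filled once and then thrown away.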

    Another issue is that set_intersection only works on sorted input, and in order to reproduce test results that involve a sort, we have to make a fresh copy of the unsorted data in each iteration, which is costly (although again, using custom allocators will help a lot). I don't know what form your input takes, but it is possible that you can sort your input directly, without copying it, and then run set_intersection directly on that. (That would be easy to do if your input is an array or an STL container at least.)

    One of the key advantages of the STL is that it is so flexible, it can work on pretty much any input sequence. In C#, you pretty much have to copy the input to a List or Dictionary or something, but in C++, you might be able to get away with running std::sort and set_intersection on the raw input.
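    A minimal sketch of working on raw input, assuming the data arrives as plain int arrays that may be reordered and that the caller provides an output buffer at least as large as the smaller input. The function name `intersect_in_place` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sorts both raw arrays in place, writes the common values to `out`,
// and returns how many values were written. No container copies are made.
std::size_t intersect_in_place(int* a, std::size_t na, int* b, std::size_t nb, int* out) {
    std::sort(a, a + na);  // pointers are random-access iterators
    std::sort(b, b + nb);
    int* end = std::set_intersection(a, a + na, b, b + nb, out);
    assert(end >= out);
    return static_cast<std::size_t>(end - out);
}
```

    For example, with `a = {9, 1, 5, 3}` and `b = {5, 7, 9}` the function writes 5 and 9 to `out` and returns 2.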

    Finally, of course, try running the code through a profiler and see exactly where the time is being spent. You might also want to try running the code through GCC instead. It's my impression that STL performance in MSVC is sometimes a bit quirky. It might be worth testing under another compiler just to see if you get similar timings there.

    Also, you might find these blog posts relevant for performance of C++ vs C#: http://blogs.msdn.com/ricom/archive/2005/05/10/416151.aspx

    The moral of those is essentially that yes, you can get better performance in C++, but it is a surprising amount of work.

  • 2020-12-28 10:57

    It may also be worthwhile looking at the boost Disjoint Set container, which is specially optimized for certain kinds of large set operations.

    It works by treating a group of sets as the unions of several disjoint sets, making it possible to build other sets, such as intersections or unions very cheaply, once the initial set of disjoint sets is constructed. If you expect to be doing a lot of set operations on sets that don't change much, you can probably expect this to be very fast. If, on the other hand, you will use each set once and throw it away, it's probably not going to do too much.

    Anyway, you'd be doing yourself a favor to at least experiment with this to see if it gives you any bump in your specific case.
