Is gcc std::unordered_map implementation slow? If so - why?

旧时难觅i 2020-12-02 08:31

We are developing highly performance-critical software in C++, for which we need a concurrent hash map, and we implemented one ourselves. We then wrote a benchmark to figure out how much slower our concurrent hash map is compared with std::unordered_map.

3 Answers
  •  星月不相逢
    2020-12-02 09:28

    I am guessing that you have not properly sized your unordered_map, as Ylisar suggested. When chains grow too long in unordered_map, the g++ implementation will automatically rehash to a larger hash table, and this would be a big drag on performance. If I remember correctly, unordered_map defaults to (smallest prime larger than) 100.
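
    If pre-sizing is the issue, the fix is to tell the container up front how many elements to expect, either with the bucket-count constructor argument (as in the benchmark below) or with reserve(). A minimal sketch of just that idea (separate from the benchmark code; the printed bucket counts are implementation specific):

    #include <unordered_map>
    #include <cstdint>
    #include <iostream>

    int main () {
        const std::size_t expected = 10000000;

        // Bucket-count hint at construction time.
        std::unordered_map<uint64_t, long double> presized(expected);

        // Or ask an existing map to make room for `expected` elements;
        // this respects max_load_factor, so no rehash until that many inserts.
        std::unordered_map<uint64_t, long double> reserved;
        reserved.reserve(expected);

        std::cout << presized.bucket_count() << " "
                  << reserved.bucket_count() << std::endl;
    }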

    I didn't have chrono on my system, so I timed with times().

    #include <sys/times.h>   // times(), struct tms
    #include <unistd.h>      // sysconf(), _SC_CLK_TCK
    #include <iostream>

    // Runs the callable t and reports its combined user+system CPU time.
    template <typename TEST>
    void time_test (TEST t, const char *m) {
        struct tms start;
        struct tms finish;
        long ticks_per_second;

        times(&start);
        t();
        times(&finish);
        ticks_per_second = sysconf(_SC_CLK_TCK);
        std::cout << "elapsed: "
                  << ((finish.tms_utime - start.tms_utime
                       + finish.tms_stime - start.tms_stime)
                      / (1.0 * ticks_per_second))
                  << " " << m << std::endl;
    }
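
    For reference, on a system that does have <chrono>, an equivalent timer could be written like this (a sketch using std::chrono::steady_clock; note it measures elapsed wall-clock time rather than the user+system CPU time that times() reports, and the name time_test_chrono is just a placeholder):

    #include <chrono>
    #include <iostream>

    template <typename TEST>
    void time_test_chrono (TEST t, const char *m) {
        auto start = std::chrono::steady_clock::now();
        t();
        auto finish = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed = finish - start;
        std::cout << "elapsed: " << elapsed.count()
                  << " " << m << std::endl;
    }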
    

    I used a SIZE of 10000000, and had to change things a bit for my version of boost. Also note, I pre-sized the hash table to match SIZE/DEPTH, where DEPTH is an estimate of the length of the bucket chain due to hash collisions.

    Edit: Howard points out to me in comments that the max load factor for unordered_map is 1. So, the DEPTH controls how many times the code will rehash (a small check of this is sketched after the benchmark code below).

    #include <unordered_map>
    #include <vector>
    #include <limits>
    #include <cstdint>
    #include <algorithm>
    #include <boost/random.hpp>

    #define SIZE 10000000
    #define DEPTH 3
    std::vector<uint64_t> vec(SIZE);
    boost::mt19937 rng;
    boost::uniform_int<uint64_t> dist(std::numeric_limits<uint64_t>::min(),
                                      std::numeric_limits<uint64_t>::max());
    std::unordered_map<uint64_t, long double> map(SIZE/DEPTH);
    
    void
    test_insert () {
        for (int i = 0; i < SIZE; ++i) {
            map[vec[i]] = 0.0;
        }
    }
    
    void
    test_get () {
        long double val;
        for (int i = 0; i < SIZE; ++i) {
            val = map[vec[i]];
        }
    }
    
    int main () {
        for (int i = 0; i < SIZE; ++i) {
            uint64_t val = 0;
            while (val == 0) {
                val = dist(rng);
            }
            vec[i] = val;
        }
        time_test(test_insert, "inserts");
        std::random_shuffle(vec.begin(), vec.end());
        time_test(test_get, "get");
    }
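
    To see the load-factor behaviour from the edit above in this program, a quick check along these lines can be added after the inserts (a sketch, not part of the timings; it uses the global map defined earlier, and with libstdc++ you should see max_load_factor() == 1 and a bucket_count() at least as large as size()):

    void check_buckets () {
        std::cout << "size:            " << map.size() << "\n"
                  << "bucket_count:    " << map.bucket_count() << "\n"
                  << "load_factor:     " << map.load_factor() << "\n"
                  << "max_load_factor: " << map.max_load_factor() << std::endl;
    }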
    

    Edit:

    I modified the code so that I could change out DEPTH more easily.

    #ifndef DEPTH
    #define DEPTH 10000000
    #endif
    

    So, by default, the worst initial size for the hash table (SIZE/DEPTH = 1 bucket) is chosen.

    elapsed: 7.12 inserts, elapsed: 2.32 get, -DDEPTH=10000000
    elapsed: 6.99 inserts, elapsed: 2.58 get, -DDEPTH=1000000
    elapsed: 8.94 inserts, elapsed: 2.18 get, -DDEPTH=100000
    elapsed: 5.23 inserts, elapsed: 2.41 get, -DDEPTH=10000
    elapsed: 5.35 inserts, elapsed: 2.55 get, -DDEPTH=1000
    elapsed: 6.29 inserts, elapsed: 2.05 get, -DDEPTH=100
    elapsed: 6.76 inserts, elapsed: 2.03 get, -DDEPTH=10
    elapsed: 2.86 inserts, elapsed: 2.29 get, -DDEPTH=1
    

    My conclusion is that the initial hash table size makes no significant difference except when it is made equal to the entire expected number of unique insertions. Also, I don't see the order-of-magnitude performance difference that you are observing.
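
    If you want to make the rehash count itself visible, rather than infer it from DEPTH, one way is to watch bucket_count() while inserting; every time it changes, a rehash has happened. A sketch along those lines, reusing SIZE, DEPTH, and vec from the code above (not something the timings above measured directly):

    std::size_t count_rehashes () {
        std::unordered_map<uint64_t, long double> m(SIZE/DEPTH);
        std::size_t buckets = m.bucket_count();
        std::size_t rehashes = 0;
        for (int i = 0; i < SIZE; ++i) {
            m[vec[i]] = 0.0;
            if (m.bucket_count() != buckets) {
                // Bucket array grew, so the table rehashed.
                ++rehashes;
                buckets = m.bucket_count();
            }
        }
        return rehashes;
    }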
