We are developing a highly performance critical software in C++. There we need a concurrent hash map and implemented one. So we wrote a benchmark to figure out, how much slo
I am guessing that you have not properly sized your unordered_map, as Ylisar suggested. When chains grow too long in unordered_map, the g++ implementation will automatically rehash to a larger hash table, and this would be a big drag on performance. If I remember correctly, unordered_map defaults to (smallest prime larger than) 100.
I didn't have chrono on my system, so I timed with times().
template
void time_test (TEST t, const char *m) {
struct tms start;
struct tms finish;
long ticks_per_second;
times(&start);
t();
times(&finish);
ticks_per_second = sysconf(_SC_CLK_TCK);
std::cout << "elapsed: "
<< ((finish.tms_utime - start.tms_utime
+ finish.tms_stime - start.tms_stime)
/ (1.0 * ticks_per_second))
<< " " << m << std::endl;
}
I used a SIZE of 10000000, and had to change things a bit for my version of boost. Also note, I pre-sized the hash table to match SIZE/DEPTH, where DEPTH is an estimate of the length of the bucket chain due to hash collisions.
Edit: Howard points out to me in comments that the max load factor for unordered_map is 1. So, the DEPTH controls how many times the code will rehash.
#define SIZE 10000000
#define DEPTH 3
std::vector vec(SIZE);
boost::mt19937 rng;
boost::uniform_int dist(std::numeric_limits::min(),
std::numeric_limits::max());
std::unordered_map map(SIZE/DEPTH);
void
test_insert () {
for (int i = 0; i < SIZE; ++i) {
map[vec[i]] = 0.0;
}
}
void
test_get () {
long double val;
for (int i = 0; i < SIZE; ++i) {
val = map[vec[i]];
}
}
int main () {
for (int i = 0; i < SIZE; ++i) {
uint64_t val = 0;
while (val == 0) {
val = dist(rng);
}
vec[i] = val;
}
time_test(test_insert, "inserts");
std::random_shuffle(vec.begin(), vec.end());
time_test(test_insert, "get");
}
Edit:
I modified the code so that I could change out DEPTH more easily.
#ifndef DEPTH
#define DEPTH 10000000
#endif
So, by default, the worst size for the hash table is chosen.
elapsed: 7.12 inserts, elapsed: 2.32 get, -DDEPTH=10000000
elapsed: 6.99 inserts, elapsed: 2.58 get, -DDEPTH=1000000
elapsed: 8.94 inserts, elapsed: 2.18 get, -DDEPTH=100000
elapsed: 5.23 inserts, elapsed: 2.41 get, -DDEPTH=10000
elapsed: 5.35 inserts, elapsed: 2.55 get, -DDEPTH=1000
elapsed: 6.29 inserts, elapsed: 2.05 get, -DDEPTH=100
elapsed: 6.76 inserts, elapsed: 2.03 get, -DDEPTH=10
elapsed: 2.86 inserts, elapsed: 2.29 get, -DDEPTH=1
My conclusion is that there is not much significant performance difference for any initial hash table size other than making it equal to the entire expected number of unique insertions. Also, I don't see the order of magnitude performance difference that you are observing.