I've read in other posts that boost::hash_combine seems to be the best way to combine hash values. Could somebody please break it down and explain why this is the best way to do it?
It's not the best; surprisingly to me, it's not even particularly good. The main problem is the bad distribution, which is not really the fault of boost::hash_combine itself, but of using it in conjunction with a badly distributing hash like std::hash, which for integers is most commonly implemented as the identity function.
Figure 2: The effect of a single-bit change in one of two random 32-bit numbers on the result of boost::hash_combine
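The underlying weakness can be sketched without the figure. In the classic Boost formulation (newer releases, from 1.81 as far as I know, use a different mixing step) the combining step is `seed ^= hash_value(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);`. Because the new hash only enters through an addition, flipping bit k of it can only change output bits k and above, and flipping its top bit changes exactly one output bit. A minimal 32-bit sketch; `boost_combine32`, `popcount32` and `flipped_output_bits` are hypothetical helpers, not Boost API:

```cpp
#include <cstdint>

// Hypothetical 32-bit rendition of the classic boost::hash_combine step,
// where h is the already-computed hash of the new value.
constexpr std::uint32_t boost_combine32(std::uint32_t seed, std::uint32_t h) {
    return seed ^ (h + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Number of set bits (a portable stand-in for C++20 std::popcount).
constexpr int popcount32(std::uint32_t x) {
    int n = 0;
    while (x != 0) {
        n += static_cast<int>(x & 1u);
        x >>= 1;
    }
    return n;
}

// How many output bits change when bit k (0..31) of h is flipped.
constexpr int flipped_output_bits(std::uint32_t seed, std::uint32_t h, int k) {
    return popcount32(boost_combine32(seed, h) ^ boost_combine32(seed, h ^ (1u << k)));
}

// Flipping the top bit of h adds 2^31 (mod 2^32) to the sum, so exactly one
// output bit changes, whatever the seed; a well-mixed hash would change about 16.
static_assert(flipped_output_bits(0x12345678u, 0x9abcdef0u, 31) == 1, "poor avalanche");
```

Flipping a lower bit can still change several bits through carries, but only upwards: output bit j never depends on bits of the new hash above j.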
To demonstrate how bad things can get, these are the collisions for points on a 32x32 grid when using hash_combine as intended, together with std::hash:
# hash x₀ y₀ x₁ y₁ ...
3449074105 6 30 8 15
3449074104 6 31 8 16
3449074107 6 28 8 17
3449074106 6 29 8 18
3449074109 6 26 8 19
3449074108 6 27 8 20
3449074111 6 24 8 21
3449074110 6 25 8 22
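The first rows of the table can be reproduced with a small sketch. It assumes 32-bit hash values, the classic Boost combining step, a starting seed of 0, and an identity std::hash for integers (so each coordinate is its own hash); `combine32` and `hash_point` are hypothetical names:

```cpp
#include <cstdint>

// Classic boost::hash_combine step on 32-bit values (hypothetical helper).
constexpr std::uint32_t combine32(std::uint32_t seed, std::uint32_t h) {
    return seed ^ (h + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Hash of point (x, y): start from seed 0, combine x, then y.
// With an identity std::hash, each coordinate is its own hash.
constexpr std::uint32_t hash_point(std::uint32_t x, std::uint32_t y) {
    return combine32(combine32(0u, x), y);
}

// Rows 1 and 2 of the table: two different points, same hash.
static_assert(hash_point(6u, 30u) == 3449074105u && hash_point(8u, 15u) == 3449074105u, "row 1");
static_assert(hash_point(6u, 31u) == 3449074104u && hash_point(8u, 16u) == 3449074104u, "row 2");
```

Roughly speaking, the first combination only adds a constant to x, so nearby x values give nearby seeds, and the small difference that remains after the second step can be cancelled by a suitable change in y.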
For a well-distributed hash there should be none, statistically. Using bit rotations instead of bit shifts, and xor instead of addition, one could easily create a similar hash_combine that preserves entropy better. But really, what you should do is use a good hash function in the first place; after that, a simple xor is sufficient to combine the seed and the hash.
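#include <cstddef>      // std::size_t
#include <cstdint>      // std::uint32_t, std::uint64_t
#include <functional>   // std::hash
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::enable_if, std::is_unsigned, std::conditional

// xor n with a copy of itself shifted right by i bits (invertible for 0 < i < width)
template <typename T>
T xorshift(const T& n, int i){
    return n^(n>>i);
}

// 32-bit mixing function, named distribute so as not to be confused with std::hash
std::uint32_t distribute(const std::uint32_t& n){
    std::uint32_t p = 0x55555555ul; // pattern of alternating 0 and 1
    std::uint32_t c = 3423571495ul; // random uneven integer constant;
    return c*xorshift(p*xorshift(n,16),16);
}

// 64-bit overload of the same mixing function
std::uint64_t distribute(const std::uint64_t& n){
    std::uint64_t p = 0x5555555555555555ull; // pattern of alternating 0 and 1
    std::uint64_t c = 17316035218449499591ull; // random uneven integer constant;
    return c*xorshift(p*xorshift(n,32),32);
}

// if C++20 std::rotl is not available:
template <typename T, typename S>
typename std::enable_if<std::is_unsigned<T>::value,T>::type
constexpr rotl(const T n, const S i){
    const T m = (std::numeric_limits<T>::digits-1);
    const T c = i&m;
    return (n<<c)|(n>>((T(0)-c)&m)); // this is usually recognized by the compiler to mean rotation, also C++20 now gives us std::rotl directly
}

// combine the hash of v into seed: seed is updated in place and also returned,
// so after the last value it holds the final hash
template <class T>
inline std::size_t hash_combine(std::size_t& seed, const T& v)
{
    // choose the distribute overload that matches the width of std::size_t
    using U = typename std::conditional<sizeof(std::size_t) == 8, std::uint64_t, std::uint32_t>::type;
    seed = rotl(seed, std::numeric_limits<std::size_t>::digits/3)
         ^ static_cast<std::size_t>(distribute(static_cast<U>(std::hash<T>{}(v))));
    return seed;
}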
The seed is rotated once before combining it, so that the order in which the values are combined matters; with a plain xor, combining a then b would give the same result as combining b then a.
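A minimal sketch of why the rotation matters, using small raw 64-bit values in place of distributed hashes and a rotation of 21 bits (that is, `std::numeric_limits<std::size_t>::digits/3` for a 64-bit std::size_t); `rotl64`, `combine_xor` and `combine_rotl` are hypothetical names:

```cpp
#include <cstdint>

// Rotate x left by r bits (r taken modulo 64).
constexpr std::uint64_t rotl64(std::uint64_t x, unsigned r) {
    return (x << (r & 63u)) | (x >> ((64u - r) & 63u));
}

// Plain xor: the order of combination is lost.
constexpr std::uint64_t combine_xor(std::uint64_t seed, std::uint64_t h) {
    return seed ^ h;
}

// Seed rotated before the xor, as in the hash_combine above.
constexpr std::uint64_t combine_rotl(std::uint64_t seed, std::uint64_t h) {
    return rotl64(seed, 21) ^ h;
}

// xor: combining 1 then 2 equals combining 2 then 1 (both give 3).
static_assert(combine_xor(combine_xor(0, 1), 2) == combine_xor(combine_xor(0, 2), 1), "order-insensitive");
// rotation: 1 then 2 gives 2^21 ^ 2, while 2 then 1 gives 2^22 ^ 1.
static_assert(combine_rotl(combine_rotl(0, 1), 2) != combine_rotl(combine_rotl(0, 2), 1), "order-sensitive");
```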
Boost's hash_combine needs two fewer operations and, more importantly, no multiplications; in fact, it is about 5x faster. But at about 2 cycles per hash on my machine, the proposed solution is still very fast, and it pays off quickly when used for a hash table. There are 118 collisions on a 1024x1024 grid (vs. 982017 for Boost's hash_combine + std::hash), about as many as expected for a well-distributed hash function, and that is all we can ask for.
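The "as many as expected" figure can be checked with the birthday approximation. Assuming 32-bit hash values (as in the table above) and n keys spread uniformly at random over m = 2^32 values, the expected number of colliding pairs is roughly:

```latex
\begin{aligned}
\mathbb{E}[\text{collisions}] &\approx \frac{n(n-1)}{2m}, \qquad m = 2^{32} \\
n = 1024^2 = 2^{20}: \quad &\frac{2^{20}\,(2^{20}-1)}{2^{33}} \approx 2^{7} = 128 \\
n = 32^2 = 2^{10}: \quad &\frac{2^{10}\,(2^{10}-1)}{2^{33}} \approx 1.2 \times 10^{-4}
\end{aligned}
```

So 118 collisions on the 1024x1024 grid is in line with a uniformly distributed 32-bit hash, and on the 32x32 grid one would expect none at all.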
Even when used in conjunction with a good hash function, boost::hash_combine is not ideal. If all the entropy is in the seed at some point, some of it will get lost: there are only 2948667289 distinct results of boost::hash_combine(x, 0) over all 32-bit x, but there should be 4294967296 (2^32), since a step that loses no information must map every seed to a distinct value.
In conclusion, they tried to create a function that does both combining and cascading, and does so fast, but ended up with something that does both just well enough not to be recognised as bad immediately.