What is a good hash function for a collection (i.e., multi-set) of integers?

不知归路 2021-02-05 06:03

I'm looking for a function that maps a multi-set of integers to an integer, hopefully with some kind of guarantee like pairwise independence.

Ideally, memory usage woul

6 answers
  •  感动是毒
    2021-02-05 06:46

    I agree with Dzmitry about using the arithmetic SUM of hashes, but I'd recommend a hash function with a good output distribution for integer inputs instead of just reversing the bits of the integer. Reversing bits doesn't improve the output distribution. It can even worsen it, since the probability that the high-order bits are lost to sum overflow is much higher than the probability that the low-order bits are lost in this case. Here is an example of a fast hash function with good output distribution: http://burtleburtle.net/bob/c/lookup3.c . See also the paper describing how hash functions should be constructed: http://burtleburtle.net/bob/hash/evahash.html .

    Using the SUM of the hash values of the elements in the set satisfies the requirements in the question:

    • Memory usage is constant. We only need to store one ordinary integer holding the hash value for each set. This integer is used for O(1) updates of the hash when elements are added to or removed from the set.
    • Adding a new element only requires adding the element's hash value to the existing hash value, i.e. the operation is O(1).
    • Removing an existing element only requires subtracting the element's hash value from the existing hash value, i.e. the operation is O(1).
    • The hash will be different for sets that differ only by pairs of identical elements.

    SUM and SUB are safe operations in the face of integer overflow, since they are reversible in modular arithmetic, where the modulus is 2^32 for int and 2^64 for long in Java.
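
    The scheme above can be sketched in Java roughly as follows. This is only an illustration under a few assumptions: the class name MultisetHash and its method names are made up for this example, and the per-element mixer is the 64-bit finalizer from MurmurHash3 (fmix64), used here as a stand-in rather than the lookup3 function linked above. Any hash with a good output distribution could be swapped in.

```java
/** Order-independent hash of a multiset of longs: the wrapping sum of per-element hashes. */
final class MultisetHash {
    // Running hash; this single long is the only state kept per multiset.
    private long sum;

    // Stand-in mixer (assumption): the fmix64 finalizer from MurmurHash3.
    // It is a bijection on 64-bit values; note that it maps 0 to 0.
    static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    // O(1): Java long addition wraps silently, i.e. it is addition modulo 2^64.
    void add(long element) {
        sum += mix(element);
    }

    // O(1): subtraction modulo 2^64 exactly undoes a previous add of the same element.
    void remove(long element) {
        sum -= mix(element);
    }

    long value() {
        return sum;
    }

    public static void main(String[] args) {
        MultisetHash a = new MultisetHash();
        a.add(3); a.add(7); a.add(7);

        MultisetHash b = new MultisetHash();
        b.add(7); b.add(3); b.add(7);
        b.add(42); b.remove(42); // cancels out

        // Same multiset {3, 7, 7} built in a different order: the hashes match.
        System.out.println(a.value() == b.value());
    }
}
```

    Because the per-element hashes are summed rather than XORed, adding the same element twice contributes twice its hash instead of cancelling out, which is what the last bullet above relies on.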
