问题
I review the source code of Arrays.hashCode(char[] c)
I am not very confirm that the algorithm it applies well work well in all cases.
public static int hashCode(int a[]) {
if (a == null)
return 0;
int result = 1;
for (int element : a)
result = 31 * result + element;
return result;
}
Does the hash function implement here really uniformly distributes the all the input arrays.And Why we use prime 31 here .
回答1:
Why use the prime number 31?
This can be split in two parts?
- Why a prime number?
Here we need to understand that our goal is to get a unique HashCode for an object which will help us to find that object in O(1) time.
The key word here, is unique.
Primes
Primes are unique numbers. They are unique in that, the product of a prime with any other number has the best chance of being unique (not as unique as the prime itself of-course) due to the fact that a prime is used to compose it. This property is used in hashing functions.
.
Why number 31?
From Effective Java
- Because it's an odd prime, and it's "traditional" to use primes.
It's also one less than a power of two, which permits for bitwise optimization
Here's the full quote,
from Item 9: Always override hashCode when you override equals:
The value 31 was chosen because it's an odd prime. If it were even and multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
A nice property of 31 is that the multiplication can be replaced by a shift (§15.19) and subtraction for better performance:
31 * i == (i << 5) - i Modern VMs do this sort of optimization automatically.
While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and theoretical computer scientists.
Perhaps a later release of the platform will provide state-of-the-art hash functions for its classes and utility methods to allow average programmers to construct such hash functions. In the meantime, the techniques described in this item should be adequate for most applications.
This is a very Good source.
回答2:
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
回答3:
See this post: Why does Java's hashCode() in String use 31 as a multiplier?
That's where TheEwook's answer is from.
Generally, you use primes because they don't have any factors and will distribute better modulo N where N is the size of the range you are binning into. 31 is a small, odd prime so it works well. However, as the various sources you will find on the Internets will indicate, a small prime like 31 may lead to more collisions than a larger prime (especially if the values being hashed are not well distributed to begin with), so you could pick a larger prime if you found the performance to not be as good as you'd like.
来源:https://stackoverflow.com/questions/18788292/does-hashcode-implementation-of-java-arrays-hashcode-uniformly-distribute