Question
What would be the fastest and most robust (in terms of uniqueness) way to implement a method like
public abstract String hash(String[] values);
The values[] array has 100 to 1,000 members, each a few dozen characters long, and the method needs to run about 10,000 times/sec, on a different values[] array each time.
Should a long string be built using a StringBuilder buffer and a hash method then invoked on the buffer contents, or is it better to keep invoking the hash method for each string from values[]?
Obviously a hash of at least 64 bits is needed (e.g., MD5) to avoid collisions, but is there anything simpler and faster that could be done, at the same quality?
For example, what about
public String hash(String[] values)
{
    long result = 0;
    for (String v : values)
    {
        result += v.hashCode();
    }
    return String.valueOf(result);
}
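On the StringBuilder-versus-per-string point: a message digest does not need the strings concatenated first, because MessageDigest.update can be called once per string. The following is only a minimal sketch of that idea, assuming MD5 as mentioned above. The class name DigestHashSketch and the 4-byte length prefix written before each string are assumptions added for illustration; the prefix is there so that {"ab", "c"} and {"a", "bc"} feed different byte streams into the digest.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestHashSketch {
    // Feeds each string into one MD5 digest and returns the result as hex.
    static String hash(String[] values) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException(e);
        }
        ByteBuffer len = ByteBuffer.allocate(4);
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            // Length prefix (assumption): marks where each string ends.
            len.clear();
            len.putInt(bytes.length);
            md.update(len.array());
            md.update(bytes);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(hash(new String[] {"ab", "c"}));
        System.out.println(hash(new String[] {"a", "bc"}));
    }
}
```

Whether this keeps up with 10,000 calls/sec on arrays of this size would have to be measured. MD5 is also no longer considered safe against deliberately constructed collisions, so this only addresses accidental ones.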
Answer 1:
Definitely don't use plain addition due to its linearity properties, but you can modify your code just slightly to achieve very good dispersion.
public String hash(String[] values) {
    long result = 17;
    for (String v : values) result = 37 * result + v.hashCode();
    return String.valueOf(result);
}
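To make the linearity problem concrete, here is a small sketch comparing the two combiners. The class and method names (CombinerDemo, additive, polynomial) are made up for illustration; only the loop bodies come from the code in this thread.

```java
final class CombinerDemo {
    // Plain addition, as in the question: the order of the strings is lost.
    static long additive(String[] values) {
        long result = 0;
        for (String v : values) {
            result += v.hashCode();
        }
        return result;
    }

    // Multiply-then-add, as in this answer: each position gets a different weight.
    static long polynomial(String[] values) {
        long result = 17;
        for (String v : values) {
            result = 37 * result + v.hashCode();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] ab = {"a", "b"};
        String[] ba = {"b", "a"};
        System.out.println(additive(ab) == additive(ba));     // true: both sums are 195
        System.out.println(polynomial(ab) == polynomial(ba)); // false: 26960 vs 26996
    }
}
```

Both variants still inherit any collisions of String.hashCode() itself: "Aa" and "BB", for instance, share the hash code 2112, so swapping one for the other changes neither result.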
Answer 2:
It doesn't provide a 64 bit hash, but given the title of the question it's probably worth mentioning that since Java 1.7 there is java.util.Objects#hash(Object...).
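A minimal sketch of how that could look for the values[] array from the question (the class name ObjectsHashDemo is hypothetical). The explicit (Object[]) cast makes it clear that each string is passed as one varargs element; the result is then the same 32-bit value that Arrays.hashCode returns for the array.

```java
import java.util.Arrays;
import java.util.Objects;

final class ObjectsHashDemo {
    // 32-bit hash over all strings, in order, via Objects.hash.
    static int hash(String[] values) {
        return Objects.hash((Object[]) values);
    }

    public static void main(String[] args) {
        String[] values = {"a", "b"};
        System.out.println(hash(values));                              // 4066
        System.out.println(hash(values) == Arrays.hashCode(values));   // true
    }
}
```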
Answer 3:
You should watch out for creating weaknesses when combining methods (here, Java's hash function and your own). I did a little research on cascaded ciphers, and this is an example of the same problem: the addition might interfere with the internals of hashCode().
The internals of hashCode() look like this:
    for (int i = 0; i < len; i++) {
        h = 31*h + val[off++];
    }
so adding the hash codes together means the last characters of all the strings in the array are simply added to each other, with no mixing between strings, which is already bad enough for a hash function.
If you want real pseudorandomness, take a look at the FNV hash algorithm. It is a very fast hash algorithm that was designed with hash tables such as HashMaps in mind.
It goes like this:
long hash = 0xCBF29CE484222325L;
for (String s : strings)
{
    hash ^= s.hashCode();
    hash *= 0x100000001B3L;
}
Note: this is not the actual implementation of FNV, as it takes ints as input instead of bytes, but I think it works just as well.
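For comparison, here is a sketch of the byte-oriented form (FNV-1a, 64-bit), using the same offset basis and prime as above and fed with the UTF-8 bytes of each string instead of its hashCode(). The class name Fnv1a64 and the 0xFF separator mixed in after each string are assumptions added for illustration, not part of FNV itself; the separator keeps {"ab", "c"} and {"a", "bc"} apart, and 0xFF never occurs in valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class Fnv1a64 {
    static final long OFFSET_BASIS = 0xCBF29CE484222325L;
    static final long PRIME = 0x100000001B3L;

    // Standard FNV-1a step over a byte array, continuing from the given state.
    static long update(long hash, byte[] data) {
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= PRIME;
        }
        return hash;
    }

    // Hashes all strings in order, with a separator byte after each one.
    static long hash(String[] values) {
        long hash = OFFSET_BASIS;
        for (String v : values) {
            hash = update(hash, v.getBytes(StandardCharsets.UTF_8));
            // Separator (assumption): marks the end of each string.
            hash ^= 0xFF;
            hash *= PRIME;
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(hash(new String[] {"Aa"})));
        System.out.println(Long.toHexString(hash(new String[] {"BB"})));
    }
}
```

Because it reads the characters directly, this version does not inherit String.hashCode() collisions such as "Aa" and "BB", at the cost of touching every byte again rather than reusing the hash code that String caches.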
Answer 4:
Here is a simple implementation using the Objects class, available since Java 7.
@Override
public int hashCode()
{
    return Objects.hash(this.variable1, this.variable2);
}
Answer 5:
First, a hash code is typically numeric, e.g. int. Moreover, your version of the hash function computes a number and then returns its string representation, which IMHO makes no sense.
I'd improve your hash method as follows:
public int hash(String[] values) {
    // int accumulator so the return type matches; overflow simply wraps around
    int result = 0;
    for (String v : values) {
        result = result * 31 + v.hashCode();
    }
    return result;
}
Take a look at hashCode() as implemented in class java.lang.String.
Source: https://stackoverflow.com/questions/10587506/creating-a-hash-from-several-java-string-objects