Proof: why does java.lang.String.hashCode()'s implementation match its documentation?

◇◆丶佛笑我妖孽 提交于 2019-12-04 19:00:13

问题


The JDK documentation for java.lang.String.hashCode() famously says:

The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the *i*th character of the string, n is the length of the string, and ^ indicates exponentiation.

The standard implementation of this expression is:

int hash = 0;
for (int i = 0; i < length; i++)
{
    hash = 31*hash + value[i];
}
return hash;

Looking at this makes me feel like I was sleeping through my algorithms course. How does that mathematical expression translate into the code above?


回答1:


I'm not sure if you missed where it says "^ indicates exponentiation" (not xor) in that documentation.

Each time through the loop, the previous value of hash is multipled by 31 again before being added to the next element of value.

One could prove these things are equal by induction, but I think an example might be more clear:

Say we're dealing with a 4-char string. Let's unroll the loop:

hash = 0;
hash = 31 * hash + value[0];
hash = 31 * hash + value[1];
hash = 31 * hash + value[2];
hash = 31 * hash + value[3];

Now combine these into one statement by substituting each value of hash into the following statement:

hash = 31 * (31 * (31 * (31 * 0 + value[0]) + value[1]) + value[2])
     + value[3];

31 * 0 is 0, so simplify:

hash = 31 * (31 * (31 * value[0] + value[1]) + value[2])
     + value[3];

Now multiply the two inner terms by that second 31:

hash = 31 * (31 * 31 * value[0] + 31 * value[1] + value[2])
     + value[3];

Now multiply the three inner terms by that first 31:

hash = 31 * 31 * 31 * value[0] + 31 * 31 * value[1] + 31 * value[2]
     + value[3];

and convert to exponents (not really Java anymore):

hash = 31^3 * value[0] + 31^2 * value[1] + 31^1 * value[2] + value[3];



回答2:


unroll the loop. Then you get:

int hash = 0;

hash = 31*hash + value[0];
hash = 31*hash + value[1];
hash = 31*hash + value[2];
hash = 31*hash + value[3];
...
return hash;

Now you can do some mathematical manipulation, plug in 0 for the initial hash value:

hash = 31*(31*(31*(31*0 + value[0]) + value[1]) + value[2]) + value[3])...

Simplify it some more:

hash = 31^3*value[0] + 31^2*value[1] + 31^1*value[2] + 31^0*value[3]...

And that is essentially the original algorithm given.




回答3:


Proof by induction:

T1(s) = 0 if |s| == 0, else s[|s|-1] + 31*T(s[0..|s|-1])
T2(s) = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
P(n) = for all strings s s.t. |s| = n, T1(s) = T2(s)

Let s be an arbitrary string, and n=|s|
Base case: n = 0
    0 (additive identity, T2(s)) = 0 (T1(s))
    P(0)
Suppose n > 0
    T1(s) = s[n-1] + 31*T1(s[0:n-1])
    T2(s) = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] = s[n-1] + 31*(s[0]*31^(n-2) + s[1]*31^(n-3) + ... + s[n-2]) = s[n-1] + 31*T2(s[0:n-1])
    By the induction hypothesis, (P(n-1)), T1(s[0:n-1]) = T2(s[0:n-1]) so
        s[n-1] + 31*T1(s[0..n-1]) = s[n-1] + T2(s[0:n-1])
    P(n)

I think I have it, and a proof was requested.




回答4:


Take a look at the first few iterations and you'll see the pattern start to emerge:

hash0 = 0 + s0 = s0
hash1 = 31(hash0) + s1 = 31(s0) + s1
hash2 = 31(hash1) + s2 = 31(31(s0) + s1) + s2 = 312(s0) + 31(s1) + s2
...



回答5:


Isn't it useless at all to count the hashcode of the String out of all characters? Imagine filenames or classnames with their full path put into HashSet. Or someone who uses HashSets of String documents instead of Lists because "HashSet always beats Lists".

I would do something like:

int off = offset;
char val[] = value;
int len = count;

int step = len <= 10 ? 1 : len / 10;

for (int i = 0; i < len; i+=step) {
   h = 31*h + val[off+i];
}
hash = h

At the end hashcode is nothing more than a hint.



来源:https://stackoverflow.com/questions/822363/proof-why-does-java-lang-string-hashcodes-implementation-match-its-documenta

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!