Hash Codes for Floats in Java

可紊 submitted on 2019-12-06 01:43:41
Anony-Mousse

These auto-generated hashCode functions are not very good.

The problem is that small integers, stored as floats, produce very "sparse" and similar bit patterns.

To understand the problem, look at the actual computation.

System.out.format("%x\n", Float.floatToIntBits(1));
System.out.format("%x\n", Float.floatToIntBits(-1));
System.out.format("%x\n", Float.floatToIntBits(3));
System.out.format("%x\n", Float.floatToIntBits(-3));

gives:

3f800000
bf800000
40400000
c0400000

As you can see, the sign is the most significant bit in IEEE 754 floats. Multiplying by 31 does not change the values substantially:

b0800000
30800000
c7c00000
47c00000

The problem is all the 0s at the end. They are preserved by integer multiplication with any prime (because they are base-2 zeros, not base-10!).
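One way to check this is to count the trailing zero bits before and after the multiplication. The sketch below is not part of the original answer: the class name TrailingZeros and its helper method are made up for illustration, and it uses only Float.floatToIntBits and Integer.numberOfTrailingZeros from the standard library.

```java
class TrailingZeros {
    // Trailing zero bits of (factor * raw IEEE 754 bits of f), with normal int overflow.
    // Hypothetical helper, named here only for this illustration.
    static int zerosAfterMultiply(float f, int factor) {
        return Integer.numberOfTrailingZeros(factor * Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        for (float f : new float[] {1f, -1f, 3f, -3f}) {
            int bits = Float.floatToIntBits(f);
            // Print the raw bits and the bits after multiplying by 31, each with its zero count.
            System.out.format("%5.1f  bits=%08x zeros=%d  x31=%08x zeros=%d%n",
                    f, bits, zerosAfterMultiply(f, 1), 31 * bits, zerosAfterMultiply(f, 31));
        }
    }
}
```

For 1 and -1 both counts are 23, and for 3 and -3 both are 22, so the multiplication leaves the low bits of these values all zero.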

So IMHO, the best strategy is to employ bit shifts, e.g.:

final int h1 = Float.floatToIntBits(x);
final int h2 = Float.floatToIntBits(z);
return h1 ^ ((h2 >>> 16) | (h2 << 16));

But you may want to look at "Which hashing algorithm is best for uniqueness and speed?" and test for your particular case of integers stored as floats.
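A minimal sketch of such a test for the shift-based combination, assuming the keys are small integer coordinates mirrored through the origin. The class name RotateHashCheck and the grid size are assumptions made for this illustration, and Integer.rotateLeft(h2, 16) is used as an equivalent spelling of (h2 >>> 16) | (h2 << 16).

```java
class RotateHashCheck {
    // The shift-based combination, written with Integer.rotateLeft,
    // which is equivalent to (h2 >>> 16) | (h2 << 16).
    static int hash(float x, float z) {
        final int h1 = Float.floatToIntBits(x);
        final int h2 = Float.floatToIntBits(z);
        return h1 ^ Integer.rotateLeft(h2, 16);
    }

    // Counts grid points (ix, -iz) whose mirror image (-ix, iz) gets the same hash.
    static int symmetricCollisions(int n) {
        int collisions = 0;
        for (int ix = 0; ix < n; ix++) {
            for (int iz = 0; iz < n; iz++) {
                if (hash(ix, -iz) == hash(-ix, iz)) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println(symmetricCollisions(100)); // prints 1: the origin mirrors onto itself
    }
}
```

This only measures collisions between mirrored points. It says nothing about how evenly the codes spread over buckets, which would need a separate check against real keys.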

dustinroepsch

I would use Objects.hash():

public int hashCode() {
   return Objects.hash(x, z);
}

From the Javadoc:

public static int hash(Object... values)

Generates a hash code for a sequence of input values. The hash code is generated as if all the input values were placed into an array, and that array were hashed by calling Arrays.hashCode(Object[]). This method is useful for implementing Object.hashCode() on objects containing multiple fields. For example, if an object that has three fields, x, y, and z, one could write:

@Override public int hashCode() {
   return Objects.hash(x, y, z);
}
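One caveat for this particular question: for float arguments, Objects.hash boxes each value to a Float, whose hashCode() is its floatToIntBits value, and then combines them with the same multiply-by-31 scheme that Arrays.hashCode uses. The sketch below suggests that it therefore behaves like the IDE-generated method on mirrored points. The class and method names are illustrative and not from the answer.

```java
import java.util.Objects;

class ObjectsHashCheck {
    // What Objects.hash(x, z) evaluates to for two floats, written out by hand:
    // start at 1, then fold in each element's hash code with a factor of 31.
    static int manual(float x, float z) {
        int result = 1;
        result = 31 * result + Float.floatToIntBits(x);
        result = 31 * result + Float.floatToIntBits(z);
        return result;
    }

    // Counts grid points (ix, -iz) whose mirror image (-ix, iz) gets the same Objects.hash value.
    // The casts matter: without them the ints would be boxed as Integer, not Float.
    static int symmetricCollisions(int n) {
        int collisions = 0;
        for (int ix = 0; ix < n; ix++) {
            for (int iz = 0; iz < n; iz++) {
                if (Objects.hash((float) ix, (float) -iz) == Objects.hash((float) -ix, (float) iz)) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println(manual(1f, -3f) == Objects.hash(1f, -3f)); // true
        System.out.println(symmetricCollisions(100));
    }
}
```

If mirrored coordinates are common among the keys, it may be worth running a count like this before settling on Objects.hash.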

According to the Java specification, two objects can have the same hashCode without being equal.

The probability is small, but it exists.

On the other hand, it is always good practice to override both equals and hashCode.
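A minimal sketch of both points, assuming a hypothetical GridPoint class with two float fields (neither the name nor the fields come from the question). Equal points must share a hash code, while two unequal points may still collide, and equals is what tells them apart.

```java
import java.util.HashSet;
import java.util.Set;

final class GridPoint {
    private final float x;
    private final float z;

    GridPoint(float x, float z) {
        this.x = x;
        this.z = z;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        // Float.compare agrees with floatToIntBits, so equals stays consistent with hashCode.
        return Float.compare(x, p.x) == 0 && Float.compare(z, p.z) == 0;
    }

    @Override
    public int hashCode() {
        // Deliberately simple 31-multiplier combination; mirrored points collide under it.
        return 31 * Float.floatToIntBits(x) + Float.floatToIntBits(z);
    }

    public static void main(String[] args) {
        Set<GridPoint> seen = new HashSet<>();
        seen.add(new GridPoint(1f, -3f));
        System.out.println(seen.contains(new GridPoint(1f, -3f))); // true: equal, same hash code
        System.out.println(seen.contains(new GridPoint(-1f, 3f))); // false: same hash code, not equal
    }
}
```

The collision costs only an extra equals call during lookup. Correctness depends on equals and hashCode being overridden together.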

As I understand the problem, you expect a lot of symmetrical pairs of points among your keys, so you need a hashCode method that does not tend to give them the same code.

I did some tests, and deliberately giving extra significance to the sign of x tends to map symmetrical points away from each other. See this test program:

public class Test {
  private float x;
  private float y;

  public static void main(String[] args) {
    int collisions = 0;
    for (int ix = 0; ix < 100; ix++) {
      for (int iz = 0; iz < 100; iz++) {
        Test t1 = new Test(ix, -iz);
        Test t2 = new Test(-ix, iz);
        if (t1.hashCode() == t2.hashCode()) {
          collisions++;
        }
      }
    }
    System.out.println(collisions);

  }

  public Test(float x, float y) {
    super();
    this.x = x;
    this.y = y;
  }

  @Override
  public int hashCode() {
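    final int prime = 31;
    int result = 1;
    // Seed with the sign of x so that mirrored points start from different values.
    result = (x >= 0) ? 1 : -1;
    // Combine the raw IEEE 754 bits of both fields, as the Eclipse template does.
    result = prime * result + Float.floatToIntBits(x);
    result = prime * result + Float.floatToIntBits(y);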
    return result;
  }
  // Equals omitted for compactness
}

Without the result = (x >= 0) ? 1 : -1; line, this is the hashCode() generated by Eclipse, and the program counts 9802 symmetrical point collisions. With that line, it counts one: the origin, which is its own mirror image.
