Murmur3 hash different result between Python and Java implementation

◇◆丶佛笑我妖孽 submitted on 2019-12-01 09:41:54

Question


I have two different programs that need to hash the same string using Murmur3, one in Python and one in Java.

Python version 2.7.9:

mmh3.hash128('abc')

This gives 79267961763742113019008347020647561319L.

Java, using Guava 18.0:

HashCode hashCode = Hashing.murmur3_128().newHasher().putString("abc", StandardCharsets.UTF_8).hash();

This gives the string "6778ad3f3f3f96b4522dca264174a23b", which converts to the BigInteger 137537073056680613988840834069010096699.

How can I get the same result from both?

Thanks


Answer 1:


Here's how to get the same result from both:

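// asBytes() returns the hash bytes in little-endian order.
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();
// Reverse them, because BigInteger reads a byte array as big endian.
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));
// Signum 1 keeps the value non-negative even when the top bit is set.
assertEquals("79267961763742113019008347020647561319",
    new BigInteger(1, mm3_be).toString());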

The hash code's bytes need to be treated as little endian, but BigInteger interprets bytes as big endian. You were presumably using new BigInteger(hex, 16) to create the BigInteger, but the output of HashCode.toString() is actually a series of pairs of hexadecimal digits representing the hash bytes in the same order they're returned by asBytes() (little endian). You can also reverse the order of those pairs of hexadecimal digits to get a hex number that does produce the same result when passed to new BigInteger(reversedHex, 16).
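The byte-order point can be checked without either hashing library. The sketch below is only an illustration: it takes the hex string quoted in the question as given and reads the same 16 bytes both ways with the Python 3 standard library. The variable names are made up for this example.

```python
# Hex string reported by Guava's HashCode.toString() in the question.
guava_hex = "6778ad3f3f3f96b4522dca264174a23b"
raw = bytes.fromhex(guava_hex)  # the 16 hash bytes, in asBytes() order

# Reading the bytes as big endian is what new BigInteger(hex, 16) does.
as_big_endian = int.from_bytes(raw, "big")

# Reading the same bytes as little endian gives the other interpretation.
as_little_endian = int.from_bytes(raw, "little")

# Equivalent route: reverse the hex pairs, then parse as an ordinary hex number.
reversed_hex = raw[::-1].hex()

print(as_big_endian)
print(as_little_endian)
print(int(reversed_hex, 16) == as_little_endian)
```

The first printed number should correspond to the mismatching BigInteger from the question, and the second to the value returned by mmh3.hash128.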

I think the documentation of toString() is somewhat confusing because of the way it refers to "big endian"; it doesn't actually mean that the output of the method is the hexadecimal number representing the bytes interpreted as big endian.

We have an open issue for adding asBigInteger() to HashCode.




Answer 2:


If anyone is interested in the reverse direction, converting the Python output to the Java output:

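import mmh3
import string

# hash_bytes returns the raw 16-byte hash rather than an integer
murmur = mmh3.hash_bytes('abc')

# Format each byte as two lowercase hex digits (high nibble first), keeping the byte order
result = [f'{string.hexdigits[(byte >> 4) & 0xf]}{string.hexdigits[byte & 0xf]}' for byte in murmur]
print(''.join(result))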
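If only the integer from mmh3.hash128 is at hand, the same hex string can be rebuilt from it. This is a minimal sketch, assuming the integer is the unsigned 128-bit value shown in the question; hash128_to_guava_hex is a hypothetical helper name, not part of mmh3 or Guava.

```python
def hash128_to_guava_hex(value: int) -> str:
    """Format an unsigned 128-bit integer as 16 little-endian bytes in hex.

    Hypothetical helper: it assumes `value` is non-negative and fits in
    128 bits, as with the mmh3.hash128 result quoted in the question.
    """
    return value.to_bytes(16, "little").hex()


# Integer taken from the question (mmh3.hash128('abc')).
print(hash128_to_guava_hex(79267961763742113019008347020647561319))
```

On Python 3 the raw bytes from hash_bytes can also be formatted directly with the built-in bytes.hex() method instead of the per-nibble lookup.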


Source: https://stackoverflow.com/questions/29932956/murmur3-hash-different-result-between-python-and-java-implementation
