I have a Java application in which I want to generate long ids for strings (in order to store those strings in neo4j). In order to avoid data duplication, I wou
This code will calculate a pretty good hash:
import java.nio.charset.StandardCharsets;
import java.util.UUID;
String s = "some string";
long hash = UUID.nameUUIDFromBytes(s.getBytes(StandardCharsets.UTF_8)).getMostSignificantBits();
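A minimal sketch of that approach wrapped in a helper, in case it is useful. The class and method names (UuidHash, hash, versionNibble) are made up for illustration. It pins the charset to UTF-8 so the result does not depend on the platform default, and it shows one caveat: nameUUIDFromBytes produces a version 3 (MD5-based) UUID, so 4 of the 64 most significant bits are always the fixed version number and only 60 bits actually vary.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of any library.
final class UuidHash {
    // 64-bit value derived from the MD5-based (version 3) name UUID of the string.
    static long hash(String s) {
        UUID u = UUID.nameUUIDFromBytes(s.getBytes(StandardCharsets.UTF_8));
        return u.getMostSignificantBits();
    }

    // Extracts the 4 version bits that sit inside the most significant half.
    static int versionNibble(long mostSignificantBits) {
        return (int) ((mostSignificantBits >>> 12) & 0xF);
    }
}
```

The same string always maps to the same value, but the value is still a hash, not a guaranteed-unique id.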
A long has 64 bits. A String of length 9 has 72 bits, even counting only 8 bits per character. By the pigeonhole principle, you cannot get a unique hash that maps every 9-character string to a long.
If you still want a long hash: you can just take two standard [different!] hash functions for String->int, hash1() and hash2(), and calculate: hash(s) = 2^32 * hash1(s) + hash2(s)
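A rough sketch of the two-hash combination described above. The choice of functions is an assumption made for illustration: hash1 is the built-in String.hashCode() and hash2 is a hand-written 32-bit FNV-1a over the UTF-8 bytes; any two different String->int hashes would do. The class name CombinedHash is hypothetical. Note that hash2 has to be treated as an unsigned 32-bit value when it is merged, otherwise a negative hash2 would overwrite the upper half.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class CombinedHash {
    // First hash: Java's built-in String.hashCode().
    static int hash1(String s) {
        return s.hashCode();
    }

    // Second hash: 32-bit FNV-1a over the UTF-8 bytes (chosen here as an example).
    static int hash2(String s) {
        int h = 0x811C9DC5;              // FNV offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;             // FNV prime
        }
        return h;
    }

    // hash1 in the upper 32 bits, hash2 (as unsigned) in the lower 32 bits.
    static long hash(String s) {
        return ((long) hash1(s) << 32) | (hash2(s) & 0xFFFFFFFFL);
    }
}
```

For example, "Aa" and "BB" have the same String.hashCode(), but the combined value still differs because their FNV-1a hashes differ. Collisions remain possible, they just need both functions to collide at once.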
There are many answers, try the following:
long requirement. Mea culpa. Or, as suggested before, check out the sources.
PS. One more technique is to maintain a dictionary of strings: since you're unlikely to get 2^64 strings any time soon, you can have a perfect mapping. Note though that this mapping may well become a major bottleneck.
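The dictionary idea could look roughly like this in-memory sketch. The class name StringIdDictionary and the method idFor are invented for illustration, and a real setup would presumably need the mapping to be persisted somewhere, which is where the bottleneck mentioned above tends to show up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: hands out sequential long ids, one per distinct string.
final class StringIdDictionary {
    private final Map<String, Long> ids = new HashMap<>();
    private long nextId = 0;

    // Returns the existing id for s, or assigns the next free one.
    // synchronized keeps it safe for concurrent callers, at the cost of a single lock.
    synchronized long idFor(String s) {
        Long existing = ids.get(s);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        ids.put(s, id);
        return id;
    }
}
```

Unlike a hash, two different strings can never receive the same id here, but the id of a string depends on insertion order rather than on its content.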
Why don't you have a look at the hashCode() function of String, and just adapt it to use long values instead?
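A direct adaptation might look like the sketch below. It assumes the documented String.hashCode() formula (s[0]*31^(n-1) + ... + s[n-1]) and only widens the accumulator to long; the class name LongStringHash is hypothetical.

```java
// Hypothetical helper: String.hashCode() with a 64-bit accumulator.
final class LongStringHash {
    static long hash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // same polynomial as String.hashCode(), in long arithmetic
        }
        return h;
    }
}
```

One caveat with keeping the multiplier 31: the lower 32 bits are always identical to hashCode(), and very short strings that collide under hashCode() (such as "Aa" and "BB") still collide here. A different multiplier or seed may spread short inputs better, but that is a design choice beyond this sketch.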
Btw, if there were a way to create a unique 64-bit ID for every String, you would have found a compression algorithm able to pack every String into 8 bytes (not possible by definition).