Question
My goal is to use an MD5 hash to index a hash table. I want to apply a modulo operation to it to find the appropriate slot in the table. I tried casting it to unsigned long long, but when I printed the result, I got a different number every time for the same MD5 hash. The MD5 hash is initially an unsigned char *. Can someone tell me what I am doing wrong?
Here is my function:
int get_fp_slot(unsigned char * fingerprint, int size)
{
    return (unsigned long long)fingerprint % size;
}
Answer 1:
An MD5 hash is a 128-bit number, so for the best hash quality you should probably make use of all 128 bits.
Given that your function takes the 128-bit hash as a character string, you need to parse that string into a series of four 32-bit integers. Your string probably looks something like this:
79054025255fb1a26e4bc422aef54eb4
That is a 32-character hexadecimal string. If so, you can extract the binary values like this:
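unsigned int v1, v2, v3, v4;
const char *hex = (const char *)fingerprint;
/* %8x reads at most 8 hex digits; a plain %x would consume the rest of the string */
sscanf(hex,      "%8x", &v1);
sscanf(hex + 8,  "%8x", &v2);
sscanf(hex + 16, "%8x", &v3);
sscanf(hex + 24, "%8x", &v4);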
What you do now really depends on how good you want your hash to be. If you really need a 32-bit number, then just XOR all those numbers together:
unsigned int hash = v1 ^ v2 ^ v3 ^ v4;
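Put together, the parse-and-XOR idea could look roughly like the sketch below. The names fold_md5_hex and slot_from_hex_xor are made up for illustration and are not part of the original answer. The sketch assumes the fingerprint is a NUL-terminated string of 32 hex digits and that unsigned int is at least 32 bits wide.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original answer): folds a 32-character
   hex MD5 string into one 32-bit value by XORing its four 32-bit words.
   Returns 0 on success, -1 if four words could not be parsed.
   Assumes unsigned int is at least 32 bits wide. */
int fold_md5_hex(const char *hex, unsigned int *out)
{
    unsigned int v[4];

    /* Each %8x consumes at most eight hex digits, i.e. one 32-bit word. */
    if (sscanf(hex, "%8x%8x%8x%8x", &v[0], &v[1], &v[2], &v[3]) != 4)
        return -1;

    *out = v[0] ^ v[1] ^ v[2] ^ v[3];
    return 0;
}

/* Maps the folded value to a slot in [0, size); returns -1 on bad input. */
int slot_from_hex_xor(const char *hex, int size)
{
    unsigned int folded;

    if (size <= 0 || fold_md5_hex(hex, &folded) != 0)
        return -1;
    return (int)(folded % (unsigned int)size);
}
```

For the sample string above, the four words XOR to 0x9ce47b11, so a table with 256 slots would use slot 17.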
Answer 2:
You are casting the pointer, i.e. the address of the hash. Of course that address is unrelated to the value of the hash.
How to fix it depends on what you want. You can, for example, take the last 16 characters (hex digits) of the hash string and parse them into an unsigned long long:
// sanity and error checking omitted for brevity
// needs <string.h> for strlen and <stdlib.h> for strtoull
int get_fp_slot(unsigned char *fingerprint, int size)
{
    const char *hex = (const char *)fingerprint;  // the str* functions expect char *
    size_t len = strlen(hex);
    size_t offset = len < 16 ? 0 : len - 16;
    unsigned long long hash_tail = strtoull(hex + offset, NULL, 16);
    return (int)(hash_tail % size);
}
or do the modulo incrementally:
// uses a helper hex_val that converts a hexadecimal digit to the integer it signifies
int get_fp_slot(unsigned char *fingerprint, int size)
{
    unsigned long long hash_mod = 0;
    while (*fingerprint) {
        hash_mod = (16*hash_mod + hex_val(*fingerprint)) % size;
        ++fingerprint;
    }
    return hash_mod;
}
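The hex_val helper is not shown in the answer. A minimal sketch of one possible version follows, together with a variant of the incremental function that rejects non-hex input. The name slot_from_hex_mod and the -1 error convention are assumptions made for illustration only.

```c
/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Reduces the whole hex string modulo size, one digit at a time.
   Returns a slot in [0, size), or -1 on a non-hex character or size <= 0. */
int slot_from_hex_mod(const unsigned char *fingerprint, int size)
{
    unsigned long long hash_mod = 0;

    if (size <= 0)
        return -1;

    for (; *fingerprint; ++fingerprint) {
        int digit = hex_val(*fingerprint);
        if (digit < 0)
            return -1;
        /* hash_mod < size here, so 16 * hash_mod + digit cannot overflow */
        hash_mod = (16 * hash_mod + (unsigned long long)digit)
                   % (unsigned long long)size;
    }
    return (int)hash_mod;
}
```

Because every digit takes part in the reduction, the result is the full 128-bit value modulo size. For the string 79054025255fb1a26e4bc422aef54eb4 and a size of 256, that is simply the last byte, 0xb4 (180).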
Answer 3:
In your code you are converting the pointer itself, not the bytes that form the MD5 value!
An MD5 hash is 128 bits, that is, 16 bytes. Assuming that your long long type is 64 bits (8 bytes), you can represent it as two long long values, then XOR them to get the hash. Or, if you prefer, you could simply pick one of them; the hash quality is probably similar.
You don't say it, but I'm assuming that your fingerprint is a pointer to an array of 16 bytes with the MD5 value. Then:
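unsigned long long a, b;
/* memcpy (from <string.h>) avoids the alignment and strict-aliasing problems of casting the pointer */
memcpy(&a, fingerprint, sizeof a);
memcpy(&b, fingerprint + 8, sizeof b);
return (a ^ b) % size;  /* reduce the folded value to a table slot */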
Note that the values of a and b will depend on the endianness of your machine. It doesn't matter as long as you don't send the hashes to a different architecture.
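If the slot does need to be the same on every architecture, one option is to assemble the two 64-bit halves byte by byte instead of reinterpreting memory. The sketch below assumes, as this answer does, that the fingerprint is the raw 16-byte digest. The names load_be64 and slot_from_digest are hypothetical, and big-endian byte order is an arbitrary choice.

```c
#include <stdint.h>

/* Hypothetical helper: reads 8 bytes as a big-endian 64-bit value,
   independent of the host's byte order. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}

/* XORs the two halves of a raw 16-byte MD5 digest and maps the result
   to a slot in [0, size); returns -1 if size is not positive. */
int slot_from_digest(const unsigned char digest[16], int size)
{
    if (size <= 0)
        return -1;

    uint64_t folded = load_be64(digest) ^ load_be64(digest + 8);
    return (int)(folded % (uint64_t)size);
}
```

This costs a few shifts per lookup but gives the same slot on little-endian and big-endian machines.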
Source: https://stackoverflow.com/questions/11180028/converting-md5-result-into-an-integer-in-c