Hash function that produces short hashes?

走了就别回头了 2020-12-07 23:58

Is there a way of encryption that can take a string of any length and produce a sub-10-character hash? I want to produce reasonably unique IDs, but based on the message content.

10 answers
  •  死守一世寂寞
    2020-12-08 00:39

    If you need a "sub-10-character hash", you could use the Fletcher-32 algorithm, which produces a 32-bit value (8 characters when written in hexadecimal), or CRC-32 or Adler-32.

    CRC-32 is slower than Adler-32 by roughly 20% to 100%.
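
For reference, a bit-at-a-time CRC-32 might look like the sketch below. It assumes the common reflected polynomial 0xEDB88320 (the variant used by zlib and PNG); the function name `crc32_bitwise` is made up for this illustration. The inner per-bit loop gives a rough idea of why an unoptimized CRC does more work per byte than Adler-32, although real implementations are normally table-driven and the exact speed gap depends on the platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 sketch (reflected polynomial 0xEDB88320).
   Compact but slow: 8 shift/XOR steps for every input byte. */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;            /* standard initial value */

    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; XOR in the polynomial when a 1 bit falls off. */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;              /* final inversion */
}
```

With this variant, the nine ASCII bytes of "123456789" give the standard check value 0xCBF43926.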

    Fletcher-32 is slightly more reliable than Adler-32, and it has a lower computational cost than the Adler checksum (see: Fletcher vs Adler comparison).

    A sample program with a few Fletcher implementations is given below:

        #include <stdio.h>
        #include <string.h>
        #include <stdint.h> // for uint32_t
    
        uint32_t fletcher32_1(const uint16_t *data, size_t len)
        {
                uint32_t c0, c1;
                unsigned int i;
    
                for (c0 = c1 = 0; len >= 360; len -= 360) {
                        for (i = 0; i < 360; ++i) {
                                c0 = c0 + *data++;
                                c1 = c1 + c0;
                        }
                        c0 = c0 % 65535;
                        c1 = c1 % 65535;
                }
                for (i = 0; i < len; ++i) {
                        c0 = c0 + *data++;
                        c1 = c1 + c0;
                }
                c0 = c0 % 65535;
                c1 = c1 % 65535;
                return (c1 << 16 | c0);
        }
    
        uint32_t fletcher32_2(const uint16_t *data, size_t l)
        {
            uint32_t sum1 = 0xffff, sum2 = 0xffff;
    
            while (l) {
                unsigned tlen = l > 359 ? 359 : l;
                l -= tlen;
                do {
                    sum2 += sum1 += *data++;
                } while (--tlen);
                sum1 = (sum1 & 0xffff) + (sum1 >> 16);
                sum2 = (sum2 & 0xffff) + (sum2 >> 16);
            }
            /* Second reduction step to reduce sums to 16 bits */
            sum1 = (sum1 & 0xffff) + (sum1 >> 16);
            sum2 = (sum2 & 0xffff) + (sum2 >> 16);
            return (sum2 << 16) | sum1;
        }
    
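        int main(void)
        {
            const char *str1 = "abcde";
            const char *str2 = "abcdef";

            /* Copy each string into a zero-filled 16-bit buffer instead of
               passing char* where uint16_t* is expected. The zero fill pads
               odd lengths with '\0'. The printed values assume a
               little-endian machine. */
            uint16_t buf1[3] = {0}, buf2[3] = {0};
            size_t len1 = (strlen(str1) + 1) / 2;  /* length in 16-bit words */
            size_t len2 = (strlen(str2) + 1) / 2;
            memcpy(buf1, str1, strlen(str1));
            memcpy(buf2, str2, strlen(str2));

            uint32_t f1 = fletcher32_1(buf1, len1);
            uint32_t f2 = fletcher32_2(buf1, len1);

            printf("%u %X \n",   (unsigned)f1, (unsigned)f1);
            printf("%u %X \n\n", (unsigned)f2, (unsigned)f2);

            f1 = fletcher32_1(buf2, len2);
            f2 = fletcher32_2(buf2, len2);

            printf("%u %X \n", (unsigned)f1, (unsigned)f1);
            printf("%u %X \n", (unsigned)f2, (unsigned)f2);

            return 0;
        }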
    

    Output:

    4031760169 F04FC729
    4031760169 F04FC729

    1448095018 56502D2A
    1448095018 56502D2A
    

    This agrees with the test vectors:

    "abcde"  -> 4031760169 (0xF04FC729)
    "abcdef" -> 1448095018 (0x56502D2A)
    

    Adler-32 has a weakness for short messages of a few hundred bytes, because the checksums for these messages have poor coverage of the 32 available bits. See:

    The Adler32 algorithm is not complex enough to compete with comparable checksums.
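
The short-message weakness is easy to see with a plain Adler-32 sketch. The function below follows the usual definition (two running sums modulo 65521, the largest prime below 2^16); the name `adler32_simple` is invented for this example, and the code is written for clarity rather than speed.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

/* Straightforward Adler-32: `a` is 1 plus the sum of all bytes,
   `b` is the sum of every intermediate value of `a`, both mod 65521. */
uint32_t adler32_simple(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;  /* high half: b, low half: a */
}
```

For the five bytes of "abcde" this gives 0x05C801F0. The low half can never exceed 1 + 255 * 5 = 1276 for a five-byte input, so the upper bits of each 16-bit half stay zero and only a small part of the 32-bit range is ever reached.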
