fast, large-width, non-cryptographic string hashing in python

伪装坚强ぢ 2020-12-08 02:40

I need a high-performance string hashing function in Python that produces integers with at least 34 bits of output (64 bits would make sense, but

  •  臣服心动
    2020-12-08 03:15

    > Use the built-in hash() function. This function, at least on the machine I'm developing for (with Python 2.7 and a 64-bit CPU), produces an integer that fits within 32 bits, which is not large enough for my purposes.

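    That's not true in general. In Python 2.7 the built-in hash() returns a C long, so it generates a 64-bit hash on 64-bit platforms where long is 64 bits (64-bit Linux and macOS, for example). A 32-bit result on a 64-bit CPU usually means either a 32-bit Python build or 64-bit Windows, where long is only 32 bits.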
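    To check this on a particular interpreter, here is a minimal sketch assuming Python 3.2 or later (sys.hash_info does not exist in Python 2.7). The helper names hash_width_bits and unsigned_hash are illustrative, not part of any library. Note that since Python 3.3, str hashes are randomized per process unless PYTHONHASHSEED is set, so these values are only stable within a single run.

```python
import sys


def hash_width_bits():
    """Bit width of values returned by the built-in hash() (Python 3.2+)."""
    # sys.hash_info.width is 64 on typical 64-bit builds, including 64-bit Windows.
    return sys.hash_info.width


def unsigned_hash(s):
    """Map hash(s) onto a non-negative integer of hash_width_bits() bits."""
    # hash() returns a signed value; masking yields its two's-complement unsigned form.
    return hash(s) & ((1 << hash_width_bits()) - 1)


if __name__ == "__main__":
    print("hash() width:", hash_width_bits(), "bits")
    print("unsigned hash of 'hello':", unsigned_hash("hello"))
```

    On a 64-bit build this gives a 64-bit width, comfortably above the 34 bits the question asks for.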

    This is Python's str hashing function from Objects/stringobject.c (Python 2.7):

    static long
    string_hash(PyStringObject *a)
    {
        register Py_ssize_t len;
        register unsigned char *p;
        register long x;      /* C long: 64 bits on most 64-bit Unix-like systems, 32 bits on 64-bit Windows */
    
        if (a->ob_shash != -1)
            return a->ob_shash;
        len = Py_SIZE(a);
        p = (unsigned char *) a->ob_sval;
        x = *p << 7;
        while (--len >= 0)
            x = (1000003*x) ^ *p++;
        x ^= Py_SIZE(a);
        if (x == -1)
            x = -2;
        a->ob_shash = x;
        return x;
    }
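    To see exactly what the loop computes, it can be ported to Python. This is a minimal sketch, assuming a 64-bit C long with wrap-around arithmetic and the version of string_hash shown above (later 2.7.x releases also XOR in an optional hash-randomization secret, which this port omits). The name py27_string_hash is illustrative. It takes bytes, matching Python 2's byte strings, and a pure-Python loop is far slower than the C original, so it is for understanding the algorithm rather than for production use.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit C long by reducing modulo 2**64


def py27_string_hash(data: bytes) -> int:
    """Port of the string_hash loop above, emulating a 64-bit signed long."""
    # For an empty string, ob_sval[0] is the NUL terminator, so x starts at 0.
    x = (data[0] << 7) if data else 0
    for byte in data:
        # x = (1000003*x) ^ *p++, wrapped to 64 bits like the C arithmetic.
        x = ((1000003 * x) ^ byte) & MASK64
    x ^= len(data)  # x ^= Py_SIZE(a)
    # Reinterpret the 64-bit pattern as a signed (two's-complement) long.
    if x >= 1 << 63:
        x -= 1 << 64
    # -1 is reserved as the "not yet computed" marker, so it becomes -2.
    if x == -1:
        x = -2
    return x


if __name__ == "__main__":
    print(py27_string_hash(b"a"))  # → 12416037344
```

    The multiply-and-XOR loop over each byte is what gives the hash its full 64-bit spread on an LP64 build.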
    
