I\'ve been playing with Python\'s hash function. For small integers, it appears hash(n) == n
always. However this does not extend to large numbers:
Based on python documentation in pyhash.c file:
For numeric types, the hash of a number x is based on the reduction of x modulo the prime
P = 2**_PyHASH_BITS - 1
. It's designed so thathash(x) == hash(y)
whenever x and y are numerically equal, even if x and y have different types.
So for a 64/32 bit machine, the reduction would be 2 _PyHASH_BITS - 1, but what is _PyHASH_BITS
?
You can find it in pyhash.h header file which for a 64 bit machine has been defined as 61 (you can read more explanation in pyconfig.h
file).
#if SIZEOF_VOID_P >= 8
# define _PyHASH_BITS 61
#else
# define _PyHASH_BITS 31
#endif
So first off all it's based on your platform for example in my 64bit Linux platform the reduction is 261-1, which is 2305843009213693951
:
>>> 2**61 - 1
2305843009213693951
Also You can use math.frexp
in order to get the mantissa and exponent of sys.maxint
which for a 64 bit machine shows that max int is 263:
>>> import math
>>> math.frexp(sys.maxint)
(0.5, 64)
And you can see the difference by a simple test:
>>> hash(2**62) == 2**62
True
>>> hash(2**63) == 2**63
False
Read the complete documentation about python hashing algorithm https://github.com/python/cpython/blob/master/Python/pyhash.c#L34
As mentioned in comment you can use sys.hash_info
(in python 3.X) which will give you a struct sequence of parameters used for computing
hashes.
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>>
Alongside the modulus that I've described in preceding lines, you can also get the inf
value as following:
>>> hash(float('inf'))
314159
>>> sys.hash_info.inf
314159