Why is hash() slower under python3.4 vs python2.7

Question

I was doing some performance evaluation using timeit and discovered a performance degradation between Python 2.7.10 and Python 3.4.3. I narrowed it down to the hash() function:

python 2.7.10:

>>> import timeit
>>> timeit.timeit('for x in xrange(100): hash(x)', number=100000)
0.4529099464416504
>>> timeit.timeit('hash(1000)')
0.044638872146606445

python 3.4.3:

>>> import timeit
>>> timeit.timeit('for x in range(100): hash(x)', number=100000)
0.6459149940637872
>>> timeit.timeit('hash(1000)')
0.07708719989750534

That's roughly a 40% degradation! It doesn't seem to matter whether integers, floats, strings (unicode or bytearrays), etc. are being hashed; the degradation is about the same. In both cases hash() returns a 64-bit integer. The above was run on my Mac; I got a smaller degradation (about 20%) on an Ubuntu box.

I've also used PYTHONHASHSEED=random for the Python 2.7 tests. In some cases, restarting Python for each "case", I saw hash() performance get a bit worse, but never as slow as Python 3.4.

Does anyone know what's going on here? Was a more secure, but slower, hash function chosen for Python 3?

Answer 1:

There are two relevant changes to hash() between Python 2.7 and Python 3.4:

Adoption of SipHash
Hash randomization enabled by default

References:

Since Python 3.4, SipHash is used as the hash function for str and bytes (PEP 456). Read: Python adopts SipHash
Since Python 3.3, hash randomization is enabled by default. Reference: object.__hash__ (last line of that section). Setting PYTHONHASHSEED to 0 disables hash randomization.
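
Both points can be checked from Python itself. The following is a minimal sketch, assuming Python 3.4 or later (where sys.hash_info.algorithm exists); the helper name hash_in_fresh_interpreter is made up for illustration. It prints the string-hash algorithm the interpreter was built with, then starts fresh interpreters with PYTHONHASHSEED=0 to show that the string hash becomes reproducible across runs.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(seed, text="abc"):
    """Return hash(text) computed by a new interpreter started with
    PYTHONHASHSEED set to `seed` (hypothetical helper, for illustration)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({!r}))".format(text)],
        env=env,
    )
    return int(out.decode().strip())


# Name of the algorithm used for str/bytes hashing in this build.
print(sys.hash_info.algorithm)

# With the seed fixed to 0, randomization is off, so two separate
# interpreter runs agree on the hash of the same string.
first = hash_in_fresh_interpreter(0)
second = hash_in_fresh_interpreter(0)
print(first == second)
```

With PYTHONHASHSEED=random (or unset on Python 3.3+), the two runs would be expected to disagree instead. This only demonstrates the behaviour described above; it does not by itself measure how much each change contributes to the slowdown.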

Source: https://stackoverflow.com/questions/40137072/why-is-hash-slower-under-python3-4-vs-python2-7

Tags

python

python-3.4