Question
I was doing some performance evaluation using timeit and discovered a performance degradation between Python 2.7.10 and Python 3.4.3. I narrowed it down to the hash() function:
Python 2.7.10:
>>> import timeit
>>> timeit.timeit('for x in xrange(100): hash(x)', number=100000)
0.4529099464416504
>>> timeit.timeit('hash(1000)')
0.044638872146606445
Python 3.4.3:
>>> import timeit
>>> timeit.timeit('for x in range(100): hash(x)', number=100000)
0.6459149940637872
>>> timeit.timeit('hash(1000)')
0.07708719989750534
That's roughly a 40% degradation! It doesn't seem to matter whether integers, floats, strings (unicode or byte strings), etc. are being hashed; the degradation is about the same. In both cases hash() returns a 64-bit integer. The above was run on my Mac; on an Ubuntu box the degradation was smaller (about 20%).
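A small timeit script makes it easier to repeat the per-type comparison on both interpreters. This is a sketch, not the original benchmark: the sample values and the names SAMPLES, time_hash and run are made up for illustration. One caveat: CPython caches the hash of a str object on the object itself, so calling hash() repeatedly on the same string mostly measures call overhead rather than the hashing algorithm.

```python
import timeit

# One hypothetical sample per type mentioned above. bytearray is left out:
# it is mutable, so hash(bytearray(...)) raises TypeError.
SAMPLES = {
    "int": 1000,
    "float": 1000.5,
    "str": "hello world",
    "bytes": b"hello world",
}


def time_hash(value, number=1000000):
    """Total seconds for `number` calls to hash(value) on one object."""
    # The value is recreated from its repr in timeit's setup code.
    return timeit.timeit("hash(v)", setup="v = %r" % (value,), number=number)


def run(number=1000000):
    """Time hash() for every sample type; returns {type name: seconds}."""
    return dict((name, time_hash(v, number)) for name, v in SAMPLES.items())


if __name__ == "__main__":
    for name, secs in sorted(run().items()):
        print("{0:>6}: {1:.4f} s".format(name, secs))
```

The script avoids f-strings and other 3.x-only syntax so the same file can be run under both interpreters. To time the string hash itself instead of the cached value, the statement would have to hash freshly created strings.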
I've also run the Python 2.7 tests with PYTHONHASHSEED=random, in some cases restarting Python for each "case". I saw hash() performance get a bit worse, but never as slow as Python 3.4.
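Restarting the interpreter for each seed can be scripted with subprocess. This is a sketch that assumes the interpreter under test is sys.executable; STMT, CODE and time_with_seed are hypothetical names. Each call starts a fresh child process with PYTHONHASHSEED set and times the same integer loop as above. In CPython, int hashes do not depend on the seed, so for this particular loop the different seeds are expected to make little difference.

```python
import os
import subprocess
import sys

# The loop from the question, timed inside each child interpreter.
STMT = "for x in range(100): hash(x)"
CODE = "import timeit; print(timeit.timeit(%r, number=10000))" % STMT


def time_with_seed(seed):
    """Run the benchmark in a fresh interpreter with PYTHONHASHSEED=seed."""
    # Environment values must be strings, e.g. "0", "random" or "12345".
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output([sys.executable, "-c", CODE], env=env)
    return float(out.decode().strip())


if __name__ == "__main__":
    for seed in ("0", "random", "12345"):
        print(seed, time_with_seed(seed))
```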
Does anyone know what's going on here? Was a more secure but slower hash function chosen for Python 3?
Answer 1:
There are two changes to the hash() function between Python 2.7 and Python 3.4:
- Adoption of SipHash
- Hash randomization enabled by default
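Whether a given interpreter has these two changes can be checked from Python itself. A minimal sketch, assuming CPython 3.4 or later (sys.hash_info gained its algorithm, hash_bits and seed_bits fields in 3.4); describe_hash is a hypothetical helper name:

```python
import sys


def describe_hash():
    """Summarize how this interpreter hashes objects (CPython 3.4+)."""
    info = sys.hash_info
    return {
        "algorithm": info.algorithm,  # name of the str/bytes hash algorithm
        "hash_bits": info.hash_bits,  # output size of that algorithm
        "seed_bits": info.seed_bits,  # size of the randomization seed
        "width": info.width,          # bit width of hash values
        "modulus": info.modulus,      # prime modulus used for numeric hashes
        # 1 if hash randomization is active (PYTHONHASHSEED unset or nonzero).
        "hash_randomization": sys.flags.hash_randomization,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_hash().items()):
        print(key, "=", value)
    # Numeric hashes do not go through SipHash: small ints hash to themselves.
    print(hash(1000) == 1000)
```

On CPython 3.4 the algorithm field would be expected to read siphash24; other builds or later releases may report a different value.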
References:
- Since Python 3.4, CPython uses SipHash as the hash function for str and bytes objects (PEP 456). Read: Python adopts SipHash
- Since Python 3.3, hash randomization is enabled by default. Reference: object.__hash__ (last line of this section). Setting PYTHONHASHSEED to 0 disables hash randomization.
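To see why SipHash is more work per byte than a simple multiply-and-xor loop, here is a minimal pure-Python sketch of SipHash-2-4 following the published reference algorithm: 2 compression rounds per 8-byte block and 4 finalization rounds. It is an illustration only, not CPython's C implementation, and it will not reproduce the values returned by hash(); the names siphash24 and _sipround are hypothetical.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # keep every intermediate value in 64 bits


def _rotl(x, b):
    """Rotate the 64-bit value x left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK


def _sipround(v0, v1, v2, v3):
    """One SipRound: 4 additions, 6 rotations and 4 xors."""
    v0 = (v0 + v1) & MASK; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash24(key, msg):
    """64-bit SipHash-2-4 of bytes `msg` under a 16-byte `key`."""
    if len(key) != 16:
        raise ValueError("key must be 16 bytes")
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    # Initial state: fixed constants xored with the two key halves.
    v0 = 0x736F6D6570736575 ^ k0
    v1 = 0x646F72616E646F6D ^ k1
    v2 = 0x6C7967656E657261 ^ k0
    v3 = 0x7465646279746573 ^ k1

    n = len(msg)
    end = n - (n % 8)
    # Compression: each full little-endian 8-byte block costs 2 SipRounds.
    for i in range(0, end, 8):
        m = int.from_bytes(msg[i:i + 8], "little")
        v3 ^= m
        for _ in range(2):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m

    # Last block: leftover bytes plus the message length in the top byte.
    b = ((n & 0xFF) << 56) | int.from_bytes(msg[end:], "little")
    v3 ^= b
    for _ in range(2):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    v0 ^= b

    # Finalization: 4 more SipRounds, paid on every call even for short input.
    v2 ^= 0xFF
    for _ in range(4):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3


if __name__ == "__main__":
    # Test vector from the SipHash paper: key 00..0f, message 00..0e.
    print(hex(siphash24(bytes(range(16)), bytes(range(15)))))
```

The fixed finalization cost is one reason a keyed hash like this can be noticeably slower than the older string hash for the short keys that dominate dictionary lookups, although the exact effect in CPython depends on its C implementation and on hash caching.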
Source: https://stackoverflow.com/questions/40137072/why-is-hash-slower-under-python3-4-vs-python2-7