Why is hash() slower under Python 3.4 vs Python 2.7?

安稳与你 submitted on 2019-12-01 04:08:18

Question


I was doing some performance evaluation using timeit and discovered a performance degradation between Python 2.7.10 and Python 3.4.3. I narrowed it down to the hash() function:

python 2.7.10:

>>> import timeit
>>> timeit.timeit('for x in xrange(100): hash(x)', number=100000)
0.4529099464416504
>>> timeit.timeit('hash(1000)')
0.044638872146606445

python 3.4.3:

>>> import timeit
>>> timeit.timeit('for x in range(100): hash(x)', number=100000)
0.6459149940637872
>>> timeit.timeit('hash(1000)')
0.07708719989750534

That's roughly a 40% degradation! It doesn't seem to matter whether integers, floats, strings (unicode or bytes), etc. are being hashed; the degradation is about the same. In both cases hash() returns a 64-bit integer. The above was run on my Mac; on an Ubuntu box I saw a smaller degradation (about 20%).

I also ran the Python 2.7 tests with PYTHONHASHSEED=random, in some cases restarting Python for each "case". The hash() performance got slightly worse, but never as slow as under Python 3.4.

Does anyone know what's going on here? Was a more secure, but slower, hash function chosen for Python 3?
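The two timings in the question can be reproduced with one script that runs unchanged on both interpreters. This is a minimal sketch: bench_hash and RANGE are hypothetical names, and taking the minimum of a few repeats is simply a common way to reduce timing noise. Note that the loop snippet also measures the for-loop and range iteration, not only hash().

```python
import sys
import timeit

# Python 2 spells the lazy range "xrange"; in Python 3, range is already lazy.
RANGE = "xrange" if sys.version_info[0] == 2 else "range"


def bench_hash(repeat=3, loop_number=100000, single_number=1000000):
    """Best-of-`repeat` timings (seconds) for the two snippets in the question."""
    loop_stmt = "for x in %s(100): hash(x)" % RANGE
    loop = min(timeit.repeat(loop_stmt, repeat=repeat, number=loop_number))
    single = min(timeit.repeat("hash(1000)", repeat=repeat, number=single_number))
    return loop, single


if __name__ == "__main__":
    loop, single = bench_hash()
    print("Python %s: loop=%.4f s, single=%.4f s"
          % (sys.version.split()[0], loop, single))
```

Absolute numbers depend on the machine and build, so only compare results from the same box.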


Answer 1:


There are two changes to the hash() function between Python 2.7 and Python 3.4:

  1. Adoption of SipHash
  2. Hash randomization enabled by default
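On Python 3.4 and later, the interpreter reports which string hash it was built with in sys.hash_info. The sketch below (describe_hash_info is a hypothetical helper) collects those fields; on 3.4 the algorithm is typically 'siphash24', while newer releases may report another variant such as 'siphash13'. It also illustrates that SipHash is not used for integers: ints are hashed arithmetically, by reduction modulo sys.hash_info.modulus.

```python
import sys


def describe_hash_info():
    """Summarize how this interpreter hashes objects (Python 3.4+ only)."""
    info = sys.hash_info
    return {
        "algorithm": info.algorithm,  # hash used for str/bytes, e.g. 'siphash24'
        "hash_bits": info.hash_bits,  # internal output size of that algorithm
        "seed_bits": info.seed_bits,  # size of the randomization key
        "width": info.width,          # bit width of values returned by hash()
        "modulus": info.modulus,      # prime used for numeric (int/float) hashes
    }


if __name__ == "__main__":
    for key, value in describe_hash_info().items():
        print(key, "=", value)
    # Ints are reduced modulo the prime above rather than run through SipHash,
    # so hashing the modulus itself yields 0.
    print(hash(sys.hash_info.modulus) == 0)
```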

References:

  • Since Python 3.4, SipHash is used as the hash function for str and bytes objects (PEP 456). Read: Python adopts SipHash
  • Since Python 3.3, hash randomization is enabled by default. Reference: object.__hash__ (last line of this section). Setting PYTHONHASHSEED to 0 disables hash randomization.
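Hash randomization can be observed by hashing the same string in fresh interpreters started with different PYTHONHASHSEED values. A minimal sketch, assuming a CPython 3 interpreter; str_hash_in_fresh_interpreter is a hypothetical helper, not a library function:

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text, seed):
    """Return hash(text) computed in a new interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    # With -c, the extra command-line argument arrives as sys.argv[1].
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    a = str_hash_in_fresh_interpreter("spam", 0)
    b = str_hash_in_fresh_interpreter("spam", 0)
    c = str_hash_in_fresh_interpreter("spam", "random")
    print(a == b)  # seed 0 disables randomization, so both runs agree
    print(a, c)    # with a random seed, c will almost certainly differ from a
```

Fixing the seed makes string hashes reproducible across runs, which is handy when comparing benchmarks between processes.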


Source: https://stackoverflow.com/questions/40137072/why-is-hash-slower-under-python3-4-vs-python2-7
