Indeterministic sets in Python 2 and 3

蹲街弑〆低调 提交于 2019-12-31 07:24:06

问题


Python 2

Sets are collections of unordered values. If I construct a set via a set literal, e.g.

s = {'a', 'b', 'c'}

and then print it, I get the elements in some scrambled order. However, it seems that in Python 2.7, the above example always results in the same ordering:

print(s)  # set(['a', 'c', 'b']) in Python 2.7

How does Python 2.7 decide on this ordering? Even the hashes of 'a', 'b' and 'c' are not in the order produced.

Python 3

In Python 3.x (including 3.6 where dict keys are ordered) the resulting order seems to be random, though always the same within a given Python process. That is, repeatedly re-building the set literal always lead to the same ordering, as long as I do not restart the Python interpreter.

To check the ordering across multiple Python processes, consider the bash code

(for _ in {1..50}; do python3 -c "s = {'a', 'b', 'c'}; print(s)"; done) | sort -u

This will (most often) show the 6 different ways the 3 elements can be arranged. Switching out python3 with python(2), we only see the ordering ['a', 'c', 'b']. What determines the ordering in Python 3?

I see that the hash value of objects are deterministic in Python 2 while random (though constant within a Python process) in Python 3. I'm sure this is key to the full explanation.

Edit

As deceze writes in his comment, I would like to know if Python explicitly does something just to achieve this randomization, or if it happens "for free".


回答1:


The reason for the difference in Python 3 (from Python 3.3 onwards) is that hash randomization is enabled by default, you could turn this off by setting the PYTHONHASHSEED environmental variable to a fixed value:

$ export PYTHONHASHSEED=0
$ (for _ in {1..50}; do python3  -c "s = {'a', 'b', 'c'}; print(s)"; done) | sort -u
{'a', 'b', 'c'}

Equally you can turn hash randomization on in Python 2 with the -R flag:

$ (for _ in {1..50}; do python2 -R -c "s = {'a', 'b', 'c'}; print(s)"; done) | sort -u
set(['a', 'b', 'c'])
set(['a', 'c', 'b'])
set(['b', 'c', 'a'])
set(['c', 'b', 'a'])

Note, you don't generally want to turn it off since having hash randomization enabled helps protect against certain denial-of-service attacks.



来源:https://stackoverflow.com/questions/49896202/indeterministic-sets-in-python-2-and-3

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!