Why is dictionary ordering non-deterministic?

别来无恙 提交于 2020-01-14 14:33:09

问题


I recently switched from Python 2.7 to Python 3.3, and it seems that while in Python 2 the ordering of dictionary keys was arbitrary but consistent, in Python 3 the ordering of the keys of a dictionary obtained with e.g. vars() appears non-deterministic.

If I run:

class Test(object): pass
parameters = vars(Test)
print(list(parameters.keys()))

in both Python 2.7 and Python 3.3, then:

  • Python 2.7 consistently gives me

    ['__dict__', '__module__', '__weakref__', '__doc__']
    
  • With Python 3.3, I can get any random order – for example:

    ['__weakref__', '__module__', '__qualname__', '__doc__', '__dict__']
    ['__doc__', '__dict__', '__qualname__', '__module__', '__weakref__']
    ['__dict__', '__module__', '__qualname__', '__weakref__', '__doc__']
    ['__weakref__', '__doc__', '__qualname__', '__dict__', '__module__']
    

Where does this non-determinism come from? And why is something like

list({str(i): i for i in range(10)}.keys())

… consistent between runs, always giving

['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']

… ?


回答1:



Update: In Python 3.6, dict has a new implementation which preserves insertion order. From Python 3.7, this order-preserving behaviour is guaranteed:

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.


This is the result of a security fix from 2012, which was enabled by default in Python 3.3 (scroll down to "Security improvements").

From the announcement:

Hash randomization causes the iteration order of dicts and sets to be unpredictable and differ across Python runs. Python has never guaranteed iteration order of keys in a dict or set, and applications are advised to never rely on it. Historically, dict iteration order has not changed very often across releases and has always remained consistent between successive executions of Python. Thus, some existing applications may be relying on dict or set ordering. Because of this and the fact that many Python applications which don't accept untrusted input are not vulnerable to this attack, in all stable Python releases mentioned here, HASH RANDOMIZATION IS DISABLED BY DEFAULT.

As noted above, the last, capitalized bit is no longer true in Python 3.3.

See also: object.__hash__() documentation ("Note" sidebar).

If absolutely necessary, you can disable hash randomization in versions of Python affected by this behaviour by setting the PYTHONHASHSEED environment variable to 0.


Your counterexample:

list({str(i): i for i in range(10)}.keys())

… does not in fact always give the same result in Python 3.3, although the number of different orderings is limited due to the way hash collisions are handled:

$ for x in {0..999}
> do
>   python3.3 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
     61 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
     73 ['1', '0', '3', '2', '5', '4', '7', '6', '9', '8']
     62 ['2', '3', '0', '1', '6', '7', '4', '5', '8', '9']
     59 ['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']
     58 ['4', '5', '6', '7', '0', '1', '2', '3', '8', '9']
     55 ['5', '4', '7', '6', '1', '0', '3', '2', '9', '8']
     62 ['6', '7', '4', '5', '2', '3', '0', '1', '8', '9']
     63 ['7', '6', '5', '4', '3', '2', '1', '0', '9', '8']
     60 ['8', '9', '0', '1', '2', '3', '4', '5', '6', '7']
     66 ['8', '9', '2', '3', '0', '1', '6', '7', '4', '5']
     65 ['8', '9', '4', '5', '6', '7', '0', '1', '2', '3']
     53 ['8', '9', '6', '7', '4', '5', '2', '3', '0', '1']
     62 ['9', '8', '1', '0', '3', '2', '5', '4', '7', '6']
     52 ['9', '8', '3', '2', '1', '0', '7', '6', '5', '4']
     73 ['9', '8', '5', '4', '7', '6', '1', '0', '3', '2']
     76 ['9', '8', '7', '6', '5', '4', '3', '2', '1', '0']

As noted at the beginning of this answer, that's no longer the case in Python 3.6:

$ for x in {0..999}
> do
>   python3.6 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
   1000 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']



回答2:


Note that Python 3.7 still has non-deterministic sets however. dicts preserve insertion order but sets do not. Sets can exhibit the same random behavior.

python3 -c "print({str(i) for i in range(9)})"

still gives different results from one run to the next.



来源:https://stackoverflow.com/questions/33810024/how-and-why-is-the-dictionary-hashes-different-in-python2-and-python3

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!