Is Python's “set” stable?

渐次进展 2020-11-29 07:21

The question arose while answering another SO question (there).

When I iterate several times over a python set (without changing it between calls), can I assume it

7 Answers
  •  不知归路
    2020-11-29 08:12

    A set or frozenset is inherently an unordered collection. Internally, sets are based on a hash table, and the iteration order of the elements depends both on the insertion order and on the hash algorithm. In CPython (the standard Python implementation), integers that fit in a machine word (32 or 64 bits) generally hash to themselves, but text strings, byte strings, and datetime objects hash to integers that vary unpredictably from one interpreter run to the next; you can control that by setting the PYTHONHASHSEED environment variable.
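To make the "depends on the insertion order" part concrete, here is a minimal sketch that relies on a CPython implementation detail: a small set starts with an 8-slot table, and the integers 0 and 8 both map to slot 0, so whichever is inserted second has to move to a later slot. The variable names are mine, and other implementations may lay the table out differently.

```python
# Assumption: CPython, where small ints hash to themselves and a new
# set starts with an 8-slot table, so 0 and 8 compete for slot 0.
first = set([0, 8])
second = set([8, 0])

# The two sets are equal as sets...
print(first == second)

# ...but iterating them can give different orders.
print(list(first), list(second))
```

On CPython I would expect the two lists to come out as [0, 8] and [8, 0], even though the sets compare equal.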

    From the __hash__ docs:

    Note

    By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

    This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

    Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

    See also PYTHONHASHSEED.

    The results of hashing objects of other classes depend on the details of the class's __hash__ method.
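A rough way to see the salting in action is to ask fresh interpreters for a hash under different PYTHONHASHSEED values. This is only a sketch: `hash_with_seed` is a hypothetical helper written for this illustration, and it assumes `sys.executable` points at a usable Python 3.7+ interpreter.

```python
import os
import subprocess
import sys


def hash_with_seed(expression, seed):
    """Hypothetical helper: evaluate hash(<expression>) in a new
    interpreter started with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expression}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Compare a string hash and an int hash under two different seeds.
for seed in (1, 2):
    print(seed, hash_with_seed("'abc'", seed), hash_with_seed("12345", seed))
```

The string hash should be reproducible for a given seed and will normally differ between seeds, while the small integer keeps hashing to itself.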

    The upshot of all this is that you can have two sets containing identical strings, but when you convert them to lists they can compare unequal. Or they may not. ;) Here's some code that demonstrates this. On some runs it will just loop after printing the first line, never finding a mismatch, but on other runs it will quickly find a set whose order differs from the original.

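    from random import seed, shuffle

    # Fix the shuffle sequence; string hashes still vary from run to run
    # unless PYTHONHASHSEED is set.
    seed(42)

    data = list('abcdefgh')
    a = frozenset(data)
    la = list(a)  # iteration order of the original set
    print(''.join(la), a)

    # Rebuild the set from shuffled input until its iteration order differs.
    # On runs where insertion order does not affect the layout, this loop
    # never terminates.
    while True:
        shuffle(data)
        lb = list(frozenset(data))
        if lb != la:
            print(''.join(data), ''.join(lb))
            break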
    

    Typical output:

    dachbgef frozenset({'d', 'a', 'c', 'h', 'b', 'g', 'e', 'f'})
    deghcfab dahcbgef
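Coming back to iterating over the same set several times: the sketch below checks that repeated passes over an unmodified set agree within one process. That is how CPython behaves in practice, since the hash values and the table do not change, but treat it as an observation rather than a guarantee I can point to in the language reference. If you need an order that does not depend on hashing at all, `sorted()` is the safe option.

```python
words = frozenset("abcdefgh")

# Several passes over the same, unmodified set within one process.
passes = [list(words) for _ in range(5)]

# Every pass walks the same hash table, so the orders should agree.
print(all(p == passes[0] for p in passes))

# sorted() yields an order that is independent of hash values.
print(sorted(words))
```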
    
