Seeded Python RNG showing non-deterministic behavior with sets

可紊 提交于 2021-01-27 05:43:50

问题


I'm seeing non-deterministic behavior when trying to select a pseudo-random element from sets, even though the RNG is seeded (example code shown below). Why is this happening, and should I expect other Python data types to show similar behavior?

Notes: I've only tested this on Python 2.7, but it's been reproducible on two different Windows computers.

Similar Issue: The issue at Python random seed not working with Genetic Programming example code may be similar. Based on my testing, my hypothesis is that run-to-run memory allocation differences within the sets is leading to different elements getting picked up for the same RNG state.

So far I haven't found any mention of this kind of caveat/issue in the Python docs for set or random.

Example Code (randTest produces different output run-to-run):

import random

''' Class contains a large set of pseudo-random numbers. '''
class bigSet:
    def __init__(self):
        self.a = set()
        for n in range(2000):
            self.a.add(random.random())
        return


''' Main test function. '''
def randTest():
    ''' Seed the PRNG. '''
    random.seed(0)

    ''' Create sets of bigSet elements, presumably many memory allocations. ''' 
    b = set()
    for n in range (2000):
        b.add(bigSet())

    ''' Pick a random value from a random bigSet. Would have expected this to be deterministic. '''    
    c = random.sample(b,1)[0]
    print('randVal: ' + str(random.random()))           #This value is always the same
    print('setSample: ' + str(random.sample(c.a,1)[0])) #This value can change run-to-run
    return

回答1:


It has to do with object instantiation of mutable objects. if I create a set of frozenset it does give a deterministic result;

Python 2.7.11 (default, Jan  9 2016, 15:47:04) 
[GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(0)
>>> set(frozenset(random.random() for i in range(5)) for j in range(5))
set([frozenset([0.7298317482601286, 0.3101475693193326, 0.8988382879679935, 0.47214271545271336, 0.6839839319154413]), frozenset([0.5833820394550312, 0.4765969541523558, 0.4049341374504143, 0.30331272607892745, 0.7837985890347726]), frozenset([0.7558042041572239, 0.5046868558173903, 0.9081128851953352, 0.28183784439970383, 0.6183689966753316]), frozenset([0.420571580830845, 0.25891675029296335, 0.7579544029403025, 0.8444218515250481, 0.5112747213686085]), frozenset([0.9097462559682401, 0.8102172359965896, 0.9021659504395827, 0.9827854760376531, 0.25050634136244054])])
>>> random.seed(0)
>>> set(frozenset(random.random() for i in range(5)) for j in range(5))
set([frozenset([0.7298317482601286, 0.3101475693193326, 0.8988382879679935, 0.47214271545271336, 0.6839839319154413]), frozenset([0.5833820394550312, 0.4765969541523558, 0.4049341374504143, 0.30331272607892745, 0.7837985890347726]), frozenset([0.7558042041572239, 0.5046868558173903, 0.9081128851953352, 0.28183784439970383, 0.6183689966753316]), frozenset([0.420571580830845, 0.25891675029296335, 0.7579544029403025, 0.8444218515250481, 0.5112747213686085]), frozenset([0.9097462559682401, 0.8102172359965896, 0.9021659504395827, 0.9827854760376531, 0.25050634136244054])])
>>> 

If I'm not mistaken, CPython uses a (mutable) object's memory location as it's id and as the key for hashing.

So while the contents of the objects will always be the same, it's id will be different;

In [13]: random.seed(0)

In [14]: k = set()

In [15]: for n in range (20):
    k.add(bigSet())
   ....:     

In [16]: for x in k:
    print(id(x))
   ....:     
34856629808
34856629864
34856631936
34856630424
34856629920
34856631992
34856630480
34856629976
34856632048
34856631040
34856630536
34856632104
34856630032
34856630592
34856630088
34856632160
34856629752
34856629696
34856630760
34856630256

In [17]: random.seed(0)

In [18]: k = set()

In [19]: for n in range (20):
   ....:         k.add(bigSet())
   ....:     

In [20]: for x in k:
   ....:         print(id(x))
   ....:     
34484534800
34856629808
34484534856
34856629864
34856631936
34856630424
34856629920
34856631992
34484534968
34856629976
34856630480
34856632048
34856631040
34484535024
34484535080
34484535136
34856632216
34484534688
34484534912
34484534744

A possible solution would be to subclass a frozen set.




回答2:


OrderedSet is the ideal choice.

Neither set nor frozenset should be used here, since nowhere is it specified that any of them are ordered. The fact that another answer works is just an accident of implementation. Sets are unordered, and relying on their order results in coupling to the Python version (and possibly machine).

I get a different order from Roland's answer in Python 3.8.6 (although the order between two runs happens to be the same). This is in spite of the fact that the random numbers generated are the same.

To preserve the order, and therefore determinism based on a random seed, you must use an ordered data structure such as OrderedSet.

If you do not have OrderedSet available, or if profiling your code shows OrderedSet is slow, you can use an OrderedDict and ignore its values.

If you have Python >= 3.6, then even regular dicts are ordered thanks to performance optimizations.




回答3:


I'm fairly certain you're correct, and the issue is caused by the run-to-run memory allocation differences for set. When I changed your program to use lists instead of sets, I got deterministic behavior:

import random

''' Class contains a large list of pseudo-random numbers. '''
class bigList:
    def __init__(self):
        self.a = [random.random() for n in range(2000)]

''' Main test function. '''
def randTest():
    ''' Seed the PRNG. '''
    random.seed(0)

    ''' Create lists of bigList elements, presumably many memory allocations. '''
    b = [bigList() for n in range(2000)]

    ''' Pick a random value from a random bigSet. Would have expected this to be deterministic. '''
    c = random.sample(b, 1)[0]
    print('randVal: ' + str(random.random()))  # This value is always the same
    # and so is this now...
    print('setSample: ' + str(random.sample(c.a, 1)[0]))

randTest()


来源:https://stackoverflow.com/questions/36317520/seeded-python-rng-showing-non-deterministic-behavior-with-sets

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!