Why does numpy.random.Generator.choice provide different results (seeded) with a given uniform distribution compared to the default uniform distribution?

Posted by 前提是你 on 2020-07-10 10:27:05

Question


Simple test code:

import numpy

pop = numpy.arange(20)
rng = numpy.random.default_rng(1)
rng.choice(pop,p=numpy.repeat(1/len(pop),len(pop))) # yields 10
rng = numpy.random.default_rng(1)
rng.choice(pop) # yields 9

The numpy documentation says:

The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

I don't know of any way to create a uniform distribution other than numpy.repeat(1/len(pop),len(pop)).

Is numpy using something else? Why?

If not, how does setting the distribution affect the seed?

Shouldn't the distribution and the seed be independent?

What am I missing here?


Answer 1:


A more idiomatic way of creating a uniform distribution with numpy would be:

numpy.random.uniform(low=0.0, high=1.0, size=None)

or in your case numpy.random.uniform(low=0.0, high=20.0, size=1)

Alternatively, you could simply do

rng = numpy.random.default_rng(1)
rng.uniform()*20
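
Note that rng.uniform()*20 gives a float in [0, 20), not an element of pop, so it would still need to be truncated to an integer index before it can select an entry. A minimal sketch of that step, reusing pop from the question (the names x, idx and picked are only illustrative):

```python
import numpy

pop = numpy.arange(20)
rng = numpy.random.default_rng(1)

# rng.uniform() returns a float in [0, 1); scaling by len(pop) gives a float in [0, 20)
x = rng.uniform() * len(pop)

# Truncate to an integer index, then use it to select an entry of pop
idx = int(x)
picked = pop[idx]
print(x, idx, picked)
```

This is not how rng.choice is implemented; it only shows what the suggestion above would need in order to return an element of pop.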

As for your question on why the two ways of calling rng.choice produce different outputs, my guess is that they are executed slightly differently internally. Although both start from the same seed, by the time a value is actually drawn the generator is at a different point in its random sequence in the two calls, so you get different results.




Answer 2:


The distribution doesn't affect the seed. Details are below:

I checked out the source code: numpy/random/_generator.pyx#L669

If p is given, it will use rng.random to get a random value:

import numpy

pop = numpy.arange(20)
seed = 1
rng = numpy.random.default_rng(seed)

# rng.choice works like below
rand = rng.random()
p = numpy.repeat(1/len(pop),len(pop))
cdf = p.cumsum()
cdf /= cdf[-1]
uniform_samples = rand
idx = cdf.searchsorted(uniform_samples, side='right')
idx = numpy.asarray(idx, dtype=numpy.int64) # yields 10 (asarray also works on NumPy 2.x, where copy=False can raise)
print(idx)

# -----------------------
rng = numpy.random.default_rng(seed)
idx = rng.choice(pop,p=numpy.repeat(1/len(pop),len(pop))) # same as above
print(idx)

If p is not given, it will use rng.integers to get a random value:

rng = numpy.random.default_rng(seed)
idx = rng.integers(0, pop.shape[0]) # yields 9
print(idx)
# -----------------------
rng = numpy.random.default_rng(seed)
idx = rng.choice(pop) # same as above
print(idx)

You can play around with different seed values. I don't know what happens inside rng.random and rng.integers, but you can see that they behave differently: the same seed is turned into an index in two different ways. That's why you got different results.
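
To check that both code paths really do sample uniformly, and only differ in which individual values they produce, one can compare the empirical frequencies over many draws. This is a rough sketch; the sample size n and the variable names are my own choices, not anything taken from NumPy's source:

```python
import numpy

pop = numpy.arange(20)
n = 200_000  # arbitrary sample size, chosen only for this illustration
p = numpy.repeat(1 / len(pop), len(pop))

# Path 1: explicit uniform probabilities
rng = numpy.random.default_rng(1)
with_p = rng.choice(pop, size=n, p=p)

# Path 2: default (no p given), starting from the same seed
rng = numpy.random.default_rng(1)
without_p = rng.choice(pop, size=n)

# Empirical frequency of each value 0..19 under both paths
freq_with_p = numpy.bincount(with_p, minlength=len(pop)) / n
freq_without_p = numpy.bincount(without_p, minlength=len(pop)) / n

# Largest deviation from the ideal frequency 1/20 under each path
print(numpy.abs(freq_with_p - 1 / len(pop)).max())
print(numpy.abs(freq_without_p - 1 / len(pop)).max())

# The individual draws are not the same sequence, even with the same seed
print(numpy.array_equal(with_p, without_p))
```

Both deviations should be small, which is consistent with the documentation: each call samples from a uniform distribution, but the seed only fixes the underlying stream, not how each method converts that stream into indices.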



Source: https://stackoverflow.com/questions/62536092/why-does-numpy-random-generator-choice-provides-different-results-seeded-with
