libc random number generator flawed?

清歌不尽 2020-12-15 19:09

Consider an algorithm to test the probability that a certain number is picked from a set of N unique numbers after a specific number of tries (for example, with N=2, what's

3 answers

慢半拍i (OP)

2020-12-15 19:29

What's being measured in this experiment is the interval between successful trials of a Bernoulli experiment, where success is defined as random() mod k == 0 for some k (36 in the OP). Unfortunately, it is marred by the fact that the implementation of random() means that the Bernoulli trials are not statistically independent.

We'll write rnd_i for the i^th output of `random()`, and we note that:

rnd_i = rnd_i-31 + rnd_i-3 with probability 0.75

rnd_i = rnd_i-31 + rnd_i-3 + 1 with probability 0.25

(See below for a proof outline.)

Let's suppose rnd_i-31 mod k == 0 and we're currently looking at rnd_i. Then it must be the case that rnd_i-3 mod k ≠ 0, because otherwise the interval would already have ended at rnd_i-3 and we would have counted it as having length 28 (that is, 31 - 3).

But (most of the time) (mod k): rnd_i = rnd_i-31 + rnd_i-3 = rnd_i-3 ≠ 0.

So the current trial is not statistically independent of the previous trials, and the 31^st trial after a success is much less likely to succeed than it would in an unbiased series of Bernoulli trials.
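The experiment can be sketched in Python as follows. This is only a model built from the description in this answer, not libc itself: the names `additive_feedback`, `gap_histogram` and `geometric_expected` are made up for illustration, and the initial state is filled from Python's own generator rather than by libc's seeding routine. It tallies the intervals between successes and prints the counts around 31 next to what independent Bernoulli trials would predict.

```python
import random as pyrandom
from collections import Counter


def additive_feedback(seed):
    """Yield rnd_i for the scheme described in this answer (a model, not libc code)."""
    # ASSUMPTION: stand-in seeding; libc's seed-mixing step is not reproduced.
    init = pyrandom.Random(seed)
    state = [init.getrandbits(32) for _ in range(31)]
    pos = 0  # ring-buffer slot currently holding state_{i-31}
    while True:
        # state_{i-3} sits 28 slots after state_{i-31} in the ring buffer.
        state[pos] = (state[pos] + state[(pos + 28) % 31]) & 0xFFFFFFFF
        yield state[pos] >> 1  # drop the low-order bit
        pos = (pos + 1) % 31


def gap_histogram(seed, k=36, n=2_000_000):
    """Count the intervals between successive outputs with rnd mod k == 0."""
    gen = additive_feedback(seed)
    gaps = Counter()
    last = None  # index of the most recent success, if any
    for i in range(n):
        if next(gen) % k == 0:
            if last is not None:
                gaps[i - last] += 1
            last = i
    return gaps


def geometric_expected(total, gap, k=36):
    """Expected number of intervals of a given length for independent trials."""
    p = 1 / k
    return total * (1 - p) ** (gap - 1) * p


if __name__ == "__main__":
    gaps = gap_histogram(seed=1)
    total = sum(gaps.values())
    for g in (29, 30, 31, 32, 33):
        print(g, gaps[g], round(geometric_expected(total, g)))
```

If the argument above holds, the observed count for interval 31 should fall noticeably below the geometric prediction, while its neighbours stay close to it.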

The usual advice in using linear-congruential generators, which doesn't actually apply to the random() algorithm, is to use the high-order bits instead of the low-order bits, because high-order bits are "more random" (that is, less correlated with successive values). But that won't work in this case either, because the above identities hold equally well when we take the high-order log k bits of the output as when we take the output mod k (which, for k a power of two, is the low-order log k bits).

In fact, we might expect a linear-congruential generator to work better, particularly if we use the high-order bits of the output, because although the LCG is not particularly good at Monte Carlo simulations, it does not suffer from the linear feedback of random().

The random() algorithm, for the default case:

Let state be a vector of unsigned longs. Initialize state₀...state₃₀ using a seed, some fixed values, and a mixing algorithm. For simplicity, we can consider the state vector to be infinite, although only the last 31 values are used so it's actually implemented as a ring buffer.

To generate rnd_i (note: ⊕ is addition mod 2³²):

state_i = state_i-31 ⊕ state_i-3

rnd_i = (state_i - (state_i mod 2)) / 2

Now, note that:

(i + j) mod 2 = i mod 2 + j mod 2, if i mod 2 == 0 or j mod 2 == 0

(i + j) mod 2 = i mod 2 + j mod 2 - 2, if i mod 2 == 1 and j mod 2 == 1

If i and j are uniformly distributed, the first case will occur 75% of the time, and the second case 25%.

So, by substitution in the generation formula:

rnd_i = (state_i-31 ⊕ state_i-3 - ((state_i-31 + state_i-3) mod 2)) / 2

which is either

rnd_i = ((state_i-31 - (state_i-31 mod 2)) ⊕ (state_i-3 - (state_i-3 mod 2))) / 2

or

rnd_i = ((state_i-31 - (state_i-31 mod 2)) ⊕ (state_i-3 - (state_i-3 mod 2)) + 2) / 2

The two cases can be further reduced to:

rnd_i = rnd_i-31 ⊕ rnd_i-3

rnd_i = rnd_i-31 ⊕ rnd_i-3 + 1

As above, the first case occurs 75% of the time, assuming that rnd_i-31 and rnd_i-3 are independently drawn from a uniform distribution (which they're not, but it's a reasonable first approximation).
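The identity can also be checked numerically. Below is a minimal Python sketch, assuming the generator works exactly as described in this answer; `additive_feedback` and `carry_fraction` are illustrative names, and the initial state comes from Python's own `random.Random` as a stand-in for libc's seeding, which is not reproduced here. Because the outputs are 31-bit values, the check compares them mod 2³¹.

```python
import random as pyrandom

MASK32 = 0xFFFFFFFF  # state arithmetic is mod 2**32
MASK31 = 0x7FFFFFFF  # outputs are 31-bit values


def additive_feedback(seed):
    """Yield rnd_i for the scheme described above (a model, not libc code)."""
    # ASSUMPTION: the initial 31 state words come from Python's own PRNG,
    # standing in for libc's seed-mixing step, which is not reproduced here.
    init = pyrandom.Random(seed)
    state = [init.getrandbits(32) for _ in range(31)]
    pos = 0  # ring-buffer slot currently holding state_{i-31}
    while True:
        # state_{i-3} sits 28 slots after state_{i-31} in the ring buffer.
        state[pos] = (state[pos] + state[(pos + 28) % 31]) & MASK32
        yield state[pos] >> 1  # drop the low-order bit
        pos = (pos + 1) % 31


def carry_fraction(seed, n=100_000):
    """Check the identity on n outputs; return the fraction of '+1' cases."""
    gen = additive_feedback(seed)
    out = [next(gen) for _ in range(n)]
    carries = 0
    for i in range(31, n):
        # diff is 0 for the plain-sum case and 1 for the '+1' case.
        diff = (out[i] - out[i - 31] - out[i - 3]) & MASK31
        if diff not in (0, 1):
            raise AssertionError(f"identity violated at i={i}")
        carries += diff
    return carries / (n - 31)


if __name__ == "__main__":
    print(carry_fraction(seed=12345))
```

The function raises if any output fails to match one of the two cases, and the returned fraction of "+1" cases should land close to 0.25 if the 75%/25% split is a fair approximation.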
