libc random number generator flawed?

清歌不尽 2020-12-15 19:09

Consider an algorithm to test the probability that a certain number is picked from a set of N unique numbers after a specific number of tries (for example, with N=2, what's

3 answers
  •  慢半拍i (original poster)
    2020-12-15 19:29

    What's being measured in this experiment is the interval between successful trials of a Bernoulli experiment, where success is defined as random() mod k == 0 for some k (36 in the OP). Unfortunately, it is marred by the fact that the implementation of random() means that the Bernoulli trials are not statistically independent.

    We'll write rnd_i for the i-th output of `random()` and note that, with addition taken modulo 2^31 (the range of random()'s output):

    rnd_i = rnd_{i-31} + rnd_{i-3}     with probability 0.75

    rnd_i = rnd_{i-31} + rnd_{i-3} + 1 with probability 0.25

    (See below for a proof outline.)
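The identity can be checked numerically on a small model of the generator. The following is a minimal sketch, not libc's code: `model_outputs` and `carry_fraction` are hypothetical names, the recurrence follows the description given in this answer, and the seeding (filling the 31 state words from Python's own `random` module) is an assumption standing in for libc's real initialisation.

```python
import random as pyrandom  # used only to fill the initial state (assumption)


def model_outputs(n, seed=1):
    """Yield n outputs of the additive-feedback model described in this answer."""
    filler = pyrandom.Random(seed)
    state = [filler.getrandbits(32) for _ in range(31)]  # 31-word ring buffer
    for i in range(n):
        pos = i % 31  # slot of state_{i-31}; state[pos - 3] is state_{i-3}
        state[pos] = (state[pos] + state[pos - 3]) & 0xFFFFFFFF
        yield state[pos] >> 1  # drop the low bit, leaving 31 bits


def carry_fraction(rnd):
    """Check the identity on every output and return the share of '+1' cases."""
    plus_one = 0
    for i in range(31, len(rnd)):
        diff = (rnd[i] - rnd[i - 31] - rnd[i - 3]) % 2**31
        if diff not in (0, 1):
            raise AssertionError(f"identity violated at i={i}")
        plus_one += diff
    return plus_one / (len(rnd) - 31)


rnd = list(model_outputs(100_000))
print(f"share of '+1' cases: {carry_fraction(rnd):.3f}")
```

If the 75% / 25% split is right, the printed share should come out close to 0.25.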

    Let's suppose rnd_{i-31} mod k == 0 and we're currently looking at rnd_i. Then it must be the case that rnd_{i-3} mod k ≠ 0, because otherwise the interval would already have ended at trial i-3 and been counted as length 28.

    But (most of the time) (mod k): rnd_i = rnd_{i-31} + rnd_{i-3} = rnd_{i-3} ≠ 0.

    So the current trial is not statistically independent of the previous trials, and the 31st trial after a success is much less likely to succeed than it would in an unbiased series of Bernoulli trials.

    The usual advice for linear congruential generators, which doesn't actually apply to the random() algorithm, is to use the high-order bits instead of the low-order bits, because the high-order bits are "more random" (that is, less correlated with successive values). But that won't work in this case either, because the identities above hold just as well for the high-order log2(k) bits as for mod k, which for a power-of-two k is simply the low-order log2(k) bits.
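The effect on the interval of length 31 can be sketched with a simulation. This is a rough model under stated assumptions, not libc itself: `model_outputs` and `gap_ratio` are hypothetical names, the seeding is a placeholder, and the high-order test uses the top 5 bits (success probability 1/32) as a stand-in for "the high log k bits". Each ratio compares the observed number of length-31 intervals with the number an independent Bernoulli series (geometric distribution) would predict.

```python
import random as pyrandom  # used only to fill the initial state (assumption)
from collections import Counter


def model_outputs(n, seed=1):
    """Yield n outputs of the additive-feedback model described in this answer."""
    filler = pyrandom.Random(seed)
    state = [filler.getrandbits(32) for _ in range(31)]  # 31-word ring buffer
    for i in range(n):
        pos = i % 31  # slot of state_{i-31}; state[pos - 3] is state_{i-3}
        state[pos] = (state[pos] + state[pos - 3]) & 0xFFFFFFFF
        yield state[pos] >> 1


def gap_ratio(values, is_success, p, gap):
    """Observed count of `gap` divided by the count independence predicts."""
    counts = Counter()
    last = None
    for i, v in enumerate(values):
        if is_success(v):
            if last is not None:
                counts[i - last] += 1
            last = i
    total = sum(counts.values())
    expected = total * (1 - p) ** (gap - 1) * p  # geometric distribution
    return counts[gap] / expected


N = 1_000_000
low_ratio = gap_ratio(model_outputs(N), lambda v: v % 36 == 0, 1 / 36, 31)
high_ratio = gap_ratio(model_outputs(N), lambda v: v >> 26 == 0, 1 / 32, 31)
print(f"gap 31, low-order test (mod 36):      {low_ratio:.2f}")
print(f"gap 31, high-order test (top 5 bits): {high_ratio:.2f}")
```

A ratio near 1 would mean the trials behave as if independent. If the argument above holds, both ratios should fall clearly below 1. They need not reach 0, since the "+ 1" case and the wraparound modulo 2^31 still allow some successes.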

    In fact, we might expect a linear-congruential generator to work better, particularly if we use the high-order bits of the output, because although the LCG is not particularly good at Monte Carlo simulations, it does not suffer from the linear feedback of random().


    Proof outline. The random() algorithm, for the default case:

    Let state be a vector of unsigned longs. Initialize state_0 ... state_30 using a seed, some fixed values, and a mixing algorithm. For simplicity, we can consider the state vector to be infinite, although only the last 31 values are used, so it's actually implemented as a ring buffer.

    To generate rnd_i: (Note: ⊕ is addition mod 2^32.)

    state_i = state_{i-31} ⊕ state_{i-3}

    rnd_i = (state_i - (state_i mod 2)) / 2
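The generation step above can be sketched as a ring buffer in a few lines. This is an illustrative model only: `additive_feedback` is a hypothetical name, and the initial state is filled from Python's `random` module as a placeholder, because the seed mixing is not spelled out here.

```python
import random as pyrandom  # used only to fill the initial state (assumption)

MASK32 = 0xFFFFFFFF  # keeps sums within 32 bits, i.e. addition mod 2^32


def additive_feedback(seed=1):
    """Yield 31-bit outputs from state_i = state_{i-31} + state_{i-3} mod 2^32."""
    filler = pyrandom.Random(seed)
    state = [filler.getrandbits(32) for _ in range(31)]  # state_0 .. state_30
    pos = 0  # slot holding state_{i-31}, the oldest value
    while True:
        # state_{i-3} sits three slots behind the write position.
        new = (state[pos] + state[(pos - 3) % 31]) & MASK32
        state[pos] = new  # overwrite the oldest value with state_i
        pos = (pos + 1) % 31
        yield new >> 1  # same as (state_i - (state_i mod 2)) / 2


gen = additive_feedback()
sample = [next(gen) for _ in range(5)]
print(sample)
```

Only 31 words are ever stored: each new value replaces the oldest one, which is why the infinite vector in the description never has to exist.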

    Now, note that:

    (i + j) mod 2 = i mod 2 + j mod 2    if i mod 2 == 0 or j mod 2 == 0

    (i + j) mod 2 = i mod 2 + j mod 2 - 2 if i mod 2 == 1 and j mod 2 == 1

    If i and j are uniformly distributed, the first case will occur 75% of the time, and the second case 25%.
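The parity rule and the 75% / 25% split can be confirmed exhaustively over a small range. In this sketch `parity_correction` is a hypothetical helper, and the range 0..15 is an arbitrary choice that contains equally many even and odd values.

```python
from itertools import product


def parity_correction(i, j):
    """Return (i mod 2 + j mod 2) - (i + j) mod 2: 0 in the first case, 2 in the second."""
    return (i % 2 + j % 2) - (i + j) % 2


pairs = list(product(range(16), repeat=2))
corrections = [parity_correction(i, j) for i, j in pairs]
share_plain = corrections.count(0) / len(pairs)
print(share_plain)  # prints 0.75
```

The correction is 2 exactly when both i and j are odd, which is one quarter of all pairs.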

    So, by substitution in the generation formula:

    rnd_i = ((state_{i-31} ⊕ state_{i-3}) - ((state_{i-31} + state_{i-3}) mod 2)) / 2

          = ((state_{i-31} - (state_{i-31} mod 2)) ⊕ (state_{i-3} - (state_{i-3} mod 2))) / 2 or

          = ((state_{i-31} - (state_{i-31} mod 2)) ⊕ (state_{i-3} - (state_{i-3} mod 2)) + 2) / 2

    The two cases can be further reduced to the following, where ⊕ on the halved values is addition mod 2^31:

    rnd_i = rnd_{i-31} ⊕ rnd_{i-3}

    rnd_i = rnd_{i-31} ⊕ rnd_{i-3} + 1

    As above, the first case occurs 75% of the time, assuming that rnd_{i-31} and rnd_{i-3} are independently drawn from a uniform distribution (which they're not, but it's a reasonable first approximation).
