Uniform distribution of integers using floating point source

后端 未结 5 588
渐次进展
渐次进展 2021-01-06 00:23

The standard way to get a random integer in the range [0, n) in JavaScript - or any other language that only offers a random() function that returns a float in the range [0,

5条回答
  •  日久生厌
    2021-01-06 01:06

    No, the resulting distribution is not going to be perfectly uniform, for most values of n. For small values, it'll be so close to uniform that you'd have a hard time detecting any difference from a uniform distribution, but as n gets larger the bias can become noticeable.

    To illustrate, here's some Python code (not JavaScript, sorry, but the principle is the same):

    from collections import Counter
    from random import random
    
    def badrand(n):
        return int(random() * n)
    
    print(Counter(badrand(6755399441055744) % 3 for _ in range(10000000)))
    

    This is producing 10 million random integers in the range [0, 6755399441055744), reducing each of those integers modulo 3, and counting the number of times the remainder is 0, 1, or 2. If we're generating those integers uniformly, we'd expect the remainders modulo 3 to be roughly evenly distributed, so we'd expect the counts to be similar.

    Here's an example result from running this on my machine:

    Counter({1: 3751915, 0: 3334643, 2: 2913442})
    

    That is, a remainder of 1 is significantly more likely to occur than 0, which in turn is significantly more likely to occur than a remainder of 2. The differences here are way too big to be explained by random variation.

    So what went wrong? Python's random() function is relatively high quality, based on the Mersenne Twister, so we're unlikely to be seeing statistical problems resulting from the base random number generator. What's happening is that random() generates one of 2^53 (roughly) equally likely outcomes - each outcome is a number of the form x / 2^53 for some integer x in the range [0, 2^53). Now in the badrand call, we're effectively mapping those outcomes to 6755399441055744 possible outputs. Now that value wasn't chosen at random (ha!); it's exactly 3/4 of 2^53. That means that under the most uniform distribution possible, 2/3 of the possible badrand output values are being hit by exactly one of the 2^53 possible random() output values, while the other 1/3 are being hit by two of the 2^53 possible random() output values. That is, some of the potential outputs are twice as likely to occur as others. So we're a long way from uniform.

    You're going to see the same effect in JavaScript. In the case of Chrome, it appears that there are only 2^32 distinct results from Math.random(), so you should be able to find effects like the above with n smaller than (but close to) 2^32.

    Of course, the same effect holds for small n, too: if n = 5, then because 5 is not a divisor of 2^32 there's no way we can perfectly evenly distribute all 2^32 possible Math.random() results between the 5 desired outcomes: the best we can hope for is that 4 of the 5 outcomes appear for 858993459 of the possible random() results each, while the fifth occurs for 858993460 of the random() results. But that distribution is going to be so close to uniform that it would be well-nigh impossible to find any statistical test to tell you differently. So for practical purposes, you should be safe with small n.

    There's a related Python bug that might be interesting at http://bugs.python.org/issue9025. That bug was solved for Python 3 by moving away from the int(random() * n) method of computing these numbers. The bug still remains in Python 2, though.

提交回复
热议问题