How to generate an un-biased random number within an arbitrary range using the fewest bits

后端 未结 2 1514
情深已故
情深已故 2020-12-11 08:54

Because of the pigeon hole principle, you can\'t just use

output = min + (rand() % (int)(max - min + 1))

to generate an unbiased, uniform,

相关标签:
2条回答
  • 2020-12-11 09:27

    The common approach to eliminating bias is to throw away numbers that are outside of the desired range. As noted, this is wasteful. It's possible to minimize the waste by starting with a larger number of bits and generating multiple random numbers at the same time; you can achieve a better match between the range of inputs to outputs.

    For example take a roll of a die. The output has 6 possibilities. The naive approach would take 3 random bits for each random number produced. The first example demonstrates the pigeon-hole problem.

    def pigeon_die(total_bit_count):
        for i in xrange(total_bit_count // 3):
            bits = random.getrandbits(3)
            yield 1 + bits * 6 // 8
    
    1 : 832855
    2 : 417835
    3 : 416012
    4 : 833888
    5 : 416189
    6 : 416554
    total 3333333
    max/min 2.00448063998
    

    The second example is the wasteful approach commonly used. You can see that it generates fewer random number from the same number of random bits, but the bias is eliminated.

    def wasteful_die(total_bit_count):
        for i in xrange(total_bit_count // 3):
            bits = random.getrandbits(3)
            if bits < 6:
                yield 1 + bits
    
    1 : 417043
    2 : 415812
    3 : 417835
    4 : 416012
    5 : 416645
    6 : 417243
    total 2500590
    max/min 1.00486517946
    

    The final example takes 13 bits at a time and generates 5 random numbers from it. This generates even more numbers than the naive approach!

    def optimized_die(total_bit_count):
        for i in xrange(total_bit_count // 13):
            bits = random.getrandbits(13)
            if bits < 6**5:
                for j in range(5):
                    yield 1 + bits % 6
                    bits //= 6
    
    1 : 608776
    2 : 608849
    3 : 608387
    4 : 608119
    5 : 607855
    6 : 608559
    total 3650545
    max/min 1.00163525841
    

    The choice of 13 bits was made by taking the logarithm base 6 of powers of 2 and choosing the one that was closest to an integer.

    def waste_list(n):
        for bit in range(1, 31):
            potential = math.log(2**bit, n)
            count = int(potential)
            if count > 0:
                waste = potential - count
                yield waste, bit, count
    
    for waste, bit, count in sorted(waste_list(6)):
        print bit, count, waste
        if bit == 3:
            break
    
    13 5 0.029086494049
    26 10 0.0581729880981
    8 3 0.0948224578763
    21 8 0.123908951925
    3 1 0.160558421704
    

    As you can see, there are 4 choices better than the simple 3 bits.

    0 讨论(0)
  • 2020-12-11 09:37

    I am afraid that your suggested approach is biased.

    Suppose the random number generator made numbers from 0 to 255, but you wanted a random number in the range 0 to 254.

    If the random number generator produced 255 (1111_1111 in binary), your approach would throw away a single bit and add one more until eventually you would end up with the number 254 (1111_1110 in binary). (If I have understood your approach correctly.)

    Therefore your generated numbers would have a probability of 1/256 for every number, except 254 which would have a probability of 1/128.

    0 讨论(0)
提交回复
热议问题