Using one probability set to generate another [duplicate]

How can I generate a bigger probability set from a smaller probability set?
This is from The Algorithm Design Manual by Steven Skiena.
Q:

Use a random number generator (rng04) that generates numbers from {0,1,2,3,4} with equal probability to write a random number generator that generates numbers from 0 to 7 (rng07) with equal probability.

I have tried for around 3 hours now, mostly by summing two rng04 outputs. The problem is that the values then have different probabilities: of the 25 equally likely pairs, 4 comes up with probability 5/25 while 0 comes up with probability 1/25. I tried some ways to mask this, but could not.

Can somebody solve this?

You have to find a way to combine the two random numbers (the first and second draw from {0,1,2,3,4}) so that they give n*n distinct possibilities, here 5*5 = 25. Basically the problem is that with addition you get something like this:

        X
      0 1 2 3 4

  0   0 1 2 3 4
Y 1   1 2 3 4 5
  2   2 3 4 5 6
  3   3 4 5 6 7
  4   4 5 6 7 8

This has duplicates, which is not what you want. One possible way to combine the two draws is Z = X + Y*5, where X and Y are the two random numbers. That gives you a set of results like this:

        X
       0  1  2  3  4

  0    0  1  2  3  4
Y 1    5  6  7  8  9
  2   10 11 12 13 14
  3   15 16 17 18 19
  4   20 21 22 23 24
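
To make the two tables concrete, here is a small Python sketch that enumerates all 25 equally likely (X, Y) pairs and counts how often each result occurs. The names `pairs`, `sum_counts` and `combo_counts` are just illustrative, not part of the original problem.

```python
from collections import Counter
from itertools import product

# All 25 equally likely (x, y) outcomes of two draws from {0,1,2,3,4}.
pairs = list(product(range(5), repeat=2))

# Addition: several pairs collapse onto the same value.
sum_counts = Counter(x + y for x, y in pairs)

# Z = X + Y*5: every pair maps to its own value in 0..24.
combo_counts = Counter(x + 5 * y for x, y in pairs)

print(sorted(sum_counts.items()))
print(len(combo_counts), set(combo_counts.values()))
```

With addition, 4 is hit by five pairs and 0 by only one. With X + Y*5 each of the 25 values is hit exactly once.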

So now that you have a bigger set of random numbers, you need to do the reverse and make it smaller. This set has 25 distinct values (because you started with 5, and used two random numbers, so 5*5=25). The set you want has 8 distinct values. A naïve way to do this would be

x = rnd(5)  // {0,1,2,3,4}
y = rnd(5)  // {0,1,2,3,4}
z = x+y*5   // {0-24}
random07 = z mod 8

This would indeed produce values in the range 0 to 7. But each of the values 1 to 7 would appear with probability 3/25, and the value 0 would appear with probability 4/25. This is because 0 mod 8 = 0, 8 mod 8 = 0, 16 mod 8 = 0 and 24 mod 8 = 0.
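
The bias can be checked by brute force. The following Python sketch (variable names are my own) counts how many of the 25 values of z land on each residue mod 8.

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 25 equally likely z values map to each residue.
naive_counts = Counter((x + 5 * y) % 8 for x in range(5) for y in range(5))

# Convert counts to exact probabilities out of 25.
naive_probs = {v: Fraction(c, 25) for v, c in sorted(naive_counts.items())}

for value, prob in naive_probs.items():
    print(value, prob)
```

Only 0 gets four hits (from 0, 8, 16 and 24). Every other residue gets three.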

To fix this, you can modify the code above to this.

do {
  x = rnd(5)  // {0,1,2,3,4}
  y = rnd(5)  // {0,1,2,3,4}
  z = x+y*5   // {0-24}
} while (z == 24)  // draw again only when we hit the surplus value

random07 = z mod 8

This will take the one value (24) that is throwing off your probabilities and discard it. Generating a new random number if you get a 'bad' value like this will make your algorithm run very slightly longer (in this case 1/25 of the time it will take 2x as long to run, 1/625 it will take 3x as long, etc). But it will give you the right probabilities.
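
A minimal runnable version of this rejection approach in Python might look like the sketch below. Here `rng04` is a stand-in built on `random.randrange`, since the real generator is not given. The `source` parameter is my own addition so that the function can be driven by a fixed sequence when testing.

```python
import random

def rng04():
    """Stand-in for the given generator: uniform over {0,1,2,3,4}."""
    return random.randrange(5)

def rng07(source=rng04):
    """Return a value in 0..7, uniform if `source` is uniform over 0..4."""
    while True:
        x = source()
        y = source()
        z = x + 5 * y      # uniform over 0..24
        if z != 24:        # 24 is the one surplus value, so draw again
            return z % 8   # 0..23 covers each residue exactly three times

print([rng07() for _ in range(10)])
```

Each attempt succeeds with probability 24/25, so on average this uses 25/24 attempts, or roughly 2.08 calls to rng04 per output.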

The real problem, of course, is the fact that the numbers in the middle of the sum (4 in this case) occur in many combinations (0+4, 1+3, etc.) whereas 0 and 8 have exactly one way to be produced.

I don't know how to solve this problem, but I'm going to try to reduce it a bit for you. Some points to consider:

The 0-7 range has 8 possible values, so ultimately the total number of equally likely situations that you map from has to be a multiple of 8. That way each value in that codomain can receive the same whole number of situations.
When you combine two random draws, the number of possible situations (counted as ordered pairs of inputs, not as distinct values of the sum) is equal to the product of the sizes of the input sets.
Thus, given two {0,1,2,3,4} sets summed together, you have 5*5=25 possibilities.
It will not be possible to get a multiple of eight (see first point) from powers of 5 (see second point, but extrapolate it to any number of sets > 1), so you will need to have a surplus of possible situations in your function and ignore some of them if they occur.
The simplest way to do that, as far as I can see at this point, is to use two draws from {0,1,2,3,4} (25 possibilities) and ignore 1 of them (to leave 24, a multiple of 8).
Thus the challenge now has been reduced to this: Find a way to distribute the remaining 24 possibilities among the 8 output values. For this, you'll probably NOT want to use the sum, but rather just the input values.
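
The claim that no power of 5 is a multiple of 8 is easy to check numerically. This Python sketch (names are illustrative) shows that 5**k mod 8 only ever takes the values 5 and 1, and how many surplus situations are left over for a few small k.

```python
# 5 mod 8 = 5 and 25 mod 8 = 1, so the residues alternate 5, 1, 5, 1, ...
residues = [pow(5, k, 8) for k in range(1, 11)]

# Number of situations that must be ignored when using k draws.
surplus = {k: 5**k % 8 for k in range(2, 5)}

print(residues)
print(surplus)
```

With two draws the surplus is a single situation out of 25, which is why ignoring one value is enough.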

One way to do that is to imagine a two-digit number in base 5 constructed from your two inputs. Ignore 44 (that is 24 in decimal, your 25th, superfluous value; if you get it, draw a new pair of inputs) and take the others modulo 8. You'll get your 0-7 across 24 different input combinations (3 each), which is an equal distribution.

My logic would be this:

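rn07 = 0;
do {
  num = rng04();   // draw again until num is in {0,1,2,3}
}
while(num == 4);

rn07 = num * 2;    // one of {0,2,4,6}
do {
  num = rng04();   // second draw, again restricted to {0,1,2,3}
}
while(num == 4);

rn07 += num % 2;   // adds 0 or 1 with equal probability, giving 0..7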
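
For comparison, here is a Python sketch of the same two-stage idea. `rng04` is again a stand-in based on `random.randrange`, and `draw_0_to_3`, `rng07_two_stage` and the `source` parameter are hypothetical names introduced for illustration. The `exact` counter enumerates all 16 accepted pairs to check the distribution.

```python
import random
from collections import Counter

def rng04():
    """Stand-in for the given generator: uniform over {0,1,2,3,4}."""
    return random.randrange(5)

def draw_0_to_3(source=rng04):
    """Redraw until the value is not 4, leaving a uniform value in 0..3."""
    while True:
        n = source()
        if n != 4:
            return n

def rng07_two_stage(source=rng04):
    high = draw_0_to_3(source) * 2   # 0, 2, 4 or 6
    low = draw_0_to_3(source) % 2    # 0 or 1, each from two of four values
    return high + low

# Exact check over the 16 equally likely accepted (a, b) pairs.
exact = Counter(2 * a + b % 2 for a in range(4) for b in range(4))
print(sorted(exact.items()))
```

Every output 0..7 is produced by exactly 2 of the 16 pairs, so this is uniform as well. Each stage rejects with probability 1/5, so it averages 2 * 5/4 = 2.5 calls to rng04 per output, slightly more than the approach that rejects only the single value 24.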

Source: https://stackoverflow.com/questions/1268025/using-one-probability-set-to-generate-another
