Why is rand()%6 biased?

小鲜肉 2020-11-30 18:47

When reading about how to use std::rand, I found this code on cppreference.com:

int x = 7;
while(x > 6)
    x = 1 + std::rand()/((RAND_MAX + 1u)/6);  // Note: 1+rand()%6 is biased


        
5 Answers
  •  无人及你
    2020-11-30 19:38

    I'm not an experienced C++ user by any means, but I was interested to see whether the claim in the other answers, that 1 + std::rand()/((RAND_MAX + 1u)/6) is less biased than 1 + std::rand()%6, actually holds true. So I wrote a test program to tabulate the results for both methods (I haven't written C++ in ages, so please check it). The code is reproduced below:

    // Example program
    #include <cstdlib>   // std::rand, std::srand, RAND_MAX
    #include <ctime>     // std::time
    #include <iostream>  // std::cout
    
    int main()
    {
        std::srand(std::time(nullptr)); // use current time as seed for random generator
    
        // Roll the die 6000000 times using the supposedly unbiased method and keep track of the results
    
        int results[6] = {0,0,0,0,0,0};
    
        // roll a 6-sided die 6000000 times, re-rolling whenever x comes out as 7
        for (int n=0; n != 6000000; ++n) {
            int x = 7;
            while(x > 6) 
                x = 1 + std::rand()/((RAND_MAX + 1u)/6);  // Note: 1+rand()%6 is biased
    
            results[x-1]++;
        }
    
        for (int n=0; n !=6; n++) {
            std::cout << results[n] << ' ';
        }
    
        std::cout << "\n";
    
    
        // Roll the die 6000000 times using the supposedly biased method and keep track of the results
    
        int results_bias[6] = {0,0,0,0,0,0};
    
        // roll a 6-sided die 6000000 times (the while loop never repeats here, since 1 + rand()%6 is at most 6)
        for (int n=0; n != 6000000; ++n) {
            int x = 7;
            while(x > 6) 
                x = 1 + std::rand()%6;
    
            results_bias[x-1]++;
        }
    
        for (int n=0; n !=6; n++) {
            std::cout << results_bias[n] << ' ';
        }
    }
    

    I then took the output of this program and used the chisq.test function in R to run a chi-square test of whether the counts differ significantly from the uniform distribution expected of a fair die. The Stack Exchange question "How can I test whether a die is fair?" goes into more detail on using the chi-square test this way. Here are the results for a few runs (the counts in the first run total 600,000 rolls; those in the other two total 6,000,000):

    > ?chisq.test
    > unbias <- c(100150, 99658, 100319, 99342, 100418, 100113)
    > bias <- c(100049, 100040, 100091, 99966, 100188, 99666 )
    
    > chisq.test(unbias)
    
    Chi-squared test for given probabilities
    
    data:  unbias
    X-squared = 8.6168, df = 5, p-value = 0.1254
    
    > chisq.test(bias)
    
    Chi-squared test for given probabilities
    
    data:  bias
    X-squared = 1.6034, df = 5, p-value = 0.9008
    
    > unbias <- c(998630, 1001188, 998932, 1001048, 1000968, 999234 )
    > bias <- c(1000071, 1000910, 999078, 1000080, 998786, 1001075   )
    > chisq.test(unbias)
    
    Chi-squared test for given probabilities
    
    data:  unbias
    X-squared = 7.051, df = 5, p-value = 0.2169
    
    > chisq.test(bias)
    
    Chi-squared test for given probabilities
    
    data:  bias
    X-squared = 4.319, df = 5, p-value = 0.5045
    
    > unbias <- c(998630, 999010, 1000736, 999142, 1000631, 1001851)
    > bias <- c(999803, 998651, 1000639, 1000735, 1000064,1000108)
    > chisq.test(unbias)
    
    Chi-squared test for given probabilities
    
    data:  unbias
    X-squared = 7.9592, df = 5, p-value = 0.1585
    
    > chisq.test(bias)
    
    Chi-squared test for given probabilities
    
    data:  bias
    X-squared = 2.8229, df = 5, p-value = 0.7273
    

    In the three runs that I did, the p-value for both methods was always greater than the typical alpha value used to test significance (0.05), so we cannot reject the hypothesis that either method is fair. Interestingly, the supposedly unbiased method has consistently lower p-values, which might hint that it is actually the more biased of the two. The caveat is that I only did 3 runs, which is far too few to draw that conclusion.

    UPDATE: While I was writing my answer, Konrad Rudolph posted an answer that takes the same approach but gets a very different result. I don't have the reputation to comment on his answer, so I'm going to address it here. First, and most importantly, his code uses the same seed for the random number generator every time it's run; if you change the seed, you get a variety of results. Second, if you keep the seed but change the number of trials, you also get a variety of results; try increasing or decreasing it by an order of magnitude to see what I mean. Third, there is some integer truncation or rounding going on, so the expected values aren't quite accurate. It probably isn't enough to make a difference, but it's there.

    In summary, he may simply have hit on a seed and a number of trials that happen to produce a spurious result.
