Why is rand()kiased?

后端未结

关注

 5  572

小鲜肉 2020-11-30 18:47

When reading how to use std::rand, I found this code on cppreference.com

int x = 7;
while(x > 6) 
    x = 1 + std::rand()/((RAND_MAX + 1u)/6);  // Note: 1


      
      
        
          5条回答        

        
                    
            
            
                         
                
              
              
                
                   无人及你
                                             
                
                
                (楼主)
            
              
              
                2020-11-30 19:38
              

            
            
                        
I'm not an experienced C++ user by any means, but was interested to see if the other answers regarding 
std::rand()/((RAND_MAX + 1u)/6) being less biased than 1+std::rand()%6 actually holds true. So I wrote a test program to tabulate the results for both methods (I haven't written C++ in ages, please check it). A link for running the code is found here. It's also reproduced as follows:

// Example program
#include 
#include 
#include 
#include 

int main()
{
    std::srand(std::time(nullptr)); // use current time as seed for random generator

    // Roll the die 6000000 times using the supposedly unbiased method and keep track of the results

    int results[6] = {0,0,0,0,0,0};

    // roll a 6-sided die 20 times
    for (int n=0; n != 6000000; ++n) {
        int x = 7;
        while(x > 6) 
            x = 1 + std::rand()/((RAND_MAX + 1u)/6);  // Note: 1+rand()%6 is biased

        results[x-1]++;
    }

    for (int n=0; n !=6; n++) {
        std::cout << results[n] << ' ';
    }

    std::cout << "\n";


    // Roll the die 6000000 times using the supposedly biased method and keep track of the results

    int results_bias[6] = {0,0,0,0,0,0};

    // roll a 6-sided die 20 times
    for (int n=0; n != 6000000; ++n) {
        int x = 7;
        while(x > 6) 
            x = 1 + std::rand()%6;

        results_bias[x-1]++;
    }

    for (int n=0; n !=6; n++) {
        std::cout << results_bias[n] << ' ';
    }
}


I then took the output of this and used the chisq.test function in R to run a Chi-square test to see if the results are significantly different than expected. This stackexchange question goes into more detail of using the chi-square test to test die fairness: How can I test whether a die is fair?. Here are the results for a few runs:

> ?chisq.test
> unbias <- c(100150, 99658, 100319, 99342, 100418, 100113)
> bias <- c(100049, 100040, 100091, 99966, 100188, 99666 )

> chisq.test(unbias)

Chi-squared test for given probabilities

data:  unbias
X-squared = 8.6168, df = 5, p-value = 0.1254

> chisq.test(bias)

Chi-squared test for given probabilities

data:  bias
X-squared = 1.6034, df = 5, p-value = 0.9008

> unbias <- c(998630, 1001188, 998932, 1001048, 1000968, 999234 )
> bias <- c(1000071, 1000910, 999078, 1000080, 998786, 1001075   )
> chisq.test(unbias)

Chi-squared test for given probabilities

data:  unbias
X-squared = 7.051, df = 5, p-value = 0.2169

> chisq.test(bias)

Chi-squared test for given probabilities

data:  bias
X-squared = 4.319, df = 5, p-value = 0.5045

> unbias <- c(998630, 999010, 1000736, 999142, 1000631, 1001851)
> bias <- c(999803, 998651, 1000639, 1000735, 1000064,1000108)
> chisq.test(unbias)

Chi-squared test for given probabilities

data:  unbias
X-squared = 7.9592, df = 5, p-value = 0.1585

> chisq.test(bias)

Chi-squared test for given probabilities

data:  bias
X-squared = 2.8229, df = 5, p-value = 0.7273


In the three runs that I did, the p-value for both methods was always greater than typical alpha values used to test significance (0.05). This means that we wouldn't consider either of them to be biased. Interestingly, the supposedly unbiased method has consistently lower p-values, which indicates that it might actually be more biased. The caveat being that I only did 3 runs.

UPDATE: While I was writing my answer, Konrad Rudolph posted an answer that takes the same approach, but gets a very different result. I don't have the reputation to comment on his answer, so I'm going to address it here. First, the main thing is that the code he uses uses the same seed for the random number generator every time it's run. If you change the seed, you actually get a variety of results. Second, if you don't change the seed, but change the number of trials, you also get a variety of results. Try increasing or decreasing by an order of magnitude to see what I mean. Third, there is some integer truncation or rounding going on where the expected values aren't quite accurate. It probably isn't enough to make a difference, but it's there.

Basically, in summary, he just happened to get the right seed and number of trials that he might be getting a false result.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它5个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复