Why does this simple shuffle algorithm produce biased results? What is a simple reason?


旧时难觅i 2020-11-27 03:17

It seems that this simple shuffle algorithm will produce biased results:

# suppose $arr is filled with 1 to 52

for ($i = 0; $i < 52; $i++) { 
  $j = r


      
      
        
12 answers

迷失自我 (original poster) 2020-11-27 03:46

Not that another answer is needed, but I found it worthwhile to try to work out exactly why Fisher-Yates is uniform.

If we are talking about a deck with N items, then this question is: how can we show that, for every item i and every slot j (slots numbered 1 to N),

Pr(item i ends up in slot j) = 1/N?


Breaking it down with conditional probabilities, Pr(item i ends up at slot j) is equal to 

Pr(item i ends up at slot j | item i was not chosen in the first j-1 draws)
* Pr(item i was not chosen in the first j-1 draws).


and from there it expands recursively back to the first draw.
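
To make "draw" and "slot" concrete, here is a minimal Python sketch of Fisher-Yates written as repeated draws from the items that are still left. The function name `fisher_yates_by_draws` and the `rng` parameter are made up for illustration; the in-place swap version usually shown for Fisher-Yates makes the same sequence of uniform choices.

```python
import random

def fisher_yates_by_draws(deck, rng=random):
    """Fill slots 1..N in order by drawing uniformly from the items not yet drawn."""
    remaining = list(deck)   # items that have not been drawn yet
    shuffled = []            # shuffled[k] holds the item placed in slot k + 1
    while remaining:
        # Each remaining item is equally likely to be drawn on this round.
        k = rng.randrange(len(remaining))
        # Move the drawn item to the end so it can be popped in O(1).
        remaining[k], remaining[-1] = remaining[-1], remaining[k]
        shuffled.append(remaining.pop())
    return shuffled

print(fisher_yates_by_draws(range(1, 53)))
```

Draw number j fills slot j, and at that point exactly N-j+1 items remain, which is where the conditional probabilities in the argument come from.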

Now, the probability that element i was not drawn on the first draw is (N-1)/N. And the probability that it was not drawn on the second draw, conditional on the fact that it was not drawn on the first draw, is (N-2)/(N-1), and so on: at draw k there are N-k+1 items left, so the element survives that draw with probability (N-k)/(N-k+1).

So, we get for the probability that element i was not drawn in the first j-1 draws:

((N-1)/N) * ((N-2)/(N-1)) * ... * ((N-j+1)/(N-j+2))


and of course we know that the probability that it is drawn at round j, conditional on not having been drawn earlier, is just 1/(N-j+1), since N-j+1 items remain at that point.

Notice that in the first term, each numerator cancels the next denominator (i.e. N-1 cancels, N-2 cancels, all the way to N-j+2), leaving just (N-j+1)/N.

So the overall probability of element i appearing in slot j is:

[((N-1)/N) * ((N-2)/(N-1)) * ... * ((N-j+1)/(N-j+2))] * (1/(N-j+1))
= ((N-j+1)/N) * (1/(N-j+1))
= 1/N


as expected.
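
If you would rather check the cancellation than trust it, the product can be evaluated exactly with rational arithmetic. This is only a sketch: the helper name `prob_item_in_slot` is made up here, and it assumes slots and draws are numbered from 1, so that N-k+1 items remain at draw k.

```python
from fractions import Fraction

def prob_item_in_slot(n, j):
    """Exact Pr(a given item lands in slot j) for an n-item deck, slots numbered 1..n."""
    not_drawn_yet = Fraction(1)
    for k in range(1, j):  # draws 1 .. j-1
        # At draw k there are n-k+1 items left; the item survives with prob (n-k)/(n-k+1).
        not_drawn_yet *= Fraction(n - k, n - k + 1)
    # At draw j there are n-j+1 items left, each equally likely to be drawn.
    drawn_now = Fraction(1, n - j + 1)
    return not_drawn_yet * drawn_now

# Every slot should come out as exactly 1/n, whatever the deck size.
for n in (1, 3, 52):
    assert all(prob_item_in_slot(n, j) == Fraction(1, n) for j in range(1, n + 1))
```

Using `Fraction` instead of floats means the equality test is exact, so a passing assertion reflects the telescoping product itself and not rounding luck.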

To get more general about the "simple shuffle", the particular property that it is lacking is called exchangeability. Because of the "path dependence" of the way the shuffle is created (i.e. which of the 3^3 = 27 equally likely paths is followed to create the output in the 3-item case, where 27 paths cannot be split evenly among the 6 possible orders), you are not able to treat the different component-wise random variables as though they can appear in any order. In fact, this is perhaps the motivating example for why exchangeability matters in random sampling.
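
The 27 paths are few enough to enumerate by hand or by machine. The question's code is cut off after `$j = r`, so the Python sketch below assumes the simple shuffle swaps position i with a position j chosen uniformly from the whole array at every step; that reading is an assumption, and the function name `naive_outcomes` is made up for illustration.

```python
from collections import Counter
from itertools import product

def naive_outcomes(n):
    """Count final orders over all n**n equally likely paths of the assumed simple shuffle."""
    counts = Counter()
    for path in product(range(n), repeat=n):  # path[i] is the j picked at step i
        arr = list(range(1, n + 1))
        for i, j in enumerate(path):
            arr[i], arr[j] = arr[j], arr[i]   # assumed step: swap slot i with slot j
        counts[tuple(arr)] += 1
    return counts

counts = naive_outcomes(3)
for order, hits in sorted(counts.items()):
    print(order, hits, "of 27")
```

Under that assumption, three of the six orders are reached by 4 paths and the other three by 5 paths, so the probabilities are 4/27 and 5/27 where a uniform shuffle would give 1/6 each.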
    
             
                                                        
            
            
              
                