It seems that this simple shuffle algorithm will produce biased results:
# suppose $arr is filled with 1 to 52
for ($i = 0; $i < 52; $i++) {
$j = r
Not that another answer is needed, but I found it worthwhile to try to work out exactly why Fisher-Yates is uniform.
If we are talking about a deck with N items, then this question is: how can we show that
Pr(Item i ends up in slot j) = 1/N?
Breaking it down with conditional probabilities, Pr(item i ends up at slot j)
is equal to
Pr(item i ends up at slot j | item i was not chosen in the first j-1 draws)
* Pr(item i was not chosen in the first j-1 draws).
and from there it expands recursively back to the first draw.
Now, the probability that element i was not drawn on the first draw is (N-1) / N. And the probability that it was not drawn on the second draw, conditional on the fact that it was not drawn on the first draw, is (N-2) / (N-1), and so on.
So the probability that element i was not drawn in the first j-1 draws is:
((N-1) / N) * ((N-2) / (N-1)) * ... * ((N-j+1) / (N-j+2))
and of course we know that the probability that it is drawn at round j, conditional on not having been drawn earlier, is just 1 / (N-j+1), since N-j+1 items remain at that point.
Notice that in the first term, each numerator cancels the next denominator (i.e. N-1 cancels, N-2 cancels, all the way to N-j+2), leaving just (N-j+1) / N.
So the overall probability of element i appearing in slot j is:
[((N-1) / N) * ((N-2) / (N-1)) * ... * ((N-j+1) / (N-j+2))] * (1 / (N-j+1))
= ((N-j+1) / N) * (1 / (N-j+1))
= 1/N
as expected.
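For anyone who wants to check the argument numerically, here is a minimal Python sketch (not part of the original question's code). It assumes the usual in-place form of Fisher-Yates, where step k swaps position k with a uniformly chosen position from k to the end, and it counts rounds from 1, so an item survives round k with probability (N-k) / (N-k+1). The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def fisher_yates_slot_probabilities(n):
    """Return table[i][j] = exact Pr(item i ends up in slot j), 0-indexed."""
    counts = [[0] * n for _ in range(n)]
    # At step k the algorithm picks a position r in k..n-1, so the possible
    # choice sequences are the Cartesian product of those ranges. There are
    # n! of them and each one is equally likely.
    for choices in product(*(range(k, n) for k in range(n))):
        deck = list(range(n))
        for k, r in enumerate(choices):
            deck[k], deck[r] = deck[r], deck[k]
        for slot, item in enumerate(deck):
            counts[item][slot] += 1
    total = factorial(n)
    return [[Fraction(c, total) for c in row] for row in counts]


def telescoped_probability(n, j):
    """Pr(a given item lands in slot j), 1-indexed, via the telescoping product."""
    not_drawn = Fraction(1)
    for k in range(1, j):  # rounds 1 .. j-1
        not_drawn *= Fraction(n - k, n - k + 1)
    # Drawn at round j, given that n-j+1 items remain.
    return not_drawn * Fraction(1, n - j + 1)
```

Exhaustive enumeration is only practical for small decks (n! paths), while `telescoped_probability` works for any size, e.g. a 52-card deck.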
More generally, the particular property that the "simple shuffle" lacks is called exchangeability. Because of the "path dependence" of the way the shuffle is created (i.e. which of the 3^3 = 27 equally likely paths is followed to create the output for a three-item deck, when there are only 3! = 6 possible orderings and 27 is not divisible by 6), you are not able to treat the different component-wise random variables as though they can appear in any order. In fact, this is perhaps the motivating example for why exchangeability matters in random sampling.
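The path counting can be reproduced with a short Python sketch. Note the assumption: the question's loop body is cut off, so this takes the "simple shuffle" to be the common variant that swaps each position i with a position j drawn uniformly from the whole deck. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def naive_shuffle_counts(n):
    """Count how many of the n**n equally likely paths yield each ordering."""
    counts = Counter()
    # Assumed naive shuffle: at every step i, j is drawn from the full
    # range 0..n-1, so the paths are all length-n tuples over that range.
    for choices in product(range(n), repeat=n):
        deck = list(range(n))
        for i, j in enumerate(choices):
            deck[i], deck[j] = deck[j], deck[i]
        counts[tuple(deck)] += 1
    return counts


if __name__ == "__main__":
    for ordering, hits in sorted(naive_shuffle_counts(3).items()):
        print(ordering, hits)
```

For a three-item deck the 27 paths have to be split among 6 orderings, so the counts cannot all be equal, which is the bias the question is asking about.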