I am trying to test the likelihood that a particular clustering of data has occurred by chance. A robust way to do this is Monte Carlo simulation, in which the associations
Another algorithm for sampling without replacement is described on Rosetta Code.
It is similar to the one described by John D. Cook in his answer, and to one from Knuth, but it rests on a different assumption: the population size is unknown, but the sample fits in memory. This one is called "Knuth's algorithm S".
Quoting the Rosetta Code article:
1. Select the first n items as the sample as they become available;
2. For the i-th item where i > n, have a random chance of n/i of keeping it. If failing this chance, the sample remains the same. If not, have it randomly (1/n) replace one of the previously selected n items of the sample.
3. Repeat step 2 for any subsequent items.
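Here is a minimal Python sketch of those steps. It assumes the data arrive as any Python iterable (a list, a generator, lines read from a file), so the total length never has to be known in advance. The function name `algorithm_s` and the optional `rng` parameter are illustrative choices, not part of the Rosetta Code article.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def algorithm_s(stream: Iterable[T], n: int,
                rng: Optional[random.Random] = None) -> List[T]:
    """Return n items sampled uniformly without replacement from `stream`.

    Only the n-item sample is held in memory, so `stream` may be of
    unknown (or very large) length. If it has fewer than n items,
    all of them are returned.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if rng is None:
        rng = random.Random()
    sample: List[T] = []
    for i, item in enumerate(stream, start=1):  # i = 1-based position of item
        if i <= n:
            # Step 1: the first n items go straight into the sample.
            sample.append(item)
        elif rng.random() < n / i:
            # Step 2: keep the i-th item with probability n/i, replacing one
            # of the n current sample items chosen uniformly (1/n each).
            sample[rng.randrange(n)] = item
        # Otherwise the sample is unchanged; the loop repeats step 2
        # for every subsequent item (step 3).
    return sample


if __name__ == "__main__":
    # Draw 5 items from a million without ever materializing the list.
    print(algorithm_s(range(1_000_000), 5, random.Random(42)))
```

Passing a seeded `random.Random` makes runs reproducible, which is convenient when the sampler is called repeatedly inside a Monte Carlo loop. At the end of the stream, each of the N items seen has the same probability n/N of being in the sample.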