I am trying to test the likelihood that a particular clustering of data has occurred by chance. A robust way to do this is Monte Carlo simulation, in which the associations
Here's some code for sampling without replacement based on Algorithm 3.4.2S of Knuth's book Seminumeric Algorithms.
void SampleWithoutReplacement
(
int populationSize, // size of set sampling from
int sampleSize, // size of each sample
vector & samples // output, zero-offset indicies to selected items
)
{
// Use Knuth's variable names
int& n = sampleSize;
int& N = populationSize;
int t = 0; // total input records dealt with
int m = 0; // number of items selected so far
double u;
while (m < n)
{
u = GetUniform(); // call a uniform(0,1) random number generator
if ( (N - t)*u >= n - m )
{
t++;
}
else
{
samples[m] = t;
t++; m++;
}
}
}
There is a more efficient but more complex method by Jeffrey Scott Vitter in "An Efficient Algorithm for Sequential Random Sampling," ACM Transactions on Mathematical Software, 13(1), March 1987, 58-67.