Say I have y distinct values and I want to select x of them at random. What's an efficient algorithm for doing this? I could just call rand(
The trick is to use a variation of the shuffle, in other words a partial shuffle that stops once enough items have been picked.
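function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0); backup = array_fill(0, n, 0);
    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        // rand(lo, hi) is assumed to be inclusive of both ends
        selected = rand( 0, --N ); // uniform index among the items not yet picked
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }
    // restore the partially shuffled input array from backup (optional step)
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++;
    }
    return picked;
}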
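NOTE: the algorithm is strictly O(n) in both time and space, where n is the number of items picked. It produces unbiased selections (it is a partial unbiased shuffle) and leaves the input array unchanged on return. The restore step that undoes the partial shuffle is optional and can be skipped if the caller does not need the original order.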
adapted from here
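For readers who want something runnable, here is a minimal Python sketch of the same partial-shuffle idea. The `rng` parameter is my own addition (not part of the pseudocode above) so that a seeded generator can be passed in; everything else follows the structure of `random_pick`.

```python
import random

def random_pick(a, n, rng=random):
    """Pick n distinct items from list a using a partial Fisher-Yates shuffle.

    The list is permuted in place while picking and restored before returning.
    rng is a hypothetical extra parameter: anything with a randint(lo, hi) method.
    """
    N = len(a)
    n = min(n, N)
    picked = []
    backup = []
    for _ in range(n):
        N -= 1
        # randint is inclusive on both ends: uniform over the unpicked items a[0..N]
        selected = rng.randint(0, N)
        a[selected], a[N] = a[N], a[selected]
        backup.append(selected)
        picked.append(a[N])
    # Undo the swaps in reverse order so the caller's list is left unchanged.
    for selected in reversed(backup):
        a[selected], a[N] = a[N], a[selected]
        N += 1
    return picked
```

In practice Python's standard library already offers `random.sample(population, k)` for sampling without replacement, so a hand-written version like this is mainly useful for seeing how the partial shuffle works.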
update
Another approach uses only a single call to the PRNG (pseudo-random number generator) in [0,1]: Ivan Stojmenovic, "On Random and Adaptive Parallel Generation of Combinatorial Objects" (section 3), with O(N) worst-case complexity.
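To give a feel for how one random draw can determine a whole selection, here is a rough Python sketch. It is not taken from the paper and may differ from the procedure in its section 3: it draws a single random rank below C(N, n) and converts that rank into the corresponding combination in lexicographic order. It also uses one integer draw rather than a float in [0,1]. The names `unrank_combination` and `random_subset` are made up for this illustration.

```python
import random
from math import comb

def unrank_combination(rank, N, n):
    """Return the rank-th n-subset of range(N) in lexicographic order.

    Assumes 0 <= rank < comb(N, n).
    """
    result = []
    x = 0  # candidate index currently being considered
    k = n  # how many indices still have to be chosen
    while k > 0:
        # Number of combinations that include x as the next chosen index.
        count = comb(N - x - 1, k - 1)
        if rank < count:
            result.append(x)
            k -= 1
        else:
            rank -= count  # skip every combination that includes x here
        x += 1
    return result

def random_subset(a, n, rng=random):
    """Select n distinct items of a using exactly one call to the generator."""
    n = min(n, len(a))
    rank = rng.randrange(comb(len(a), n))  # the only random draw
    return [a[i] for i in unrank_combination(rank, len(a), n)]
```

Note that this returns an unordered subset (the items come back in input order), whereas the partial shuffle returns them in random order. The loop runs at most N times, but each `comb` call is not constant-time, so this sketch does not by itself demonstrate the O(N) bound quoted above.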