I\'m looking for the most efficient algorithm to randomly choose a set of n distinct integers, where all the integers are in some range [0..maxValue].
Constraints:<
UPDATE: I am wrong. The output of this is not uniformly distributed. Details on why are here.
I think this algorithm below is optimum. I.e. you cannot get better performance than this.
For choosing n numbers out of m numbers, the best offered algorithm so far is presented below. Its worst run time complexity is O(n), and needs only a single array to store the original numbers. It partially shuffles the first n elements from the original array, and then you pick those first n shuffled numbers as your solution.
This is also a fully working C program. What you find is:
getrand: This is just a PRNG that returns a number from 0 up to upto.randselect: This is the function that randmoly chooses n unique numbers out of m many numbers. This is what this question is about.main: This is only to demonstrate a use for other functions, so that you could compile it into a program and have fun.#include
#include
int getrand(int upto) {
long int r;
do {
r = rand();
} while (r > upto);
return r;
}
void randselect(int *all, int end, int select) {
int upto = RAND_MAX - (RAND_MAX % end);
int binwidth = upto / end;
int c;
for (c = 0; c < select; c++) {
/* randomly choose some bin */
int bin = getrand(upto)/binwidth;
/* swap c with bin */
int tmp = all[c];
all[c] = all[bin];
all[bin] = tmp;
}
}
int main() {
int end = 1000;
int select = 5;
/* initialize all numbers up to end */
int *all = malloc(end * sizeof(int));
int c;
for (c = 0; c < end; c++) {
all[c] = c;
}
/* select select unique numbers randomly */
srand(0);
randselect(all, end, select);
for (c = 0; c < select; c++) printf("%d ", all[c]);
putchar('\n');
return 0;
}
Here is the output of an example code where I randomly output 4 permutations out of a pool of 8 numbers for 100,000,000 many times. Then I use those many permutations to compute the probability of having each unique permutation occur. I then sort them by this probability. You notice that the numbers are fairly close, which I think means that it is uniformly distributed. The theoretical probability should be 1/1680 = 0.000595238095238095. Note how the empirical test is close to the theoretical one.