Most efficient way of randomly choosing a set of distinct integers

囚心锁ツ 2020-12-01 07:04

I'm looking for the most efficient algorithm to randomly choose a set of n distinct integers, where all the integers are in some range [0..maxValue].

Constraints:

8 Answers
  •  孤城傲影
    2020-12-01 07:30

    UPDATE: I am wrong. The output of this is not uniformly distributed. Details on why are here.


    I think the algorithm below is optimal, i.e. you cannot get better performance than this.

    For choosing n numbers out of m numbers, the best algorithm offered so far is presented below. Its worst-case running time is O(n), not counting the O(m) setup of the array, and it needs only a single array to store the original numbers. It partially shuffles the array by swapping a randomly chosen element into each of the first n positions, and you then take those first n numbers as your solution.

    The code is also a complete, working C program. It contains:

    • Function getrand: This is just a PRNG wrapper that returns a number from 0 up to upto.
    • Function randselect: This is the function that randomly chooses n unique numbers out of m numbers. This is what the question is about.
    • Function main: This only demonstrates how the other functions are used, so that you can compile it into a program and have fun.
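    #include <stdio.h>
    #include <stdlib.h>
    
    /* Returns a random number in [0, upto) by rejecting larger draws. */
    int getrand(int upto) {
        long int r;
        do {
            r = rand();
        } while (r >= upto); /* r == upto would map to bin == end, one past the array */
        return r;
    }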
    
    void randselect(int *all, int end, int select) {
        int upto = RAND_MAX - (RAND_MAX % end);
        int binwidth = upto / end;
    
        int c;
        for (c = 0; c < select; c++) {
            /* randomly choose some bin */
            int bin = getrand(upto)/binwidth;
    
            /* swap c with bin */
            int tmp = all[c];
            all[c] = all[bin];
            all[bin] = tmp;
        }
    }
    
    int main() {
        int end = 1000;
        int select = 5;
    
        /* initialize all numbers up to end */
        int *all = malloc(end * sizeof(int));
        int c;
        for (c = 0; c < end; c++) {
            all[c] = c;
        }
    
        /* select select unique numbers randomly */
        srand(0);
        randselect(all, end, select);
        for (c = 0; c < select; c++) printf("%d ", all[c]);
        putchar('\n');
    
        free(all);
        return 0;
    }
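
Given the UPDATE at the top of this answer, a commonly used variant that does produce a uniform sample is the partial Fisher-Yates shuffle: at step c the swap partner is drawn from [c, end) instead of from the whole array. The following is a minimal sketch of that variant, not the original author's code. The names rand_below and partial_shuffle are made up for illustration, and it assumes end <= RAND_MAX.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform integer in [0, n) by rejection sampling.
   Assumption: 0 < n <= RAND_MAX. */
int rand_below(int n) {
    /* limit is a multiple of n; draws at or above it are rejected
       so that every residue r % n is equally likely */
    int limit = RAND_MAX - (RAND_MAX % n);
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}

/* Partial Fisher-Yates shuffle: after the call, all[0..select-1]
   holds `select` distinct elements of the original array. */
void partial_shuffle(int *all, int end, int select) {
    assert(select >= 0 && select <= end);
    for (int c = 0; c < select; c++) {
        /* choose only among positions not yet fixed: [c, end) */
        int j = c + rand_below(end - c);
        int tmp = all[c];
        all[c] = all[j];
        all[j] = tmp;
    }
}
```

Called as partial_shuffle(all, end, select) in place of randselect in main, it leaves the sample in all[0..select-1], still in O(select) time after the array has been filled.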
    

    Here is the output of an example run in which I drew a random 4-permutation from a pool of 8 numbers 100,000,000 times. I then used those samples to estimate the probability of each distinct permutation, and sorted the permutations by that probability. You will notice that the numbers are fairly close to one another, which I think means that the output is uniformly distributed. The theoretical probability should be 1/1680 = 0.000595238095238095. Note how close the empirical values are to the theoretical one.
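
The 4-out-of-8 case is small enough to check exactly rather than by sampling. The sketch below (the names enumerate and struct tally are hypothetical) walks over every equally likely sequence of swap indices, either with each index drawn from [0, 8) as in randselect, or with the index at step c drawn from [c, 8), and tallies how many sequences lead to each ordered 4-tuple. Under the first rule there are 8^4 = 4096 sequences spread over 1680 possible tuples, and 4096 is not a multiple of 1680, so the tallies cannot all be equal.

```c
#include <assert.h>
#include <string.h>

enum { POOL = 8, PICK = 4, CODES = 4096 }; /* CODES = POOL^PICK */

struct tally {
    int sequences;  /* number of equally likely swap-index sequences */
    int distinct;   /* number of different ordered tuples produced */
    int min_count;  /* fewest sequences leading to any one tuple */
    int max_count;  /* most sequences leading to any one tuple */
};

/* from_suffix == 0: swap index drawn from [0, POOL) at every step
   from_suffix != 0: swap index at step c drawn from [c, POOL) */
struct tally enumerate(int from_suffix) {
    int counts[CODES];
    struct tally t = {0, 0, 0, 0};
    memset(counts, 0, sizeof counts);

    for (int code = 0; code < CODES; code++) {
        int bins[PICK], all[POOL];
        int x = code, legal = 1, key = 0;

        /* decode `code` into one swap index per step (base-POOL digits) */
        for (int c = 0; c < PICK; c++) {
            bins[c] = x % POOL;
            x /= POOL;
            if (from_suffix && bins[c] < c) legal = 0;
        }
        if (!legal) continue;

        /* replay the swaps on a fresh array 0..POOL-1 */
        for (int i = 0; i < POOL; i++) all[i] = i;
        for (int c = 0; c < PICK; c++) {
            int tmp = all[c];
            all[c] = all[bins[c]];
            all[bins[c]] = tmp;
        }

        /* encode the first PICK entries as a base-POOL number */
        for (int c = 0; c < PICK; c++) key = key * POOL + all[c];
        assert(key < CODES);
        counts[key]++;
        t.sequences++;
    }

    for (int k = 0; k < CODES; k++) {
        if (counts[k] == 0) continue;
        if (t.distinct == 0 || counts[k] < t.min_count) t.min_count = counts[k];
        if (counts[k] > t.max_count) t.max_count = counts[k];
        t.distinct++;
    }
    return t;
}
```

With the suffix rule, enumerate(1) finds 1680 sequences and 1680 tuples, each reached exactly once. With the whole-array rule, enumerate(0) finds 4096 sequences over the same 1680 tuples with unequal counts, which agrees with the UPDATE at the top.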
