C++ randomly sample k numbers from range 0:n-1 (n > k) without replacement

死守一世寂寞 2021-01-02 13:11

I'm working on porting a MATLAB simulation into C++. To do this, I am trying to replicate MATLAB's randsample() function. I haven't figured out an efficient way to do thi

5 answers

心在旅途 (original poster)

2021-01-02 13:49

Here's an approach that doesn't require generating and shuffling a huge list, in case N is huge but k is not:

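#include <algorithm>
#include <random>
#include <unordered_set>
#include <vector>

// defined below
std::unordered_set<int> pickSet(int N, int k, std::mt19937& gen);

std::vector<int> pick(int N, int k) {
    std::random_device rd;
    std::mt19937 gen(rd());

    std::unordered_set<int> elems = pickSet(N, k, gen);

    // ok, now we have a set of k elements. but its iteration
    // order is unspecified and deterministic, not random,
    // so we have to shuffle it:

    std::vector<int> result(elems.begin(), elems.end());
    std::shuffle(result.begin(), result.end(), gen);
    return result;
}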

Now the naive approach of implementing pickSet is:

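std::unordered_set<int> pickSet(int N, int k, std::mt19937& gen)
{
    // draws from 1..N inclusive, like MATLAB's randsample
    std::uniform_int_distribution<> dis(1, N);
    std::unordered_set<int> elems;

    // keep drawing until we hold k distinct values;
    // duplicates are silently rejected by insert()
    while (elems.size() < static_cast<std::size_t>(k)) {
        elems.insert(dis(gen));
    }

    return elems;
}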

But if k is large relative to N, this algorithm could lead to lots of collisions and could be pretty slow. We can do better by guaranteeing that every loop iteration adds exactly one new element (an algorithm due to Robert Floyd):

std::unordered_set<int> pickSet(int N, int k, std::mt19937& gen)
{
    // draws from 1..N inclusive (subtract 1 from each value for 0..N-1);
    // requires 0 <= k <= N
    std::unordered_set<int> elems;
    for (int r = N - k + 1; r <= N; ++r) {
        int v = std::uniform_int_distribution<>(1, r)(gen);

        // there are two cases.
        // v is not in elems ==> add it
        // v is in elems ==> well, r is definitely not, because
        // this is the first iteration in the loop that we could've
        // picked something that big.

        if (!elems.insert(v).second) {
            elems.insert(r);
        }
    }
    return elems;
}
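
Since the question title asks for the range 0:n-1, here is a minimal self-contained sketch of the same idea shifted to 0-based values. The function name sampleWithoutReplacement and the argument check are my own additions, not part of the answer above. The caller passes in the generator so that it can be seeded once and reused.

```cpp
#include <algorithm>
#include <random>
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns k distinct integers from [0, n-1]
// in random order, using Floyd's algorithm followed by a shuffle.
std::vector<int> sampleWithoutReplacement(int n, int k, std::mt19937& gen) {
    if (k < 0 || k > n) {
        throw std::invalid_argument("need 0 <= k <= n");
    }

    std::unordered_set<int> chosen;
    chosen.reserve(static_cast<std::size_t>(k));

    for (int r = n - k; r < n; ++r) {
        // v is uniform over the r + 1 values 0..r.
        int v = std::uniform_int_distribution<int>(0, r)(gen);

        // If v is already taken, r cannot be: every earlier
        // iteration only inserted values smaller than r.
        if (!chosen.insert(v).second) {
            chosen.insert(r);
        }
    }

    // The set's iteration order is not random, so shuffle the result.
    std::vector<int> result(chosen.begin(), chosen.end());
    std::shuffle(result.begin(), result.end(), gen);
    return result;
}
```

The loop runs exactly k times and makes exactly k random draws no matter how close k is to n, so memory and time scale with k rather than n. A call would look like sampleWithoutReplacement(1000000, 5, gen) with gen a std::mt19937 seeded from std::random_device.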
