C++ randomly sample k numbers from range 0:n-1 (n > k) without replacement

死守一世寂寞 2021-01-02 13:11

I'm working on porting a MATLAB simulation into C++. To do this, I am trying to replicate MATLAB's randsample() function. I haven't figured out an efficient way to do this…
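
For comparison, a common baseline for randsample-style sampling (k distinct values from 0..n-1, returned in random order) is a partial Fisher-Yates shuffle. This is a sketch under my own assumptions, not a reconstruction of MATLAB's randsample internals: the name SampleFisherYates and the caller-supplied std::mt19937 are illustrative choices. It needs O(n) memory, so it fits best when n is not much larger than k.

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: draws k distinct integers from [0, n) without
// replacement, in random order, via a partial Fisher-Yates shuffle.
// O(n) memory, O(n + k) time.
std::vector<int> SampleFisherYates(int n, int k, std::mt19937& rng) {
  assert(0 <= k && k <= n);
  std::vector<int> pool(n);
  std::iota(pool.begin(), pool.end(), 0);  // pool = 0, 1, ..., n-1
  for (int i = 0; i < k; ++i) {
    // Pick a random element from the not-yet-chosen tail [i, n)
    // and move it into slot i.
    std::uniform_int_distribution<int> pick(i, n - 1);
    std::swap(pool[i], pool[pick(rng)]);
  }
  pool.resize(k);  // The first k slots hold the sample.
  return pool;
}
```

MATLAB's randsample(n, k) returns values from 1 to n, so add 1 to each element if you need 1-based output.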

5 Answers

心在旅途 (original poster)

2021-01-02 13:53

Here is a solution I came up with. It generates the samples in random order, rather than in a deterministic order that would need to be shuffled afterwards:

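#include <cstdlib>
#include <vector>

// Returns `samples` distinct integers from [0, range), in the order they were
// generated. Requires 0 <= samples <= range. Note: rand() % m has a slight
// modulo bias.
std::vector<int> GenerateRandomSample(int range, int samples) {
  std::vector<int> solution;   // Populated in the order that the numbers are generated in.
  std::vector<int> to_exclude; // Already-chosen values, kept in sorted order.
  for (int i = 0; i < samples; ++i) {
    // 0-based index into the values that have not been chosen yet.
    int raw_rand = std::rand() % (range - static_cast<int>(to_exclude.size()));
    // Skip over every already-chosen value <= raw_rand + offset.
    // This part can be optimized as a binary search.
    int offset = 0;
    while (offset < static_cast<int>(to_exclude.size()) &&
           raw_rand + offset >= to_exclude[offset]) {
      ++offset;
    }
    // Alternatively substitute Binary Search to avoid linearly
    // searching for where to put the new element. Arguably not
    // actually a benefit.
    // int offset = ModifiedBinarySearch(to_exclude, raw_rand);

    int to_insert = raw_rand + offset;
    to_exclude.insert(to_exclude.begin() + offset, to_insert);
    solution.push_back(to_insert);
  }
  return solution;
}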
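
One weakness of the code above is std::rand() % m: it is slightly biased toward small values whenever RAND_MAX + 1 is not a multiple of m, and raw_rand can never exceed RAND_MAX (which may be as small as 32767), skewing the sample when range is larger than that. Below is a minimal variant of the same exclusion-list idea using <random>; the name GenerateRandomSampleStd and the caller-supplied std::mt19937 are assumptions for illustration only.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical variant: same "skip over already-chosen values" technique,
// but each draw comes from std::uniform_int_distribution, which is unbiased.
std::vector<int> GenerateRandomSampleStd(int range, int samples, std::mt19937& rng) {
  assert(0 <= samples && samples <= range);
  std::vector<int> solution;    // Values in generation order.
  std::vector<int> to_exclude;  // Chosen values, kept sorted.
  for (int i = 0; i < samples; ++i) {
    const int remaining = range - i;  // Count of values not yet chosen.
    std::uniform_int_distribution<int> dist(0, remaining - 1);
    const int raw_rand = dist(rng);
    // Linear scan, as in the original: skip chosen values <= raw_rand + offset.
    int offset = 0;
    while (offset < static_cast<int>(to_exclude.size()) &&
           raw_rand + offset >= to_exclude[offset]) {
      ++offset;
    }
    const int value = raw_rand + offset;
    to_exclude.insert(to_exclude.begin() + offset, value);
    solution.push_back(value);
  }
  return solution;
}
```

Taking the generator by reference lets the caller seed it once (for reproducible simulation runs) instead of relying on global rand() state.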

I added an optional binary search for the position at which to insert each newly generated member, but after benchmarking it over large ranges (N) and sample sizes (K) on codeinterview.io/, I did not find any significant benefit over simply traversing linearly and exiting early.

EDIT: After further extensive testing, I've found that for sufficiently large parameters (e.g. N = 1000, K = 500, TRIALS = 10000), the binary search method does in fact offer a considerable improvement. For those parameters:

- with binary search: ~2.7 seconds
- with linear search: ~5.1 seconds
- deterministic (without shuffle, as proposed by Barry in the accepted answer, based on Robert Floyd's algorithm): ~3.8 seconds
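
The accepted answer that the EDIT compares against is not reproduced on this page. For readers unfamiliar with it, here is a minimal sketch of Robert Floyd's sampling algorithm; it is not Barry's code, and the name SampleFloyd plus the use of std::mt19937 and std::unordered_set are my own assumptions.

```cpp
#include <cassert>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical helper: Floyd's algorithm draws k distinct integers from [0, n)
// using exactly k random draws and O(k) extra memory. The output order is not
// uniformly random: for example, n - 1 can only be drawn in the last step.
std::vector<int> SampleFloyd(int n, int k, std::mt19937& rng) {
  assert(0 <= k && k <= n);
  std::unordered_set<int> chosen;
  std::vector<int> result;
  result.reserve(k);
  for (int j = n - k; j < n; ++j) {
    std::uniform_int_distribution<int> dist(0, j);  // inclusive range [0, j]
    int t = dist(rng);
    if (!chosen.insert(t).second) {
      // t was already chosen; j cannot have been, since all earlier draws were < j.
      t = j;
      chosen.insert(t);
    }
    result.push_back(t);
  }
  return result;
}
```

If the sample must come out in random order, a std::shuffle(result.begin(), result.end(), rng) call (from <algorithm>) afterwards adds O(k) work.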

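#include <vector>

// Returns the same offset as the linear scan in GenerateRandomSample: the index
// at which raw_rand + offset is inserted into `collection` (sorted, strictly
// increasing).
int ModifiedBinarySearch(const std::vector<int>& collection, int raw_rand) {
  const int size = static_cast<int>(collection.size());
  int offset = 0;
  int beg = 0, end = size - 1;
  bool upper_range = false;
  while (beg <= end) {
    offset = (beg + end) / 2;
    int left = collection[offset];
    int right = (offset + 1 < size ? collection[offset + 1]
                                   : collection[size - 1]);
    if (raw_rand + offset < left) {
      upper_range = false;
      end = offset - 1;
    } else if (raw_rand + offset + 1 >= right) {
      upper_range = true;
      beg = offset + 1;
    } else {
      // collection[offset] <= raw_rand + offset and
      // raw_rand + offset + 1 < collection[offset + 1]: the answer is offset + 1.
      upper_range = true;
      break;
    }
  }
  // When end == -1, (beg + end) / 2 relies on integer division truncating
  // toward zero (guaranteed since C++11).
  offset = (beg + end) / 2 + (upper_range ? 1 : 0);
  return offset;
}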
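
The same search can be written as a plain lower-bound style binary search. Because the excluded values are strictly increasing integers, excluded[i] - i never decreases, so the condition "excluded[i] - i <= raw_rand" holds for a prefix of indices and fails after it; the linear scan's offset is exactly the length of that prefix. This is a sketch, not the author's code, and the name FindOffset is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the same offset as the linear scan, i.e. the
// number of leading indices i with excluded[i] - i <= raw_rand.
// `excluded` must be sorted in strictly increasing order.
int FindOffset(const std::vector<int>& excluded, int raw_rand) {
  assert(raw_rand >= 0);
  int lo = 0;
  int hi = static_cast<int>(excluded.size());  // search in [lo, hi)
  while (lo < hi) {
    const int mid = lo + (hi - lo) / 2;
    if (excluded[mid] - mid <= raw_rand) {
      lo = mid + 1;  // excluded[mid] <= raw_rand + mid: still inside the prefix
    } else {
      hi = mid;      // excluded[mid] > raw_rand + mid: the prefix ends at or before mid
    }
  }
  return lo;  // the new sample is raw_rand + lo, inserted at index lo
}
```

It can be dropped in where the commented-out ModifiedBinarySearch call sits; the single comparison per step avoids the separate left/right bookkeeping.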
