I'm working on porting a MATLAB simulation into C++. To do this, I am trying to replicate MATLAB's randsample() function. I haven't figured out an efficient way to do this.
Here is a solution I came up with. It generates the samples in random order, rather than in a deterministic order that would need to be shuffled afterwards:
#include <cstdlib>
#include <vector>

using std::vector;

// Returns `samples` distinct integers from [0, range) in random order.
// Requires 0 <= samples <= range; otherwise the modulus below reaches zero.
vector<int> GenerateRandomSample(int range, int samples) {
    vector<int> solution;    // Populated in the order that the numbers are generated in.
    vector<int> to_exclude;  // Inserted into in sorted order.
    for (int i = 0; i < samples; ++i) {
        // Draw an index among the values that have not been taken yet.
        int raw_rand = rand() % (range - static_cast<int>(to_exclude.size()));
        // This part can be optimized as a binary search
        int offset = 0;
        while (offset < static_cast<int>(to_exclude.size()) &&
               (raw_rand + offset) >= to_exclude[offset]) {
            ++offset;
        }
        // Alternatively substitute Binary Search to avoid linearly
        // searching for where to put the new element. Arguably not
        // actually a benefit.
        // int offset = ModifiedBinarySearch(to_exclude, raw_rand);
        int to_insert = (raw_rand + offset);
        to_exclude.insert(to_exclude.begin() + offset, to_insert);
        solution.push_back(to_insert);
    }
    return solution;
}
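To make the offset loop easier to follow, here is a minimal sketch that isolates the mapping step. The helper names MapToUnusedValue and MapAllDraws are hypothetical and not part of the code above; they assume the exclusion list is sorted and free of duplicates, as it is inside GenerateRandomSample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the raw_rand-th value (0-based) that does
// not appear in the sorted, duplicate-free vector to_exclude.
int MapToUnusedValue(const std::vector<int>& to_exclude, int raw_rand) {
    assert(raw_rand >= 0);
    std::size_t offset = 0;
    // Every excluded value at or below the candidate shifts it up by one.
    while (offset < to_exclude.size() &&
           raw_rand + static_cast<int>(offset) >= to_exclude[offset]) {
        ++offset;
    }
    return raw_rand + static_cast<int>(offset);
}

// Hypothetical helper: maps every possible raw draw in
// [0, range - to_exclude.size()) to the value it would produce.
std::vector<int> MapAllDraws(const std::vector<int>& to_exclude, int range) {
    std::vector<int> mapped;
    const int draws = range - static_cast<int>(to_exclude.size());
    for (int raw = 0; raw < draws; ++raw) {
        mapped.push_back(MapToUnusedValue(to_exclude, raw));
    }
    return mapped;
}
```

With range = 8 and to_exclude = {2, 5}, the six raw draws 0 through 5 map to 0, 1, 3, 4, 6, 7. Each remaining value is hit by exactly one raw draw, which is why a uniform raw draw gives a uniform choice among the unused values.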
I added an optional binary search to find where the newly generated value should be inserted. However, after benchmarking it over large ranges (N) and sample sizes (K) (done on codeinterview.io/), I did not find any significant benefit over linearly traversing the vector and exiting early.
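For anyone who wants to repeat this kind of comparison locally, a timing helper along the following lines might be enough. This is only a sketch: TimeTrials is a hypothetical name, and it is not the harness I actually used on the site.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls `work` `trials` times and returns the total
// elapsed time in seconds, measured with a monotonic clock.
template <typename Work>
double TimeTrials(Work work, int trials) {
    assert(trials > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < trials; ++t) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A call such as TimeTrials([] { GenerateRandomSample(1000, 500); }, 10000) would time one configuration. An optimizing compiler may discard results that are never used, so it is probably safer to consume the returned vector inside the lambda (for example by accumulating its size into a variable that gets printed).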
EDIT: After further extensive testing, I've found that for sufficiently large parameters (e.g. N = 1000, K = 500, TRIALS = 10000) the binary search method does in fact offer a considerable improvement. For the given parameters:
with binary search: ~2.7 seconds
with linear search: ~5.1 seconds
deterministic (without shuffle, as proposed by Barry in the accepted answer, based on Robert Floyd): ~3.8 seconds
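For reference, the deterministic variant I timed is based on Robert Floyd's sampling algorithm. The following is my own minimal sketch of that algorithm, not necessarily the exact code from the accepted answer; the name FloydSample and the use of std::mt19937 with std::unordered_set are assumptions made for illustration.

```cpp
#include <cassert>
#include <random>
#include <unordered_set>
#include <vector>

// Sketch of Robert Floyd's sampling algorithm: picks `samples` distinct
// integers from [0, range). The order of the result is not uniformly
// random, so shuffle it afterwards if the order matters.
std::vector<int> FloydSample(int range, int samples, std::mt19937& gen) {
    assert(0 <= samples && samples <= range);
    std::unordered_set<int> chosen;
    std::vector<int> solution;
    solution.reserve(samples);
    for (int j = range - samples; j < range; ++j) {
        // Draw uniformly from the closed interval [0, j].
        std::uniform_int_distribution<int> dist(0, j);
        int pick = dist(gen);
        if (!chosen.insert(pick).second) {
            // The draw was already taken; j itself cannot be, because all
            // earlier picks are at most j - 1.
            pick = j;
            chosen.insert(pick);
        }
        solution.push_back(pick);
    }
    return solution;
}
```

Each iteration does an expected constant amount of work, so no sorted exclusion vector is needed. The trade-off is the one mentioned above: the values come out in a biased order (the largest value, range - 1, can only ever appear last), so a shuffle is still required to match the random ordering of GenerateRandomSample.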
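// Binary-search counterpart of the linear scan in GenerateRandomSample:
// returns the first index at which raw_rand + index falls below
// collection[index], or collection.size() if there is no such index.
int ModifiedBinarySearch(const vector<int>& collection, int raw_rand) {
    int offset = 0;
    int beg = 0, end = static_cast<int>(collection.size()) - 1;
    bool upper_range = false;
    while (beg <= end) {
        offset = (beg + end) / 2;
        int left = collection[offset];
        int right = (offset + 1 < static_cast<int>(collection.size())
                         ? collection[offset + 1]
                         : collection[collection.size() - 1]);
        if ((raw_rand + offset) < left) {
            // The candidate is below collection[offset]: search the lower half.
            upper_range = false;
            end = offset - 1;
        } else if ((raw_rand + offset + 1) >= right) {
            // The next element also shifts the candidate: search the upper half.
            upper_range = true;
            beg = offset + 1;
        } else {
            // collection[offset] shifts the candidate, but the next element does not.
            upper_range = true;
            break;
        }
    }
    offset = ((beg + end) / 2) + (upper_range ? 1 : 0);
    return offset;
}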
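The search is valid because the excluded values are sorted and distinct, so collection[i] - i never decreases: the condition raw_rand + i >= collection[i] holds for a prefix of the indices and fails for the rest. Under that assumption, a half-open lower-bound style loop is a more compact way to find the same boundary. The sketch below uses the hypothetical name FindOffset and is meant as an alternative formulation, not a drop-in that I have benchmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alternative: returns the smallest offset for which
// raw_rand + offset is below sorted_excluded[offset], or
// sorted_excluded.size() if every element shifts the candidate.
std::size_t FindOffset(const std::vector<int>& sorted_excluded, int raw_rand) {
    assert(raw_rand >= 0);
    std::size_t lo = 0;
    std::size_t hi = sorted_excluded.size();  // search window is [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (raw_rand + static_cast<int>(mid) >= sorted_excluded[mid]) {
            lo = mid + 1;  // element mid still shifts the candidate up
        } else {
            hi = mid;      // the boundary is at mid or to its left
        }
    }
    return lo;
}
```

For the exclusion list {0, 1, 2, 10} and raw_rand = 3, the first three elements each shift the candidate up by one (3, 4, 5 become 4, 5, 6) and 6 is below 10, so the offset is 3 and the generated value is 6.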