Fast weighted random selection from very large set of values


Assuming that the element weights are fixed, you can work with precomputed cumulative sums. This amounts to working with the cumulative distribution function directly, rather than the density function.

The lookup can then be implemented as a binary search, and hence takes O(log N) time in the number of elements.

A binary search obviously requires random access to the container of the weights.
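
For instance, with the partial sums stored in a std::vector, each draw is a single std::upper_bound. Here is a minimal sketch of that variant; makeCumulative and weightedIndex are names invented for this example:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Build the running sums once (the weights are fixed).
std::vector<double> makeCumulative(const std::vector<double>& weights)
{
    std::vector<double> sums(weights.size());
    std::partial_sum(weights.begin(), weights.end(), sums.begin());
    return sums;
}

// Each draw is one binary search over the precomputed sums.
std::size_t weightedIndex(const std::vector<double>& sums, std::mt19937& rng)
{
    // uniform_real_distribution yields values in [0, total), so
    // upper_bound always finds a key strictly greater than the draw.
    std::uniform_real_distribution<double> unit(0.0, sums.back());
    return std::upper_bound(sums.begin(), sums.end(), unit(rng)) - sums.begin();
}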

Alternatively, use a std::map<> and the upper_bound() method.

#include <cstdlib>
#include <iostream>
#include <map>

int main()
{
  // Keys are the cumulative weights; upper_bound() returns the first
  // entry whose key is strictly greater than the drawn value.
  std::map<double, char> cumulative;

  cumulative[.20] = 'a';   // P(a) = .20
  cumulative[.30] = 'b';   // P(b) = .10
  cumulative[.40] = 'c';   // P(c) = .10
  cumulative[.80] = 'd';   // P(d) = .40
  cumulative[1.00] = 'e';  // P(e) = .20

  const int numTests = 10;
  for (int i = 0; i != numTests; ++i)
  {
      // Dividing by RAND_MAX + 1.0 keeps the draw in [0, 1), so
      // upper_bound() can never return end().
      double linear = std::rand() / (RAND_MAX + 1.0);
      std::cout << linear << "\t"
                << cumulative.upper_bound(linear)->second << "\n";
  }

  return 0;
}

You want to use the Walker algorithm. With N elements, there's a set-up cost of O(N). However, the sampling cost is O(1). See

  • A. J. Walker, An Efficient Method for Generating Discrete Random Variables with General Distributions, ACM TOMS 3, 253-256 (1977).
  • D. E. Knuth, TAOCP, Vol. 2, Sec. 3.4.1.A.

The RandomSelect class of RandomLib implements this algorithm.
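
For illustration, here is a minimal sketch of the alias-table idea (Vose's formulation of Walker's method). The AliasTable class is invented for this example and is not RandomLib's RandomSelect:

#include <cstddef>
#include <random>
#include <vector>

// Walker/Vose alias method: O(N) construction, O(1) per sample.
class AliasTable {
public:
    explicit AliasTable(const std::vector<double>& weights)
        : prob_(weights.size()), alias_(weights.size())
    {
        const std::size_t n = weights.size();  // assumed non-empty
        double total = 0;
        for (double w : weights) total += w;

        // Scale the weights so the average bucket holds probability 1,
        // then split the indices into under-full and over-full buckets.
        std::vector<double> scaled(n);
        std::vector<std::size_t> small, large;
        for (std::size_t i = 0; i < n; ++i) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }

        // Top up each under-full bucket from an over-full one.
        while (!small.empty() && !large.empty()) {
            std::size_t s = small.back(); small.pop_back();
            std::size_t l = large.back(); large.pop_back();
            prob_[s]  = scaled[s];
            alias_[s] = l;
            scaled[l] -= 1.0 - scaled[s];
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        // Whatever remains is exactly full (up to rounding).
        for (std::size_t i : small) prob_[i] = 1.0;
        for (std::size_t i : large) prob_[i] = 1.0;
    }

    // Draw one index in O(1): pick a bucket, then flip its biased coin.
    template <typename Rng>
    std::size_t operator()(Rng& rng) const {
        std::uniform_int_distribution<std::size_t> pick(0, prob_.size() - 1);
        std::uniform_real_distribution<double> unit(0.0, 1.0);
        std::size_t i = pick(rng);
        return unit(rng) < prob_[i] ? i : alias_[i];
    }

private:
    std::vector<double> prob_;        // probability of keeping bucket i
    std::vector<std::size_t> alias_;  // fallback index for bucket i
};

// Usage:
//   std::mt19937 rng{std::random_device{}()};
//   AliasTable table({.2, .1, .1, .4, .2});
//   std::size_t i = table(rng);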

If you have a quick enough way to sample a random element uniformly, you can use rejection sampling; all you need to know is the maximum weight. It works as follows: suppose the maximum weight is M. Repeatedly pick an element uniformly at random and, for each candidate, draw a fresh X uniformly in [0,1]; accept the first candidate whose weight is at least M*X. (X must be redrawn on every trial; with a single fixed X the result is biased toward the heavy elements.)
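
A sketch under those assumptions; sampleByRejection is an illustrative name, and note the fresh draw of X on every trial:

#include <cstddef>
#include <random>
#include <vector>

// Rejection sampling: draw a uniform candidate and a fresh X in [0, 1)
// each trial; accept when weight >= M*X. The expected number of trials
// is N*M divided by the total weight.
template <typename Rng>
std::size_t sampleByRejection(const std::vector<double>& weights,
                              double maxWeight, Rng& rng)
{
    std::uniform_int_distribution<std::size_t> pick(0, weights.size() - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    for (;;) {
        std::size_t i = pick(rng);                // uniform candidate
        if (weights[i] >= maxWeight * unit(rng))  // accept with prob w_i / M
            return i;
    }
}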

Or, an approximate solution: pick, say, 100 elements uniformly at random, then choose among those 100 with probability proportional to weight.
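
A sketch of that approximation; approxWeightedIndex and k are names invented for this example, with k = 100 matching the suggestion above:

#include <cstddef>
#include <random>
#include <vector>

// Approximate: draw k indices uniformly, then pick among just those k
// with probability proportional to weight (one linear scan over k values).
template <typename Rng>
std::size_t approxWeightedIndex(const std::vector<double>& weights,
                                std::size_t k, Rng& rng)
{
    std::uniform_int_distribution<std::size_t> pick(0, weights.size() - 1);
    std::vector<std::size_t> sample(k);
    double total = 0;
    for (std::size_t j = 0; j < k; ++j) {
        sample[j] = pick(rng);
        total += weights[sample[j]];
    }
    // Weighted choice within the small sample.
    std::uniform_real_distribution<double> unit(0.0, total);
    double x = unit(rng);
    for (std::size_t j = 0; j < k; ++j) {
        x -= weights[sample[j]];
        if (x <= 0) return sample[j];
    }
    return sample.back();  // guard against floating-point round-off
}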
