selection based on percentage weighting

忘掉有多难 2020-12-04 13:40

I have a set of values, and an associated percentage for each:

a: 70% chance
b: 20% chance
c: 10% chance

I want to select a value (a, b, c) based on the percentage chance given.

13 answers
  •  星月不相逢
    2020-12-04 14:09

    If speed really matters and you need to generate many random values quickly, Walker's alias method, which mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517, is pretty much the best way to go: random() runs in O(1) time and preprocess() in O(N) time.

    For anyone who is interested, here is my own PHP implementation of the algorithm:

    /**
     * Pre-process the samples (Walker's alias method).
     * @param array $weights keys are the samples, values are their weights
     */
    protected function preprocess($weights){
    
        $N = count($weights);
        $sum = array_sum($weights);
        $avg = $sum / (double)$N;
    
        //split the weights into those smaller than the average (sum/N) and those greater than or equal to it
        $smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm;});
        $sN = count($smaller);
        $greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm;});
        $gN = count($greater_eq);
    
        $bin = array(); //bins
    
        //we want to fill N bins
        for($i = 0;$i<$N;$i++){
            //first, pick the first value for this bin:
            //if any small intervals are left, take one of them
            if($sN > 0){  
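                reset($smaller); //each() was removed in PHP 8, so read the first element explicitly
                $choice1 = array('key'=>key($smaller), 'value'=>current($smaller));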
                unset($smaller[$choice1['key']]);
                $sN--;
            } else{  //otherwise, we split a large interval
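                reset($greater_eq); //each() was removed in PHP 8, so read the first element explicitly
                $choice1 = array('key'=>key($greater_eq), 'value'=>current($greater_eq));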
                unset($greater_eq[$choice1['key']]);
            }
    
            //splitting happens here - the unused part of interval is thrown back to the array
            if($choice1['value'] >= $avg){
                if($choice1['value'] - $avg >= $avg){
                    $greater_eq[$choice1['key']] = $choice1['value'] - $avg;
                }else if($choice1['value'] - $avg > 0){
                    $smaller[$choice1['key']] = $choice1['value'] - $avg;
                    $sN++;
                }
                //this bin consists of only one value
                $bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
            }else{
                //make the second choice for the current bin
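                reset($greater_eq); //each() was removed in PHP 8, so read the first element explicitly
                $choice2 = array('key'=>key($greater_eq), 'value'=>current($greater_eq));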
                unset($greater_eq[$choice2['key']]);
    
                //splitting on the second interval
                if($choice2['value'] - $avg + $choice1['value'] >= $avg){
                    $greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                }else{
                    $smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                    $sN++;
                }
    
                //this bin consists of two values
                $choice2['value'] = $avg - $choice1['value'];
                $bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
                               'p1'=>$choice1['value'] / $avg, 
                               'p2'=>$choice2['value'] / $avg);
            }
        }
    
        $this->bins = $bin;
    }
    
    /**
     * Choose a random sample according to the weights.
     */
    public function random(){
        $bin = $this->bins[array_rand($this->bins)];
        //pick the bin's first value with probability p1, otherwise its second value
        return (lcg_value() < $bin['p1'])?$bin[1]:$bin[2];
    }
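
    The two methods above are shown without their enclosing class, so here is a minimal self-contained sketch of the same idea that can be run against the question's 70/20/10 example. It is not the code above: the class name AliasSampler and the methods pick() and bins() are made up for illustration, it assumes PHP 7.4 or later and non-negative weights, and it draws with mt_rand() instead of lcg_value(). For those weights the average is 33.33, so the three bins should come out as (b 0.6 | a 0.4), (c 0.3 | a 0.7) and (a 1.0), which gives a an overall chance of (0.4 + 0.7 + 1.0) / 3 = 0.7.

    ```php
    <?php
    // Illustrative sketch only: AliasSampler, pick() and bins() are hypothetical names.
    final class AliasSampler
    {
        /** @var array[] each bin is [first key, second key, probability of the first key] */
        private array $bins = [];

        public function __construct(array $weights)
        {
            if (count($weights) === 0) {
                throw new InvalidArgumentException('at least one weight is required');
            }
            $avg = array_sum($weights) / count($weights);

            // Split the keys into below-average and at-or-above-average weights.
            $small = [];
            $large = [];
            foreach ($weights as $key => $w) {
                if ($w < $avg) {
                    $small[$key] = $w;
                } else {
                    $large[$key] = $w;
                }
            }

            // Top up each small weight from a large one so the bin holds exactly $avg.
            while ($small && $large) {
                $sKey = array_key_first($small);
                $sW = $small[$sKey];
                $lKey = array_key_first($large);
                $lW = $large[$lKey];
                unset($small[$sKey], $large[$lKey]);

                $this->bins[] = [$sKey, $lKey, $sW / $avg];

                // Return what is left of the large weight to the matching list.
                $rest = $lW - ($avg - $sW);
                if ($rest < $avg) {
                    $small[$lKey] = $rest;
                } else {
                    $large[$lKey] = $rest;
                }
            }

            // Whatever remains weighs (up to rounding) exactly $avg: one full bin each.
            foreach (array_keys($large + $small) as $key) {
                $this->bins[] = [$key, $key, 1.0];
            }
        }

        /** O(1) draw: choose a bin uniformly, then one of its two keys. */
        public function pick()
        {
            [$first, $second, $pFirst] = $this->bins[mt_rand(0, count($this->bins) - 1)];
            $u = mt_rand() / (mt_getrandmax() + 1); // uniform float in [0, 1)
            return ($u < $pFirst) ? $first : $second;
        }

        public function bins(): array
        {
            return $this->bins;
        }
    }

    // Usage with the weights from the question.
    $sampler = new AliasSampler(['a' => 70, 'b' => 20, 'c' => 10]);
    $counts = ['a' => 0, 'b' => 0, 'c' => 0];
    for ($i = 0; $i < 100000; $i++) {
        $counts[$sampler->pick()]++;
    }
    print_r($counts); // the counts should land near 70000 / 20000 / 10000
    ```

    The preprocessing is only worth it when you draw many times from the same weights. Because the leftover loop gives every remaining key a full bin of its own, a weight that rounding pushes just below the average does not need a partner.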
    
