Generate Random Weighted value

隐瞒了意图╮ 2020-12-28 10:19

Edit: I've rewritten the question in hopes that the goal is a little clearer.

This is an extended question to this question here, and I really like

6 answers
  •  南笙 (original poster)
    2020-12-28 10:38

    Since I'm stuck at home today because of the flu :( I decided to try and figure this out for you. Essentially, what you're asking for is some sort of interpolation. I used the simplest kind (linear), and these are my results and code. The code is kind of messy, and I may clean it up in the coming days.

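    <?php

    // interpolates linearly from $a to $b over $steps steps, starting from key $k
    function interpolate($a, $b, $steps, $k) {
        $per_step = abs($a - $b) / $steps;
        if ($a > $b)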
            $decreasing = true;
        else
            $decreasing = false;
        $final = array();
        for ($i = 1; $i <= $steps-1; ++$i) {
            if ($decreasing)
                $final[$i+$k] = $a-=$per_step; // linear interpolation
            else
                $final[$i+$k] = $a+=$per_step; // linear interpolation
        }
        return $final;
    }
    
    // this function combines probability arrays after the interpolation occurs
    // this may happen multiple times, think about 1, 3, 5. interpolation would have to occur
    // from 1 -> 2 -> 3, and from 3 -> 4 -> 5.
    function interpolateProbabilities ($nodes) {
        $pNodes = array();
        $pNodes = $nodes;
        $keys = array_keys($nodes);
        // stop one short of the end: each iteration looks at $keys[$i] and $keys[$i+1]
        for ($i = 0; $i < count($keys) - 1; $i++) {
            if ($keys[$i+1] - $keys[$i] != 1) {
                $pNodes += interpolate($nodes[$keys[$i]], $nodes[$keys[$i+1]], $keys[$i+1] - $keys[$i], $keys[$i]);
            }
        }
        ksort($pNodes);
        return $pNodes;
    }
    
    // this generates a weighted random value and is pretty much copy-pasted from:
    // http://w-shadow.com/blog/2008/12/10/fast-weighted-random-choice-in-php/
    // it's robust and re-writing it would be somewhat pointless
    function generateWeighedRandomValue($nodes) {
        $weights = array_values($nodes);
        $values = array_keys($nodes);
        $count = count($values);
        $i = 0;
        $n = 0;
        // mt_rand() needs integer bounds; start at 1 so the first value is not over-weighted
        $num = mt_rand(1, (int) array_sum($weights));
        while ($i < $count) {
            $n += $weights[$i];
            if ($n >= $num) {
                break;
            }
            $i++;
        }
        return $values[$i];
    }
    
    // two test cases; the second assignment overrides the first, so comment out the one you don't want
    $nodes = array( 1 => 12, 5 => 22, 9 => 31, 10 => 35); // test 1
    $nodes = array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10); // test 2
    $export = array();
    
    // run it 1000 times
    for ($i = 0; $i < 1000; ++$i) {
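        $value = generateWeighedRandomValue(interpolateProbabilities($nodes));
        // initialize the counter on first use to avoid an undefined-index notice
        $export[$value] = ($export[$value] ?? 0) + 1;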
    }
    
    // for copy-pasting into excel to test out distribution
    print_r($export);
    
    ?>
    

    The results are, I think, exactly what you're looking for. In the case of:

    $nodes = array( 1 => 12, 5 => 22, 9 => 31, 10 => 35); // test 1
    

    I got the following (final) array:

    Array
    (
        [5] => 92
        [7] => 94
        [10] => 162
        [8] => 140
        [3] => 71
        [6] => 114
        [2] => 75
        [4] => 69
        [9] => 131
        [1] => 52
    )
    

    Recall that the input nodes say 1 should happen 12% of the time, 5 22%, 9 31%, and 10 35% of the time. Let's graph it:

    (image: graph 1)

    It looks promising, but let's try something crazier...

    $nodes = array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10); // test 2
    

    In this case, 3 should occur 50% of the time, and the frequency should decrease steeply toward 6. Let's see what happens! This is the array (in retrospect, I should have sorted these arrays):

    Array
    (
        [4] => 163
        [7] => 64
        [2] => 180
        [10] => 47
        [1] => 115
        [5] => 81
        [3] => 227
        [8] => 57
        [6] => 6
        [9] => 60
    )
    

    And let's look at the picture:

    (image: graph of the test 2 results)

    It looks like it works :)

    I hope I was able to solve your problem (or at least point you in the right direction). Note that my code currently has a number of limitations. Namely, the initial nodes you provide MUST have probabilities that add up to 100%, or you may get some wonky behavior.
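The 100% requirement can be checked up front. The following is only a minimal sketch: `checkWeightsSumTo100` is a hypothetical helper name that is not part of the code above, and the small tolerance is an assumption to allow for fractional weights.

```php
<?php
// Hypothetical helper (not part of the original code): rejects control nodes
// whose weights do not add up to 100 before any interpolation happens.
function checkWeightsSumTo100(array $nodes): void {
    $total = array_sum($nodes);
    // small tolerance in case the weights are given as floats
    if (abs($total - 100) > 1e-9) {
        throw new InvalidArgumentException("Node weights sum to $total, expected 100");
    }
}

// both test cases from the answer add up to 100, so neither call throws
checkWeightsSumTo100(array(1 => 12, 5 => 22, 9 => 31, 10 => 35));
checkWeightsSumTo100(array(1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10));
```

Calling it before the interpolation step would turn the "wonky behavior" into an explicit error.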

    Also, the code is kind of messy, but the concepts are relatively simple. Another cool thing to try would be to replace linear interpolation with some other kind, which would give you more interesting results!
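As one possible example of "some other kind", here is a sketch of cosine interpolation, which eases in and out of each control node instead of moving in a straight line. The function name `cosineInterpolate` and its parameters are assumptions made for illustration; here `$gap` is the distance between the two keys and `$k` is the left key.

```php
<?php
// Hypothetical alternative to linear interpolation (not from the original answer).
// Returns the values for the keys strictly between $k and $k + $gap.
function cosineInterpolate(float $a, float $b, int $gap, int $k): array {
    $final = array();
    for ($i = 1; $i < $gap; $i++) {
        $t = $i / $gap;                      // position between the nodes, 0..1
        $eased = (1 - cos($t * M_PI)) / 2;   // cosine easing, also 0..1
        $final[$k + $i] = $a + ($b - $a) * $eased;
    }
    return $final;
}

// between 7 => 16 and 10 => 10 this fills keys 8 and 9
print_r(cosineInterpolate(16, 10, 3, 7));
```

At the midpoint of a gap the cosine curve agrees with the linear result; away from the midpoint the values stay closer to the nearest control node.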


    Algorithm

    To avoid confusion, I'll just show exactly how the algorithm works. I give PHP a $nodes array in the form integer => frequency (as a percentage), which ends up looking something like array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10), which is test 2 from above.

    Test 2 basically says that you want 5 control nodes placed at 1, 3, 6, 7, and 10 with the frequencies of 22%, 50%, 2%, 16%, and 10% respectively. First, I need to see exactly where I need to do the interpolation. For example, I don't need to do it between 6 and 7, but I do need to do it between 1 and 3 (we need to interpolate 2) and 7 and 10 (where we need to interpolate 8 and 9).

    The interpolation between 1 -> 3 has (3 - 1) - 1 = 1 step, and its value should be inserted at key 2 in the original array. The per-step value (%) for the 1 -> 3 interpolation is abs($a - $b) / $steps, where $steps in the code is the key distance (3 - 1 = 2), i.e. the number of interpolated steps + 1. This translates to the absolute value of the % of 1 minus the % of 3, divided by 2, which in our case is abs(22 - 50) / 2 = 14. We need to see if the function is increasing or decreasing (hello Calculus). If the function is increasing, we keep adding the step % to the new interpolation array until we have filled all of our empty spots (if the function is decreasing, we subtract the step % value instead). Since we only need to fill one spot, we return 2 => 36 (22 + 14 = 36).

    We combine the arrays and the result is (1 => 22, 2 => 36, 3 => 50, 6 => 2, 7 => 16, 10 => 10). The program interpolated 2, which was a percent value that we didn't explicitly declare.

    In the case of 7 -> 10, there are 2 steps, and the step percentage is 2, which comes from (16 - 10) / (2 + 1) = 2. The function is decreasing, so we need to subtract 2 repeatedly. The final interpolated array is (8 => 14, 9 => 12). We combine all of the arrays and voila.
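The arithmetic in this walkthrough can be checked with a compact sketch. `fillGaps` is a hypothetical name, and unlike the code above it uses a signed step rather than abs() plus an increasing/decreasing flag; the resulting values are meant to be the same.

```php
<?php
// Hypothetical re-statement of the interpolation step, for checking the numbers.
function fillGaps(array $nodes): array {
    ksort($nodes);
    $keys = array_keys($nodes);
    $filled = $nodes;
    for ($i = 0; $i < count($keys) - 1; $i++) {
        $from = $keys[$i];
        $to = $keys[$i + 1];
        $gap = $to - $from;                               // 2 for 1 -> 3, 3 for 7 -> 10
        $perStep = ($nodes[$to] - $nodes[$from]) / $gap;  // signed: negative when decreasing
        for ($j = 1; $j < $gap; $j++) {
            $filled[$from + $j] = $nodes[$from] + $j * $perStep;
        }
    }
    ksort($filled);
    return $filled;
}

// test 2: expect 2 => 36 between 1 and 3, and 8 => 14, 9 => 12 between 7 and 10
print_r(fillGaps(array(1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10)));
```

Adjacent keys such as 6 and 7 have a gap of 1, so the inner loop does nothing for them.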

    The following image shows the green (initial values) and the red (interpolated values). You may have to "view image" to see the whole thing clearly. You'll notice that I use ± since the algorithm needs to figure out if we're supposed to be increasing or decreasing over a certain period.

    (image: initial values in green, interpolated values in red)


    This code should probably be written in a more OOP style. I play a lot with array keys: for example, I need to pass $k so that it's easier to combine the arrays once I return them from interpolate($a, $b, $steps, $k), because they automatically have the right keys. This is just a PHP idiosyncrasy, and in retrospect I should probably have gone with a more readable OOP approach to begin with.


    This is my last edit, I promise :) Since I love playing with Excel, this shows how the percentages normalize once the numbers are interpolated. This is important to see, especially considering that in your first picture, what you're showing is somewhat of a mathematical impossibility.

    (images: normalized percentages for Test 1 and Test 2)

    You'll notice that the percentages dampen significantly to accommodate the interpolation. Your second graph in reality would look more like this:

    (image: the second graph with the dampening applied)

    In this graph, I weighted 1 => 1, 5 => 98, 10 => 1, and you can see the extremes of the dampening effect. After all, percentages, by definition, have to add up to 100! It's just important to realize that the dampening effect is directly proportional to the number of steps between extremes.
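To put a number on the dampening, the interpolated weights can be rescaled so that they sum to 100 again. This is a rough sketch: `normalizeToPercent` is a hypothetical helper, and the intermediate weights in the array were worked out by hand with linear interpolation between 1 => 1, 5 => 98 and 10 => 1.

```php
<?php
// Hypothetical helper: rescales weights so that they sum to 100.
// Assumes at least one weight is non-zero.
function normalizeToPercent(array $weights): array {
    $total = array_sum($weights);
    $percent = array();
    foreach ($weights as $value => $weight) {
        $percent[$value] = $weight / $total * 100;
    }
    return $percent;
}

// control nodes 1 => 1, 5 => 98, 10 => 1 after linear interpolation
// (steps of 24.25 going up to 5, steps of 19.4 going down to 10)
$interpolated = array(
    1 => 1, 2 => 25.25, 3 => 49.5, 4 => 73.75, 5 => 98,
    6 => 78.6, 7 => 59.2, 8 => 39.8, 9 => 20.4, 10 => 1,
);

$percent = normalizeToPercent($interpolated);
printf("5 was weighted 98 but is drawn about %.1f%% of the time\n", $percent[5]);
```

Under these assumptions, the 98 peak ends up at roughly 22% once its interpolated neighbors take their share.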
