Sample without replacement in Java with probabilities

Submitted by 给你一囗甜甜゛ on 2020-05-01 12:13:15

Question


I have a list of 10 probabilities (assume these are sorted in descending order): <p1, p2, ..., p10>. I want to sample (without replacement) 10 elements such that the probability of selecting the i-th index is p_i.

Is there a ready-to-use Java method in a common library, such as Random, that I could use to do that?

Example: 5-element list: <0.4,0.3,0.2,0.1,0.0>

Select 5 indexes (no duplicates) such that their probability of selection is given by the probability at that index in the list above. So index 0 would be selected with probability 0.4, index 1 selected with prob 0.3 and so on.

I have written my own method to do that but feel that an existing method would be better to use. If you are aware of such a method, please let me know.


Answer 1:


This is how this is typically done:

    static int sample(double[] pdf) {
        // Transform your probabilities into a cumulative distribution
        double[] cdf = new double[pdf.length];
        cdf[0] = pdf[0];
        for(int i = 1; i < pdf.length; i++)
            cdf[i] = pdf[i] + cdf[i-1];
        // Let r be a uniform random number in [0,1)
        double r = random.nextDouble();
        // Search the bin corresponding to that quantile
        int k = Arrays.binarySearch(cdf, r);
        // A negative result encodes the insertion point as (-k-1)
        k = k >= 0 ? k : (-k-1);
        // Guard against rounding leaving the last CDF entry slightly below 1
        return Math.min(k, pdf.length - 1);
    }

If you want to return the probability instead of the index, change the return type to double and do:

    return pdf[k];
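
The sample method above rebuilds the cumulative distribution on every call. When many samples are drawn from the same distribution, the CDF can be built once and reused. The following is a minimal sketch of that idea, not part of the original answer: the names CdfSampler, sample and indexFor are hypothetical, the pdf is assumed to be non-empty with non-negative entries, and a hand-written binary search replaces Arrays.binarySearch so that ties caused by zero-probability bins resolve predictably.

```java
import java.util.Random;

// Hypothetical helper: builds the cumulative distribution once, then reuses it.
// Assumes pdf is non-empty and all entries are non-negative.
class CdfSampler {
    private final double[] cdf;
    private final Random random;

    CdfSampler(double[] pdf, Random random) {
        this.cdf = new double[pdf.length];
        double sum = 0.0;
        for (int i = 0; i < pdf.length; i++) {
            sum += pdf[i];
            cdf[i] = sum;
        }
        this.random = random;
    }

    // Draws one index (with replacement) in O(log n).
    // Scaling by the last CDF entry tolerates a pdf that does not sum exactly to 1.
    int sample() {
        return indexFor(random.nextDouble() * cdf[cdf.length - 1]);
    }

    // Returns the first index whose cumulative sum is strictly greater than r,
    // or the last index if no such entry exists.
    int indexFor(double r) {
        int lo = 0;
        int hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > r) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] pdf = { 0.3, 0.4, 0.2, 0.1 };
        CdfSampler sampler = new CdfSampler(pdf, new Random());
        int[] counts = new int[pdf.length];
        final int tests = 1000000;
        for (int i = 0; i < tests; i++)
            counts[sampler.sample()]++;
        // Prints the observed frequency of each index
        for (int c : counts)
            System.out.println(c / (double) tests);
    }
}
```

For the pdf { 0.3, 0.4, 0.2, 0.1 }, indexFor(0.35) returns 1, because 0.35 falls between the cumulative sums 0.3 and 0.7.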

EDIT: I just noticed that the title says sampling without replacement. That is not so trivial to do fast (I can give you some code I have for that). In any case, the question does not make sense as posed: you cannot sample without replacement from a probability distribution alone. You need absolute frequencies.

For example, suppose I tell you that I have a box filled with orange and blue balls in the proportions 20% and 80%. Unless I also tell you how many balls of each colour there are (in absolute terms), you cannot tell how many balls of each colour will be left after a few draws.
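
To make the point about absolute frequencies concrete, here is a minimal sketch of drawing without replacement once the counts are known. It is an illustration only, not the code the answer refers to: the class name Urn and its methods draw and remaining are hypothetical, and the counts are assumed to be non-negative. Each draw picks a ball uniformly among those still in the urn and then removes it, so the proportions shift after every draw.

```java
import java.util.Random;

// Hypothetical helper: an urn holding counts[i] balls of colour i.
// Assumes every count is non-negative.
class Urn {
    private final int[] counts;
    private int total;
    private final Random random;

    Urn(int[] counts, Random random) {
        this.counts = counts.clone();   // keep the caller's array untouched
        for (int c : counts)
            total += c;
        this.random = random;
    }

    // Number of balls still in the urn.
    int remaining() {
        return total;
    }

    // Draws one ball without replacement and returns its colour index.
    int draw() {
        if (total == 0)
            throw new IllegalStateException("urn is empty");
        int r = random.nextInt(total);  // uniform over the remaining balls
        for (int i = 0; i < counts.length; i++) {
            if (r < counts[i]) {
                counts[i]--;            // remove the drawn ball
                total--;
                return i;
            }
            r -= counts[i];
        }
        throw new AssertionError("unreachable: counts and total out of sync");
    }

    public static void main(String[] args) {
        // 2 orange balls (index 0) and 8 blue balls (index 1): 20% and 80%
        Urn urn = new Urn(new int[] { 2, 8 }, new Random());
        while (urn.remaining() > 0)
            System.out.println(urn.draw());
    }
}
```

With 2 orange and 8 blue balls, the first draw is orange with probability 0.2, but the probabilities of later draws depend on what has already been removed, which is why proportions alone are not enough.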

EDIT2: A faster version. This is not how it is typically done, but I found this suggestion on the web and have used it in projects of mine as well.

    static int sample(double[] pdf) {
        double r = random.nextDouble();
        for(int i = 0; i < pdf.length; i++) {
            if(r < pdf[i])
                return i;
            r -= pdf[i];
        }
        return pdf.length-1;  // should not happen
    }

To test this:

    // javac Test.java && java Test

    import java.util.Arrays;
    import java.util.Random;

    class Test
    {
        static Random random = new Random();

        // Must return the sampled index, so the return type is int
        public static int sample(double[] pdf) {
            ...
        }

        public static void main(String[] args) {
            double[] pdf = new double[] { 0.3, 0.4, 0.2, 0.1 };
            int[] counts = new int[pdf.length];
            final int tests = 1000000;
            for(int i = 0; i < tests; i++)
                counts[sample(pdf)]++;
            for(int i = 0; i < counts.length; i++)
                System.out.println(counts[i] / (double)tests);
        }
    }

You can see we get output very similar to the PDF that was used:

    0.3001356
    0.399643
    0.2001143
    0.1001071

These are the times I get when running each version:

  • 1st version: 0m0.680s
  • 2nd version: 0m0.296s


Source: https://stackoverflow.com/questions/29480842/sample-without-replacement-in-java-with-probabilities
