Random sampling from a dataset, while preserving original probability distribution

Submitted by 丶灬走出姿态 on 2019-12-07 03:30:53

Question


I have a set of more than 2000 numbers gathered from measurements. I want to sample from this data set, about 10 values per test, while preserving the probability distribution both overall and, as far as possible, within each test. For example, each test should contain some small values, some mid-range values, and some large values, with a mean and variance close to those of the original distribution. Across all tests combined, the mean and variance of all samples should also be close to those of the original distribution.

As my dataset follows a long-tailed probability distribution, the amount of data at each quantile is not the same:

Fig 1. Density plot of ~2k elements of data.

I am using Java. Right now I draw a uniformly distributed random index into the dataset and return the data element at that position:

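import java.util.Random;

public class DataSampler {
    // Measured values in the order they were recorded (list truncated in the original post).
    private final int[] data = {1231, 414, 222, 4211, 41, 203, 123, 432 /* , ... */};
    // Reuse one generator instead of creating a new Random on every call.
    private final Random random = new Random();

    // Returns one element chosen uniformly at random, with replacement.
    public int getRandomData() {
        return data[random.nextInt(data.length)];
    }
}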

I don't know whether this works as I want, because I use the data in the order it was measured, which has a large amount of serial correlation.


Answer 1:


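It works as you want. Each index is drawn uniformly and independently of the previous draws, so every element is equally likely to be picked no matter where it sits in the array. The order of the data is therefore irrelevant.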
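As a rough check of the claim that order does not matter, the sketch below samples from the same values in two different orders and compares the sampled means with the true mean. The class name, method names, seed, and the eight values (the visible prefix of the array in the question) are illustrative assumptions, not the asker's full data.

```java
import java.util.Arrays;
import java.util.Random;

public class OrderIrrelevanceDemo {
    // Draws `draws` elements uniformly at random (with replacement) and returns their mean.
    static double sampledMean(int[] data, int draws, Random rng) {
        double sum = 0;
        for (int i = 0; i < draws; i++) {
            sum += data[rng.nextInt(data.length)];
        }
        return sum / draws;
    }

    // Plain arithmetic mean of the whole array.
    static double mean(int[] data) {
        double sum = 0;
        for (int v : data) {
            sum += v;
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the measured data (only the values visible in the question).
        int[] measured = {1231, 414, 222, 4211, 41, 203, 123, 432};
        // A sorted copy is an extreme case of serial correlation between neighbours.
        int[] sorted = measured.clone();
        Arrays.sort(sorted);

        Random rng = new Random(42); // fixed seed, chosen arbitrarily for repeatability
        System.out.printf("true mean:                  %.1f%n", mean(measured));
        System.out.printf("sampled, measurement order: %.1f%n", sampledMean(measured, 200_000, rng));
        System.out.printf("sampled, sorted order:      %.1f%n", sampledMean(sorted, 200_000, rng));
    }
}
```

Both sampled means should land close to the true mean, since a uniformly random index ignores how neighbouring elements relate to each other.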




Answer 2:


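Random sampling preserves the probability distribution: every draw follows the empirical distribution of the dataset, although a single test of only ~10 values will match it only approximately.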
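To illustrate what "preserves" means in practice, here is a minimal sketch that runs many 10-element tests on a synthetic long-tailed dataset and compares the pooled mean with the dataset mean. The log-normal data, the class and method names, the seed, and the test counts are all assumptions made for illustration; the asker's real measurements are not available.

```java
import java.util.Random;

public class PooledVersusPerTest {
    // Builds a synthetic long-tailed (log-normal) dataset; a hypothetical stand-in for real measurements.
    static double[] syntheticLongTail(int n, Random rng) {
        double[] data = new double[n];
        for (int i = 0; i < n; i++) {
            data[i] = 100.0 * Math.exp(rng.nextGaussian());
        }
        return data;
    }

    // Plain arithmetic mean.
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Draws one test of `size` elements uniformly at random, with replacement.
    static double[] drawTest(double[] data, int size, Random rng) {
        double[] sample = new double[size];
        for (int i = 0; i < size; i++) {
            sample[i] = data[rng.nextInt(data.length)];
        }
        return sample;
    }

    public static void main(String[] args) {
        Random rng = new Random(7); // fixed seed, chosen arbitrarily for repeatability
        double[] data = syntheticLongTail(2000, rng);
        int tests = 1000;
        int size = 10;

        double sumOfTestMeans = 0;
        double minTestMean = Double.MAX_VALUE;
        double maxTestMean = -Double.MAX_VALUE;
        for (int t = 0; t < tests; t++) {
            double m = mean(drawTest(data, size, rng));
            sumOfTestMeans += m;
            minTestMean = Math.min(minTestMean, m);
            maxTestMean = Math.max(maxTestMean, m);
        }

        // All tests have the same size, so the pooled mean equals the average of the test means.
        System.out.printf("dataset mean:        %.1f%n", mean(data));
        System.out.printf("pooled sample mean:  %.1f%n", sumOfTestMeans / tests);
        System.out.printf("per-test mean range: %.1f .. %.1f%n", minTestMean, maxTestMean);
    }
}
```

The pooled mean should track the dataset mean closely, while the means of individual 10-element tests can spread widely. With a long tail, that spread is expected from plain random sampling rather than a sign that the sampler is wrong.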



Source: https://stackoverflow.com/questions/32539767/random-sampling-from-a-dataset-while-preserving-original-probability-distributi
