R: Generate data from a probability density distribution

房东的猫 提交于 2019-12-09 04:55:05

问题


Say I have a simple array, with a corresponding probability distribution.

library(stats)    
data <- c(0,0.08,0.15,0.28,0.90)
pdf_of_data <- density(data, from= 0, to=1, bw=0.1)

Is there a way I could generate another set of data using the same distribution. As the operation is probabilistic, it need not exactly match the initial distribution anymore, but will be just generated from it.

I did have success finding a simple solution on my own. Thanks!


回答1:


Your best bet is to generate the empirical cumulative density function, approximate the inverse, and then transform the input.

The compound expression looks like

random.points <- approx(
  cumsum(pdf_of_data$y)/sum(pdf_of_data$y),
  pdf_of_data$x,
  runif(10000)
)$y

Yields

hist(random.points, 100)




回答2:


From the examples in the documentation of ?density you (almost) get the answer.

So, something like this should do it:

library("stats")    
data <- c(0,0.08,0.15,0.28,0.90)
pdf_of_data <- density(data, from= 0, to=1, bw=0.1)

# From the example.
N <- 1e6
x.new <- rnorm(N, sample(data, size = N, replace = TRUE), pdf_of_data$bw)

# Histogram of the draws with the distribution superimposed.
hist(x.new, freq = FALSE)
lines(pdf_of_data)

You can just reject the draws outside your interval as in rejection sampling. Alternatively, you can use the algorithm described in the link.




回答3:


To draw from the curve:

sample(pdf_of_data$x, 1e6, TRUE, pdf_of_data$y)


来源:https://stackoverflow.com/questions/32871602/r-generate-data-from-a-probability-density-distribution

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!