Simulate from kernel density estimator with variable underlying grid

Submitted by 亡梦爱人 on 2021-01-28 05:51:02

Question


I have a dataset that I'm using to create an empirical probability distribution by estimating a kernel density. Right now I'm using R's kde2d from the MASS package. After estimating the probability distribution, I use sample to draw from slices of the 2D distribution along the x-axis. Example code looks like this:

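library(MASS)
set.seed(123)
x = rnorm(100, 1, 0.1)
set.seed(456)
y = rnorm(100, 1, 0.5)
# 2D kernel density estimate on a regular 50 x 50 grid over [-2, 2] x [-2, 2]
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
# to plot this 2d kde (persp is in base graphics, no extra package needed):
#persp(den)
# slice at fixed x = den$x[40]: density values along the y grid
conditional_probabilty_density = list(x = den$y, y = den$z[40, ])
# to plot the slice:
#plot(conditional_probabilty_density)
# draw y values from the slice; sample normalizes the prob weights internally
simulated_sample = sample(conditional_probabilty_density$x, size = 10, replace = TRUE, prob = conditional_probabilty_density$y)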

The den looks like this: [figure: plot of den]

My data has known areas with a lot of fluctuation, which require a fine grid. Other areas have basically no data points, and nothing is going on there. I would be fine if I could just set the n parameter of kde2d to a very high number to get good resolution everywhere. Alas, this is not possible due to memory constraints.

That's why I thought I could modify the kde2d function to have a non-constant granularity.
In the source code of the kde2d function, one can modify the line

gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])

and use whatever granularity is desired on the y-axis. For example:

a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = n[2L]-length(a)))
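A minimal sketch of such a modification, assuming the rest of the kde2d computation (Gaussian kernel, bandwidth.nrd default, bandwidth divided by 4) is kept unchanged. The function name kde2d_grid and its gx and gy arguments are hypothetical names chosen for illustration; they are not part of MASS.

```r
library(MASS)  # for bandwidth.nrd(), the default bandwidth rule

# Hypothetical variant of kde2d that takes the grid vectors directly
# instead of building them from n and lims.
kde2d_grid <- function(x, y, gx, gy,
                       h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  nx <- length(x)
  stopifnot(length(y) == nx, all(h > 0))
  h <- rep(h, length.out = 2L) / 4   # same bandwidth scaling as in kde2d
  ax <- outer(gx, x, "-") / h[1L]    # scaled distances grid point to data, x
  ay <- outer(gy, y, "-") / h[2L]    # scaled distances grid point to data, y
  z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay), , nx)) /
    (nx * h[1L] * h[2L])
  list(x = gx, y = gy, z = z)
}

set.seed(123)
x <- rnorm(100, 1, 0.1)
y <- rnorm(100, 1, 0.5)
gx <- seq(-2, 2, length.out = 50)
# Illustrative non-uniform y grid: coarse below 0, finer above
gy <- c(seq(-2, 0, by = 0.5), seq(0.1, 2, length.out = 45))
den_var <- kde2d_grid(x, y, gx, gy)
```

Passing the grid as an argument avoids editing the function each time the region of interest changes; on a regular grid the sketch should reproduce the output of kde2d.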

The modified kde2d then returns the kernel density estimate at the specified positions, which works very well. Suppose I now have the following density estimate: [figure]

The problem is that I can no longer use sample to draw from slices along the x-axis, because the part on the left side of the distribution has a much finer grid and is therefore more likely to be drawn by sample.
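The effect can be reproduced in one dimension. This is only an illustrative sketch: the standard normal density and the grid below (step 0.01 on the left, step 0.5 on the right) are made-up stand-ins for a real slice.

```r
set.seed(1)

# Hypothetical grid: fine on [-3, 0], coarse on (0, 3]
grid <- c(seq(-3, 0, by = 0.01), seq(0.5, 3, by = 0.5))

# Density values of a standard normal, which is symmetric about 0
dens <- dnorm(grid)

# Naive sampling: each grid point is weighted by its density value only,
# so the many points on the fine side dominate
draws <- sample(grid, size = 10000, replace = TRUE, prob = dens)

# Fraction of draws that land on the coarse right-hand side
frac_right <- mean(draws > 0)
print(frac_right)
```

Although the density is symmetric, frac_right comes out far below one half, because sample treats every grid point as an equally wide cell.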

What can I do to have a fine grid where I need it, but sample from the distribution according to its proper densities? Thank you a lot.


Answer 1:


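Use approx on conditional_probabilty_density with a new n. approx linearly interpolates the slice onto n equally spaced points, so sample can then be applied to the result as before.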
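The suggestion above can be sketched as follows. The slice here is a stand-in, assumed for illustration: a normal density with mean 1 and standard deviation 0.5 evaluated on a made-up non-uniform grid, in place of a row of den$z from a modified kde2d. The value n = 500 is likewise an arbitrary choice.

```r
set.seed(42)

# Hypothetical non-uniform y grid: fine on [0.5, 1], coarse elsewhere
gy <- c(seq(-2, 0.4, by = 0.2), seq(0.5, 1, by = 0.01), seq(1.2, 2, by = 0.2))

# Stand-in for a density slice on that grid
conditional_probabilty_density <- list(x = gy, y = dnorm(gy, mean = 1, sd = 0.5))

# Linearly interpolate onto n equally spaced points spanning range(gy)
uniform_density <- approx(conditional_probabilty_density$x,
                          conditional_probabilty_density$y,
                          n = 500)

# On an equally spaced grid every point stands for a cell of the same width,
# so the density values can be used directly as sampling weights
simulated_sample <- sample(uniform_density$x, size = 10000, replace = TRUE,
                           prob = uniform_density$y)
```

Only the one-dimensional slice is interpolated, so the memory cost stays small even for a large n. The accuracy in the coarse regions is limited by the linear interpolation between the original grid points.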



Source: https://stackoverflow.com/questions/47399038/simulate-from-kernel-density-estimator-with-variable-underlying-grid
