random variable from skewed distribution with scipy

好久不见. Submitted on 2020-01-14 14:45:10

Question


I'm trying to draw a random number from a distribution in SciPy, just like you would with stats.norm.rvs. However, I want to draw the number from an empirical distribution I have. It's a skewed dataset, and I want to incorporate the skew and kurtosis into the distribution that I'm drawing from. Ideally I'd like to just call stats.norm.rvs(loc=blah,scale=blah,size=blah) and then also set the skew and kurtosis in addition to the mean and variance. The norm function takes a 'moments' argument consisting of some arrangement of 'mvsk', where the s and k stand for skew and kurtosis, but apparently all that does is ask that the s and k be computed from the rv, whereas I want to establish the s and k as parameters of the distribution to begin with.

Anyway, I'm not a statistics expert by any means, so perhaps this is a simple or misguided question. I would appreciate any help.

EDIT: If the four moments aren't enough to define the distribution well enough, is there any other way to draw values that are consistent with an empirical distribution that looks like this: http://i.imgur.com/3yB2Y.png


Answer 1:


If you are not worried about getting out into the tails of the distribution, and the data are floating point, then you can sample from the empirical distribution.

  • Sort the data.
  • Prepend a 0 to the data.
  • Let N denote the length of this data_array.
  • Compute q=numpy.random.rand()*(N-1), so that idx+1 below stays a valid index
  • idx=int(q); di=q-idx
  • xlo=data_array[idx], xhi=data_array[idx+1]
  • return xlo+(xhi-xlo)*di

Basically, this is linearly interpolating in the empirical CDF to obtain the random variates.
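The steps above can be sketched in NumPy roughly as follows. This is only an illustration under stated assumptions: the function name sample_empirical and the gamma-distributed stand-in data are hypothetical, and prepending 0 assumes the data are non-negative with a lower bound of zero.

```python
import numpy as np

def sample_empirical(data, size, rng=None):
    """Draw `size` variates by interpolating linearly between sorted data points."""
    rng = np.random.default_rng() if rng is None else rng
    # Sort the data and prepend 0 as the assumed lower bound of the support.
    data_array = np.concatenate(([0.0], np.sort(np.asarray(data, dtype=float))))
    n = len(data_array)
    # Random position in [0, n - 1); the clip guards against rounding up to n - 1.
    q = rng.random(size) * (n - 1)
    idx = np.minimum(q.astype(int), n - 2)
    di = q - idx
    xlo = data_array[idx]
    xhi = data_array[idx + 1]
    return xlo + (xhi - xlo) * di

# Hypothetical skewed, positive data standing in for the real sample.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)
draws = sample_empirical(data, size=5000, rng=rng)
```

Every draw lies between 0 and the largest observed value, which is the second limitation noted in this answer.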

The two potential problems are (1) if your data set is small, you may not represent the distribution well, and (2) you will not generate a value larger than the largest one in your existing data set.

To get beyond those you need to look at parametric distributions, like the gamma distribution suggested in the other answers.




Answer 2:


The normal distribution has only two parameters, mean and variance. There are extensions of the normal distribution that have four parameters, adding skew and kurtosis. One example would be the Gram-Charlier expansion, but as far as I remember only the pdf is available in scipy, not the rvs.

As an alternative, there are distributions in scipy.stats that have four parameters, like johnsonsu, which are flexible but have a different parameterization.
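As a rough sketch of the johnsonsu route, one could fit its four parameters to the sample and then draw from the fitted distribution. The stand-in data here are hypothetical, and note that the fitted values are shape, location and scale parameters rather than the skew and kurtosis themselves.

```python
import numpy as np
from scipy import stats

# Hypothetical skewed data standing in for the real sample.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)

# fit() returns the two shape parameters followed by loc and scale.
a, b, loc, scale = stats.johnsonsu.fit(data)

# Draw new variates from the fitted four-parameter distribution.
draws = stats.johnsonsu.rvs(a, b, loc=loc, scale=scale, size=5000, random_state=rng)

# Compare sample skew and excess kurtosis of the data and of the draws.
print(stats.skew(data), stats.kurtosis(data))
print(stats.skew(draws), stats.kurtosis(draws))
```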

However, in your example, the distribution is for values larger than zero, so an approximately normal distribution wouldn't work very well. As Andrew suggested, I think you should look through the distributions in scipy.stats that have a lower bound of zero, like the gamma, and you might find something close.
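A minimal sketch of the gamma suggestion, assuming the data are strictly positive so that the location can be pinned at zero with floc=0. The stand-in data are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical positive, right-skewed data standing in for the real sample.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)

# Fix loc at 0 so the fitted distribution keeps a lower bound of zero.
shape, loc, scale = stats.gamma.fit(data, floc=0)

# Draw new variates from the fitted gamma distribution.
draws = stats.gamma.rvs(shape, loc=loc, scale=scale, size=5000, random_state=rng)
```

Unlike sampling from the empirical distribution, the fitted gamma can produce values beyond the largest observation.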

Another alternative, if your sample is large enough, would be to use gaussian_kde, which can also create random numbers. But gaussian_kde is also not designed for distributions with a finite bound.
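A short sketch of drawing from gaussian_kde, again with hypothetical stand-in data. Because the kernels are Gaussian, some draws can fall below zero, which illustrates the finite-bound caveat.

```python
import numpy as np
from scipy import stats

# Hypothetical positive, right-skewed data standing in for the real sample.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)

# Build the kernel density estimate from the one-dimensional sample.
kde = stats.gaussian_kde(data)

# resample() returns an array of shape (dimensions, size); take the single row.
draws = kde.resample(5000, seed=rng)[0]

# Fraction of draws that leaked below the lower bound of zero.
frac_negative = np.mean(draws < 0)
print(frac_negative)
```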




Answer 3:


Maybe I've misunderstood, and I'm certainly not a stats expert, but your image looks quite a bit like a gamma distribution.

SciPy contains code specifically for gamma distributions - http://www.scipy.org/doc/api_docs/SciPy.stats.distributions.html#gamma




Answer 4:


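Short answer; replace with another distribution if needed:

from random import random

n = 100
# Draw n samples; swap random() for another distribution if needed.
a_b = sorted(random() for _ in range(n))
# Take the element 80% of the way through the sorted sample (empirical 80th percentile).
c = a_b[int(n * .8)]
print(c)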


Source: https://stackoverflow.com/questions/9858290/random-variable-from-skewed-distribution-with-scipy