Question
I am using scipy.stats.gaussian_kde to generate random samples from my data.
It works fine. I have now found that the class also has built-in methods for evaluating the probability density function (pdf) at a given set of points (my data).
I would like to know how it calculates the pdf for a given set of points.
Here is a small example:
import numpy as np
from scipy import stats

def getDistribution1(data):
    # Gaussian KDE fitted to the data with a fixed bandwidth factor
    kernel = stats.gaussian_kde(data, bw_method=0.06)

    class rv(stats.rv_continuous):
        def _rvs(self, *x, **y):
            # random variates drawn from the KDE
            return kernel.resample(int(self._size))

        def _cdf(self, x):
            # integrates the pdf between 0 and max(x);
            # note that a true CDF would integrate from -inf to each x
            return kernel.integrate_box_1d(0, max(x))

        def _pdf(self, x):
            # evaluate the estimated pdf on the provided set of points
            return kernel.evaluate(x)

    return rv(name='kdedist')

test_data = np.random.random(100)  # random test data
distribution_data = getDistribution1(test_data)
pdf_data = distribution_data.pdf(test_data)  # the pdf of the data
The code above defines three methods:
rvs, which generates random samples based on the data
cdf, which is the integral of the pdf from 0 to max(data)
pdf, which is the pdf of the data
I need this pdf because I am trying to calculate weights for my data based on probability: I want to assign each data point a probability that I can then use as its weight.
I would also like to know how I should proceed from here to calculate the weights.
P.S. Apologies for also asking this question on Cross Validated; it has received no response there.
Answer 1:
The online docs have a link to the source code, which for gaussian_kde is here: https://github.com/scipy/scipy/blob/v0.15.1/scipy/stats/kde.py#L193
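For readers who do not want to read the source, here is a minimal sketch of what the evaluation amounts to in the one-dimensional case. It assumes a scalar bw_method, which gaussian_kde uses directly as the bandwidth factor, so the kernel width is the factor times the sample standard deviation (ddof=1). The helper name manual_kde_pdf is hypothetical and not part of scipy. The last line shows one possible reading of "weights", namely the pdf values normalized to sum to 1; that is an assumption about the intent, not something scipy provides.

```python
import numpy as np
from scipy import stats

def manual_kde_pdf(data, points, factor):
    """Hypothetical helper: 1-D Gaussian KDE evaluated by hand."""
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    # Kernel width: bandwidth factor times the sample standard deviation
    h = factor * data.std(ddof=1)
    # Standardized distance from every evaluation point to every data point
    z = (points[:, None] - data[None, :]) / h
    # One Gaussian bump per data point, averaged over all data points
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
test_data = rng.random(100)

kernel = stats.gaussian_kde(test_data, bw_method=0.06)
manual = manual_kde_pdf(test_data, test_data, 0.06)

# Compare the hand-written estimate with scipy's evaluate()
print(np.allclose(manual, kernel.evaluate(test_data)))

# Assumption: "weights" means density values normalized to sum to 1
weights = manual / manual.sum()
```

In other words, the estimated pdf at a point is the average of Gaussian densities centered on each data point. Whether normalized densities are suitable weights depends on what the weights are used for.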
Source: https://stackoverflow.com/questions/30186868/how-does-the-stats-gaussian-kde-method-calcute-the-pdf