Integrate 2D kernel density estimate


I have an x, y distribution of points for which I obtain the KDE through scipy.stats.gaussian_kde. This is my code and how the output looks (the


3 Answers

死守一世寂寞

2020-12-06 03:41
Here is a way to do it using Monte Carlo integration. It is a little slow, and the result is random. The error is inversely proportional to the square root of the sample size, while the running time is directly proportional to the sample size. Here "sample size" means the number of Monte Carlo draws (10000 in my example below), not the size of your data set. Here is some simple code using your kernel object.
```
# Compute the density at the reference point; we integrate over the region
# where the KDE is below this value (kernel, x1, y1 come from the question)
iso = kernel((x1, y1))

# Sample from your KDE distribution
sample = kernel.resample(size=10000)

# Filter the sample: keep draws whose density is below iso
insample = kernel(sample) < iso

# The integral you want is equivalent to the probability of drawing a point
# that gets through the filter
integral = insample.sum() / float(insample.shape[0])
print(integral)
```
I get approximately 0.2 as the answer for your data set.
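To make the error claim concrete, here is a minimal self-contained sketch of the same Monte Carlo estimate together with its binomial standard error, which shrinks like 1/sqrt(N). The synthetic data, the reference point and the seeds are assumptions for illustration only; they stand in for the question's data file and will not reproduce the 0.2 figure.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the question's data (an assumption, not the real file)
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[5, 0], [0, 10]], size=500).T  # shape (2, n)
kernel = stats.gaussian_kde(data)

x1, y1 = 2.5, 1.5                  # hypothetical reference point
iso = kernel((x1, y1))[0]          # KDE value at the reference point

n_mc = 10000                       # number of Monte Carlo draws
sample = kernel.resample(size=n_mc, seed=1)   # draws from the KDE, shape (2, n_mc)
insample = kernel(sample) < iso    # True where the density is below iso

integral = insample.mean()         # Monte Carlo estimate of the integral
# Binomial standard error of the estimate; it scales as 1 / sqrt(n_mc)
std_err = np.sqrt(integral * (1.0 - integral) / n_mc)
print(integral, std_err)
```

Quadrupling `n_mc` roughly halves `std_err` while roughly quadrupling the running time.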
广开言路

2020-12-06 03:43

This is now available directly through the kernel's integrate_box method:

```
kernel.integrate_box([-np.inf, -np.inf], [2.5, 1.5])
```
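As a rough sanity check, the box integral can be compared with the fraction of KDE draws that land in the same box. This is only a sketch: the synthetic data and the seeds are assumptions standing in for the question's data set.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the question's data (an assumption)
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[5, 0], [0, 10]], size=500).T  # shape (2, n)
kernel = stats.gaussian_kde(data)

x1, y1 = 2.5, 1.5  # upper integration limits

# Integral of the KDE over (-inf, x1] x (-inf, y1]
box = kernel.integrate_box([-np.inf, -np.inf], [x1, y1])

# Cross-check: fraction of draws from the KDE that fall inside the same box
draws = kernel.resample(size=20000, seed=1)
mc = np.mean((draws[0] <= x1) & (draws[1] <= y1))
print(box, mc)
```

The two numbers should agree to within the Monte Carlo noise of the second estimate.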


醉梦人生

2020-12-06 03:44

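A direct way is to integrate the density numerically:

```
import matplotlib.pyplot as plt
import numpy as np
from scipy import integrate
from sklearn.neighbors import KernelDensity

mean = [0, 0]
cov = [[5, 0], [0, 10]]
x, y = np.random.multivariate_normal(mean, cov, 5000).T
plt.plot(x, y, 'o')
plt.show()

# Stack into an (n_samples, 2) array, the layout sklearn expects
sample = np.column_stack((x, y))
kde = KernelDensity().fit(sample)

def f_kde(x, y):
    # score_samples returns the log-density, so exponentiate; [0] yields a scalar
    return np.exp(kde.score_samples([[x, y]]))[0]

# Upper integration limits (example values)
x1, y1 = 2.5, 1.5
integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])
```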

The problem is that this is very slow at scale. For example, if you want to plot the integral as a curve, evaluating it for every x in (0, 100), it would take a long time to calculate.

Note: I used the KDE from sklearn, but I believe the same approach works with other KDE implementations as well.
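On the speed issue, one possible shortcut (a different technique from the numerical quadrature above, not part of this answer) is a closed form. It assumes an isotropic Gaussian kernel with bandwidth h, which is what sklearn's KernelDensity uses by default (kernel='gaussian', bandwidth=1.0). Under that assumption the KDE is an equal-weight mixture of Gaussians centred on the data points, so the integral over (-inf, x1] x (-inf, y1] is the mean of products of normal CDFs. The helper name `box_probability` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def box_probability(sample, x1, y1, bandwidth=1.0):
    """Integral of an isotropic Gaussian KDE over (-inf, x1] x (-inf, y1].

    sample has shape (n, 2); bandwidth is the standard deviation of each kernel.
    """
    sample = np.asarray(sample, dtype=float)
    # Each data point contributes a product of two 1D normal CDFs
    px = norm.cdf((x1 - sample[:, 0]) / bandwidth)
    py = norm.cdf((y1 - sample[:, 1]) / bandwidth)
    return np.mean(px * py)

# Synthetic data with the same mean and covariance as the example above
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0, 0], [[5, 0], [0, 10]], size=5000)
print(box_probability(sample, 2.5, 1.5))
```

Because everything is vectorized, evaluating this for many (x1, y1) pairs is cheap compared with calling nquad once per pair. It does not apply to non-Gaussian kernels or to a full-covariance bandwidth such as the one scipy's gaussian_kde uses.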

Using the kernel as defined in the original question:

```
import numpy as np
from scipy import stats
from scipy import integrate

def integ_func(kde, x1, y1):
    """Integrate the KDE over (-inf, x1] x (-inf, y1] with nquad."""

    def f_kde(x, y):
        # gaussian_kde returns a length-1 array; [0] extracts the scalar
        return kde((x, y))[0]

    # nquad returns a tuple: (integral estimate, absolute error estimate)
    integ = integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])

    return integ

# Obtain data from file.
data = np.loadtxt('data.dat', unpack=True)
# Perform a kernel density estimate (KDE) on the data
kernel = stats.gaussian_kde(data)

# Define the numbers that determine the upper integration limits
x1, y1 = 2.5, 1.5
print(integ_func(kernel, x1, y1))
```
