I have an x,y distribution of points for which I obtain the KDE through scipy.stats.gaussian_kde. This is my code and how the output looks (the
A direct way is to integrate the KDE numerically:
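import matplotlib.pyplot as plt
import numpy as np
from scipy import integrate
from sklearn.neighbors import KernelDensity

mean = [0, 0]
cov = [[5, 0], [0, 10]]
x, y = np.random.multivariate_normal(mean, cov, 5000).T
plt.plot(x, y, 'o')
plt.show()

# Stack the coordinates into an (n_samples, 2) array, as sklearn expects
sample = np.column_stack((x, y))
kde = KernelDensity().fit(sample)

def f_kde(x, y):
    # score_samples returns the log-density, so exponentiate it
    return np.exp(kde.score_samples([[x, y]]))[0]

# Upper integration limits (example values; use your own point)
x1, y1 = 2.5, 1.5
integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])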
The problem is that this is very slow if you do it at a large scale. For example, if you want to plot the result along a line of points with x in (0, 100), it would take a long time to calculate.
Note: I used the KDE from sklearn, but I believe you can swap in another implementation as well.
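If the slowness is a problem, one possible shortcut (my own sketch, not something sklearn provides) is to skip numerical integration entirely. It assumes the Gaussian kernel with a single bandwidth, which is what KernelDensity uses by default (kernel='gaussian', bandwidth=1.0). In that case each kernel is a product of two 1D normals, so its integral up to (x1, y1) is a product of two normal CDFs, and the KDE integral is just their average over the sample. The function name gaussian_kde_cdf and the seeded synthetic sample are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

def gaussian_kde_cdf(sample, bandwidth, x1, y1):
    """P(X <= x1, Y <= y1) under an isotropic Gaussian KDE.

    sample: (n, 2) array of data points; bandwidth: kernel std dev h.
    """
    sample = np.asarray(sample, dtype=float)
    # Each kernel is a product of two 1D normals, so its integral over
    # (-inf, x1] x (-inf, y1] is a product of two 1D normal CDFs.
    cx = norm.cdf((x1 - sample[:, 0]) / bandwidth)
    cy = norm.cdf((y1 - sample[:, 1]) / bandwidth)
    # The KDE is the average of the kernels, so average the integrals.
    return float(np.mean(cx * cy))

# Synthetic sample with the same mean and covariance as above (assumed seed)
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0, 0], [[5, 0], [0, 10]], 5000)
print(gaussian_kde_cdf(sample, 1.0, 2.5, 1.5))
```

This is a single vectorized pass over the sample per point, so evaluating it on a grid or along a line should be far cheaper than calling nquad each time. It does not apply to non-Gaussian kernels.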
Using the kernel as defined in the original question:
import numpy as np
from scipy import stats
from scipy import integrate

def integ_func(kde, x1, y1):
    def f_kde(x, y):
        # kde() returns a length-1 array; nquad needs a scalar
        return kde((x, y))[0]
    # nquad returns a (value, estimated_error) tuple
    integ = integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])
    return integ

# Obtain data from file.
data = np.loadtxt('data.dat', unpack=True)
# Perform a kernel density estimate (KDE) on the data
kernel = stats.gaussian_kde(data)
# Define the number that will determine the integration limits
x1, y1 = 2.5, 1.5
print(integ_func(kernel, x1, y1))
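Since the kernel here is a scipy.stats.gaussian_kde object, its integrate_box method may be a faster alternative to nquad for a rectangular region like this one. Below is a minimal sketch under a few assumptions: the data is a seeded synthetic sample standing in for data.dat, and kde_cdf is a helper name I made up.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for data.dat, shape (2, n) as gaussian_kde expects
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[5, 0], [0, 10]], 2000).T
kernel = stats.gaussian_kde(data)

def kde_cdf(kde, x1, y1):
    """Integrate a 2D gaussian_kde over (-inf, x1] x (-inf, y1]."""
    low = np.array([-np.inf, -np.inf])
    high = np.array([x1, y1])
    # integrate_box integrates the KDE over the box [low, high]
    return float(kde.integrate_box(low, high))

x1, y1 = 2.5, 1.5
print(kde_cdf(kernel, x1, y1))
```

Unlike nquad, this returns only the value, with no error estimate, so it is worth comparing the two on a few points before relying on it.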