Using scipy.stats.gaussian_kde with 2 dimensional data

本小妞迷上赌 提交于 2019-11-30 09:30:36

This example seems to be what you're looking for:

import numpy as np
import scipy.stats as stats
from matplotlib.pyplot import imshow

# Create some dummy data
rvs = np.append(stats.norm.rvs(loc=2,scale=1,size=(2000,1)),
                stats.norm.rvs(loc=0,scale=3,size=(2000,1)),
                axis=1)

kde = stats.kde.gaussian_kde(rvs.T)

# Regular grid to evaluate kde upon
x_flat = np.r_[rvs[:,0].min():rvs[:,0].max():128j]
y_flat = np.r_[rvs[:,1].min():rvs[:,1].max():128j]
x,y = np.meshgrid(x_flat,y_flat)
grid_coords = np.append(x.reshape(-1,1),y.reshape(-1,1),axis=1)

z = kde(grid_coords.T)
z = z.reshape(128,128)

imshow(z,aspect=x_flat.ptp()/y_flat.ptp())

Axes need fixing, obviously.

You can also do a scatter plot of the data with

scatter(rvs[:,0],rvs[:,1])

I think you are mixing up kernel density estimation with interpolation or maybe kernel regression. KDE estimates the distribution of points if you have a larger sample of points.

I'm not sure which interpolation you want, but either the splines or rbf in scipy.interpolate will be more appropriate.

If you want one-dimensional kernel regression, then you can find a version in scikits.statsmodels with several different kernels.

update: here is an example (if this is what you want)

>>> data = 2 + 2*np.random.randn(2, 100)
>>> kde = stats.gaussian_kde(data)
>>> kde.evaluate(np.array([[1,2,3],[1,2,3]]))
array([ 0.02573917,  0.02470436,  0.03084282])

gaussian_kde has variables in rows and observations in columns, so reversed orientation from the usual in stats. In your example, all three points are on a line, so it has perfect correlation. That is, I guess, the reason for the singular matrix.

Adjusting the array orientation and adding a small noise, the example works, but still looks very concentrated, for example you don't have any sample point near (3,3):

>>> data = np.array([[1.1, 1.1],
              [1.2, 1.2],
              [1.3, 1.3]]).T
>>> data = data + 0.01*np.random.randn(2,3)
>>> kde = stats.gaussian_kde(data)
>>> kde.evaluate(np.array([[1,2,3],[1,2,3]]))
array([  7.70204299e+000,   1.96813149e-044,   1.45796523e-251])

The example posted in the top answer didn't work for me. I had to tweak it little bit and it works now:

import numpy as np
import scipy.stats as stats
from matplotlib import pyplot as plt

# Create some dummy data
rvs = np.append(stats.norm.rvs(loc=2,scale=1,size=(2000,1)),
                stats.norm.rvs(loc=0,scale=3,size=(2000,1)),
                axis=1)

kde = stats.kde.gaussian_kde(rvs.T)

# Regular grid to evaluate kde upon
x_flat = np.r_[rvs[:,0].min():rvs[:,0].max():128j]
y_flat = np.r_[rvs[:,1].min():rvs[:,1].max():128j]
x,y = np.meshgrid(x_flat,y_flat)
grid_coords = np.append(x.reshape(-1,1),y.reshape(-1,1),axis=1)

z = kde(grid_coords.T)
z = z.reshape(128,128)

plt.imshow(z,aspect=x_flat.ptp()/y_flat.ptp())
plt.show()
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!