Why does this Kernel Density Estimation have values over 1.0?

*爱你&永不变心* 提交于 2020-01-06 04:53:08

问题


I'm trying to analyse the features of the Pima Indians Diabetes Data Set (follow the link to get the dataset) by plotting their probability density distributions. I haven't yet removed invalid 0 data, so the plots sometimes show a bias at the very left. For the most part, the distributions look accurate:

I have a problem with the look of the plot for DiabetesPedigree, which shows probabilities over 1.0 (for x ~ between 0.1 and 0.5). As I understand it, the combined probabilities should equal 1.0.

I've isolated the code for the DiatebesPedigree plot but the same will work for the others by changing the dataset_index value:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

dataset_index = 6
feature_name = "DiabetesPedigree"
filename = 'pima-indians-diabetes.data.csv'

data = pd.read_csv(filename)
feature_data = data.ix[:, dataset_index]

graph_min = feature_data.min()
graph_max = feature_data.max()

density = gaussian_kde(feature_data)
density.covariance_factor = lambda : .25
density._compute_covariance()

xs = np.arange(graph_min, graph_max, (graph_max - graph_min)/200)
ys = density(xs)

plt.xlim(graph_min, graph_max)
plt.title(feature_name)
plt.plot(xs,ys)

plt.show()

回答1:


As rightly marked , a continous pdf never says the value to be less than 1, with the pdf for continous random variable, function p(x) is not the probability. you can refer for continuous random varibales and their distrubutions



来源:https://stackoverflow.com/questions/46441481/why-does-this-kernel-density-estimation-have-values-over-1-0

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!