How to calculate probabilities using numpy.histogram and then use it for calculating KL divergence?

女生的网名这么多〃 提交于 2019-12-12 23:55:05

问题


In the following code, the density=True returns probability density function at each bin. Now if have to calculate P(x), can I say that hist is showing probabilities? For example if the first bin's mean value is 0.5 can I say that at x=0.5 probability is hist[0] ? I have to use KL divergence which uses P(x).

x = np.array([0,0,0,0,0,3,3,2,2,2,1,1,1,1,])
hist,bin_edges= np.histogram(x,bins=10,density=True)

回答1:


When you set density=True, NumPy returns a probability density function (lets say p). Theoretically speaking, p(0.5) = 0 because the probability is defined as the area under the PDF curve. You can read more details about it here. So, if you want to the compute probability you will have to define desired range and sum up all PDF values in this range.

For the KL, I can share my solution for the mutual information computation (which is basically KL):

def mutual_information(x, y, sigma=1):
    bins = (256, 256)
    # histogram
    hist_xy = np.histogram2d(x, y, bins=bins)[0]

    # smooth it out for better results
    ndimage.gaussian_filter(hist_xy, sigma=sigma, mode='constant', output=hist_xy)

    # compute marginals
    hist_xy = hist_xy + EPS # prevent division with 0
    hist_xy = hist_xy / np.sum(hist_xy)
    hist_x = np.sum(hist_xy, axis=0)
    hist_y = np.sum(hist_xy, axis=1)

    # compute mi
    mi = (np.sum(hist_xy * np.log(hist_xy)) - np.sum(hist_x * np.log(hist_x)) - np.sum(hist_y * np.log(hist_y)))
    return mi

EDIT: KL could be computed like this (please note that i did not test this!):

def kl(x, y, sigma=1):
    # histogram
    hist_xy = np.histogram2d(x, y, bins=bins)[0]

    # smooth it out for better results
    ndimage.gaussian_filter(hist_xy, sigma=sigma, mode='constant', output=hist_xy)

    # compute marginals
    hist_xy = hist_xy + EPS # prevent division with 0
    hist_xy = hist_xy / np.sum(hist_xy)
    hist_x = np.sum(hist_xy, axis=0)
    hist_y = np.sum(hist_xy, axis=1)

    kl = -np.sum(hist_x * np.log(hist_y / hist_x ))
    return kl

Also, for the best result, you should compute sigma with some heuristics, for example A rule-of-thumb bandwidth estimator.



来源:https://stackoverflow.com/questions/57687458/how-to-calculate-probabilities-using-numpy-histogram-and-then-use-it-for-calcula

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!