Integrate 2D kernel density estimate

梦如初夏 2020-12-06 02:50

I have an x,y distribution of points for which I obtain the KDE through scipy.stats.gaussian_kde. This is my code and how the output looks (the

3 Answers
  • 2020-12-06 03:41

    Here is a way to do it using Monte Carlo integration. It is a little slow, and the result is random. The error is inversely proportional to the square root of the sample size, while the running time is directly proportional to it. Here, sample size means the number of Monte Carlo draws (10000 in the example below), not the size of your data set. Here is some simple code using your kernel object.

    # Density at (x1, y1): the level below which to integrate
    iso = kernel((x1, y1))
    
    # Sample from your KDE distribution
    sample = kernel.resample(size=10000)
    
    # Keep the sampled points whose density is below that level
    insample = kernel(sample) < iso
    
    # The integral you want is equivalent to the probability of drawing a point
    # that gets through the filter
    integral = insample.sum() / float(insample.shape[0])
    print(integral)
    

    I get approximately 0.2 as the answer for your data set.
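As a hedged, self-contained sketch of the steps above in Python 3: the synthetic standard-normal data, the helper name mc_mass_below_iso, and the fixed seed are assumptions for illustration and are not part of the original answer. It also reports the binomial standard error sqrt(p(1-p)/n), which makes the 1/sqrt(n) error scaling explicit.

```python
import numpy as np
from scipy import stats


def mc_mass_below_iso(kernel, point, n_samples=10000, seed=0):
    """Monte Carlo estimate of the KDE mass where density < density(point).

    Hypothetical helper: returns (estimate, standard_error).
    """
    # Density level at the reference point (gaussian_kde returns a 1-element array)
    iso = kernel(point)[0]
    # Draw points distributed according to the KDE itself
    sample = kernel.resample(size=n_samples, seed=seed)
    # Fraction of draws that land where the density is below the level
    below = kernel(sample) < iso
    p = below.mean()
    # Binomial standard error of that fraction
    stderr = np.sqrt(p * (1.0 - p) / n_samples)
    return p, stderr


# Synthetic stand-in for the data set from the question (assumption)
rng = np.random.default_rng(0)
data = rng.normal(size=(2, 500))
kernel = stats.gaussian_kde(data)

estimate, stderr = mc_mass_below_iso(kernel, (2.5, 1.5))
print(estimate, stderr)
```

Quadrupling n_samples should roughly halve the standard error, at about four times the running time.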

  • 2020-12-06 03:43

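    SciPy's gaussian_kde now provides this directly through its integrate_box method, which integrates the KDE over a rectangular box given its lower and upper corners: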

    kernel.integrate_box([-np.inf,-np.inf], [2.5,1.5])
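A minimal sketch of that call in context, assuming synthetic standard-normal data in place of the data set from the question (the data, the seed, and the variable names mass_below and total_mass are illustrative assumptions). Integrating over the whole plane as well gives a quick sanity check, since a density should integrate to about 1.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the data set from the question (assumption)
rng = np.random.default_rng(0)
data = rng.normal(size=(2, 500))
kernel = stats.gaussian_kde(data)

x1, y1 = 2.5, 1.5
lower = [-np.inf, -np.inf]

# Mass of the KDE in the quadrant (-inf, x1] x (-inf, y1]
mass_below = kernel.integrate_box(lower, [x1, y1])

# Sanity check: the mass over the whole plane should be close to 1
total_mass = kernel.integrate_box(lower, [np.inf, np.inf])

print(mass_below, total_mass)
```

Note that this integrates over a rectangular region, which is a different quantity from the mass below an iso-density level.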

  • 2020-12-06 03:44

    A direct way is to integrate the density numerically:

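    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import integrate
    from sklearn.neighbors import KernelDensity
    
    # Draw a synthetic 2D Gaussian sample to stand in for the data
    mean = [0, 0]
    cov = [[5, 0], [0, 10]]
    x, y = np.random.multivariate_normal(mean, cov, 5000).T
    plt.plot(x, y, 'o')
    plt.show()
    
    # Fit the KDE on an array of shape (n_samples, 2)
    sample = np.column_stack((x, y))
    kde = KernelDensity().fit(sample)
    
    def f_kde(x, y):
        # score_samples returns the log-density, so exponentiate it
        return np.exp(kde.score_samples([[x, y]]))[0]
    
    # Upper integration limits (example values)
    x1, y1 = 2.5, 1.5
    integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])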
    

    The problem is that this is very slow at scale. For example, evaluating the integral for every x in (0, 100) in order to plot it would take a long time.

    Note: I used the KDE from sklearn, but I believe other KDE implementations can be substituted as well.


    Using the kernel as defined in the original question:

    import numpy as np
    from scipy import stats
    from scipy import integrate
    
    def integ_func(kde, x1, y1):
        """Integrate kde over (-inf, x1] x (-inf, y1] with nquad."""
    
        def f_kde(x, y):
            # gaussian_kde returns a length-1 array; nquad needs a scalar
            return kde((x, y))[0]
    
        # nquad returns a tuple: (integral value, absolute error estimate)
        integ = integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])
    
        return integ
    
    # Obtain data from file.
    data = np.loadtxt('data.dat', unpack=True)
    # Perform a kernel density estimate (KDE) on the data
    kernel = stats.gaussian_kde(data)
    
    # Define the number that will determine the integration limits
    x1, y1 = 2.5, 1.5
    print(integ_func(kernel, x1, y1))
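For the slow case mentioned in this answer, where the integral is needed at many upper limits in order to plot it, one possible workaround is to call gaussian_kde.integrate_box in a loop instead of nquad. This is only a sketch under assumptions: the synthetic data, the grid x_limits, and the name cdf_curve are illustrative, and it applies only to scipy.stats.gaussian_kde, not to the sklearn estimator.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the data set from the question (assumption)
rng = np.random.default_rng(0)
data = rng.normal(size=(2, 500))
kernel = stats.gaussian_kde(data)

y1 = 1.5                                # fixed upper limit in y
x_limits = np.linspace(-4.0, 4.0, 41)   # upper limits in x to scan (illustrative grid)
lower = [-np.inf, -np.inf]

# One box integral per upper limit; no nested numerical quadrature needed
cdf_curve = np.array(
    [kernel.integrate_box(lower, [x1, y1]) for x1 in x_limits]
)

print(cdf_curve[:3], cdf_curve[-3:])
```

The resulting array can be passed straight to plt.plot(x_limits, cdf_curve); it should be non-decreasing in x, since widening the box can only add mass.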
    