Using scipy.stats.gaussian_kde with 2 dimensional data

独厮守ぢ 2020-12-31 20:13

I'm trying to use the scipy.stats.gaussian_kde class to smooth out some discrete data collected with latitude and longitude information, so it shows up as somewhat similar

4 Answers
  •  余生分开走
    2020-12-31 20:33

    I found the SciPy manual's description of how gaussian_kde works with 2-D data difficult to understand. Here is an explanation intended to complement @endolith's example. I have divided the code into several steps, with comments explaining the less intuitive bits.

    First, the imports:

    import numpy as np
    import scipy.stats as st
    from matplotlib.pyplot import imshow, show
    

    Create some dummy data: these are 1-D arrays of the "X" and "Y" point coordinates.

    np.random.seed(142)  # for reproducibility
    x = st.norm.rvs(loc=2, scale=1, size=2000)
    y = st.norm.rvs(loc=0, scale=3, size=2000)
    

    For 2-D density estimation the gaussian_kde object has to be initialised with an array with two rows containing the "X" and "Y" datasets. In NumPy terminology, we "stack them vertically":

    xy = np.vstack((x, y))
    

    so the "X" data are in the first row xy[0,:], the "Y" data are in the second row xy[1,:], and xy.shape is (2, 2000). Now create the gaussian_kde object:

    dens = st.gaussian_kde(xy)
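
    As a quick sanity check of the input layout, the sketch below rebuilds the same kind of data and inspects the d and n attributes of the gaussian_kde object. It uses np.random.default_rng instead of the global seed, which is my choice for a self-contained example, so the random values differ from the ones above.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(142)      # local generator, an assumption of this sketch
x = rng.normal(loc=2, scale=1, size=2000)
y = rng.normal(loc=0, scale=3, size=2000)

xy = np.vstack((x, y))                # shape (2, 2000): one row per dimension
dens = st.gaussian_kde(xy)

print(xy.shape)                       # (2, 2000)
print(dens.d, dens.n)                 # number of dimensions, number of data points

# Note: an array of shape (2000, 2) would be read as 2 points in 2000
# dimensions, so points stored one per row need to be transposed first.
```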
    

    We will evaluate the estimated 2-D PDF on a 2-D grid. There is more than one way of creating such a grid in NumPy. I show here an approach which is different from (but functionally equivalent to) @endolith's method:

    gx, gy = np.mgrid[x.min():x.max():128j, y.min():y.max():128j]
    gxy = np.dstack((gx, gy)) # shape is (128, 128, 2)
    

    gxy is a 3-D array: gx and gy are both 2-D arrays of shape (128, 128), and the [i,j]-th element of gxy holds the pair of "X" and "Y" values at that grid node, i.e. gxy[i, j] is [ gx[i, j], gy[i, j] ].
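
    To make the grid layout concrete, here is a small sketch on a 3 x 5 grid. The bounds (0 to 1 and 10 to 20) are made up for illustration and are not from the data above.

```python
import numpy as np

# A tiny grid so the values are easy to inspect (bounds are arbitrary)
gx, gy = np.mgrid[0:1:3j, 10:20:5j]
gxy = np.dstack((gx, gy))

print(gx.shape, gy.shape, gxy.shape)   # (3, 5) (3, 5) (3, 5, 2)

# gxy[i, j] pairs up the X and Y coordinates of grid node (i, j)
i, j = 2, 4
assert np.array_equal(gxy[i, j], [gx[i, j], gy[i, j]])

# X varies along the first axis, Y along the second
print(gx[:, 0])   # X values: 0, 0.5, 1
print(gy[0, :])   # Y values: 10, 12.5, 15, 17.5, 20
```

    Because "X" varies along the first axis, row index i of any array computed on this grid corresponds to "X" and column index j to "Y".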

    We have to invoke dens() (or dens.pdf() which is the same thing) on each of the 2-D grid points. NumPy has a very elegant function for this purpose:

    z = np.apply_along_axis(dens, 2, gxy)
    

    In words, the callable dens (dens.pdf would work as well) is invoked along axis=2 (the third axis) of the 3-D array gxy, and the values should be returned as a 2-D array. The only glitch is that the shape of z is (128, 128, 1) rather than the (128, 128) I expected. Note that the documentation says:

    The shape of out [the return value, L.D.] is identical to the shape of arr, except along the axis dimension. This axis is removed, and replaced with new dimensions equal to the shape of the return value of func1d. So if func1d returns a scalar out will have one fewer dimensions than arr.

    The cause is that dens() returns a 1-element array for a single point, not the scalar I was hoping for, so the replaced axis has length 1. This is easy to fix:

    z = z.reshape(128, 128)
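
    The sketch below checks the shape returned for a single point and compares the point-by-point evaluation with a vectorized one that passes all grid points to dens in a single call. The vectorized form is a common alternative and not necessarily what any other answer uses. A smaller sample and a 32 x 32 grid are assumptions made to keep it quick.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(142)
xy = np.vstack((rng.normal(2, 1, 500), rng.normal(0, 3, 500)))
dens = st.gaussian_kde(xy)

# A single 2-D point gives back a length-1 array, hence the extra axis
print(dens([2.0, 0.0]).shape)         # (1,)

n = 32
nj = complex(0, n)                    # imaginary step means "n points, inclusive"
gx, gy = np.mgrid[xy[0].min():xy[0].max():nj, xy[1].min():xy[1].max():nj]

# Point by point, as in the answer: one call to dens per grid node
z_loop = np.apply_along_axis(dens, 2, np.dstack((gx, gy))).reshape(n, n)

# Vectorized: flatten the grid into a (2, n*n) array and evaluate once
points = np.vstack((gx.ravel(), gy.ravel()))
z_vec = dens(points).reshape(gx.shape)

print(np.allclose(z_loop, z_vec))     # the two results should agree
```

    np.apply_along_axis loops in Python, so the single vectorized call is usually noticeably faster on large grids.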
    

    after which we can generate the image:

    # np.ptp(a) replaces a.ptp(), which was removed in NumPy 2.0
    imshow(z, aspect=np.ptp(gx) / np.ptp(gy))
    show()  # needed if you try this in PyCharm
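
    With the call above, imshow draws the first array axis ("X") vertically and labels the axes with pixel indices. If you would rather have "X" on the horizontal axis in data units, one option is sketched below. The transpose, the extent and origin arguments, the Agg backend and the output file name kde_density.png are my choices for this example, not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(142)
x = rng.normal(2, 1, 2000)
y = rng.normal(0, 3, 2000)
dens = st.gaussian_kde(np.vstack((x, y)))

gx, gy = np.mgrid[x.min():x.max():64j, y.min():y.max():64j]
z = dens(np.vstack((gx.ravel(), gy.ravel()))).reshape(gx.shape)

fig, ax = plt.subplots()
# z has X along axis 0, but imshow draws axis 0 vertically, so transpose;
# origin="lower" puts the smallest Y at the bottom, extent sets data units.
im = ax.imshow(z.T, origin="lower", aspect="equal",
               extent=[x.min(), x.max(), y.min(), y.max()])
ax.set_xlabel("X")
ax.set_ylabel("Y")
fig.colorbar(im, ax=ax, label="estimated density")
fig.savefig("kde_density.png")        # hypothetical output file name
```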
    

    Here is the image. (Note that I have implemented @endolith's version as well and got an image indistinguishable from this one.)
