Efficiently create a density plot for high-density regions, points for sparse regions

栀梦 2020-12-13 22:01

I need a plot that behaves like a density plot in high-density regions but, below some density threshold, draws individual points. I couldn't find any existing

4 answers
  • 2020-12-13 22:27

    This should do it:

    import matplotlib.pyplot as plt, numpy as np
    
    #histogram definition
    xyrange = [[-5,5],[-5,5]] # data range
    bins = [100,100] # number of bins
    thresh = 3  #density threshold
    
    #data definition
    N = 100000  # must be an int: np.random.normal rejects a float size such as 1e5
    xdat, ydat = np.random.normal(size=N), np.random.normal(1, 0.6, size=N)
    
    # histogram the data (histogram2d is a NumPy function; the old scipy alias has been removed)
    hh, locx, locy = np.histogram2d(xdat, ydat, range=xyrange, bins=bins)
    posx = np.digitize(xdat, locx)
    posy = np.digitize(ydat, locy)
    
    #select points within the histogram
    ind = (posx > 0) & (posx <= bins[0]) & (posy > 0) & (posy <= bins[1])
    hhsub = hh[posx[ind] - 1, posy[ind] - 1] # values of the histogram where the points are
    xdat1 = xdat[ind][hhsub < thresh] # low density points
    ydat1 = ydat[ind][hhsub < thresh]
    hh[hh < thresh] = np.nan # fill the areas with low density by NaNs
    
    plt.imshow(np.flipud(hh.T),cmap='jet',extent=np.array(xyrange).flatten(), interpolation='none', origin='upper')
    plt.colorbar()   
    plt.plot(xdat1, ydat1, '.',color='darkblue')
    plt.show()
    

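
    The index arithmetic in the code above (`posx > 0`, `posx <= bins[0]`, and the `- 1`) follows from how `np.digitize` numbers its bins. A minimal sketch on made-up values (the `edges` and `values` arrays are illustrative, not from the answer):

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0, 3.0])         # four edges define three bins
values = np.array([-0.5, 0.2, 1.0, 2.9, 3.5])  # first and last lie outside the range

# digitize returns 0 for values below the first edge and len(edges)
# for values at or above the last edge; bins in between are numbered from 1
pos = np.digitize(values, edges)

inside = (pos > 0) & (pos < len(edges))        # keep only values that fall in a bin
bin_index = pos[inside] - 1                    # shift to zero-based histogram indices

print(pos)
print(bin_index)
```

    Note that a value exactly equal to the last edge is treated as outside by this test, although `np.histogram2d` counts it in the last bin.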

  • 2020-12-13 22:37

    After sleeping on it for a night and reading through Oz123's suggestions, I figured it out. The trick is to compute which bin each x,y point falls into (xi,yi), then test whether H[xi,yi] (actually, in my case H[yi,xi]) is beneath the threshold. The code is below; it runs very fast for large numbers of points and is much cleaner:

    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import numpy.random
    
    #Create the colormap:
    halfpurples = {'blue': [(0.0,1.0,1.0),(0.000001, 0.78431373834609985, 0.78431373834609985),
    (0.25, 0.729411780834198, 0.729411780834198), (0.5,
    0.63921570777893066, 0.63921570777893066), (0.75,
    0.56078433990478516, 0.56078433990478516), (1.0, 0.49019607901573181,
    0.49019607901573181)],
    
        'green': [(0.0,1.0,1.0),(0.000001,
        0.60392159223556519, 0.60392159223556519), (0.25,
        0.49019607901573181, 0.49019607901573181), (0.5,
        0.31764706969261169, 0.31764706969261169), (0.75,
        0.15294118225574493, 0.15294118225574493), (1.0, 0.0, 0.0)],
    
        'red': [(0.0,1.0,1.0),(0.000001,
        0.61960786581039429, 0.61960786581039429), (0.25,
        0.50196081399917603, 0.50196081399917603), (0.5,
        0.41568627953529358, 0.41568627953529358), (0.75,
        0.32941177487373352, 0.32941177487373352), (1.0,
        0.24705882370471954, 0.24705882370471954)]} 
    
    halfpurplecmap = mpl.colors.LinearSegmentedColormap('halfpurples',halfpurples,256)
    
    #Create x,y arrays of normally distributed points
    npts = 100000
    x = numpy.random.standard_normal(npts)
    y = numpy.random.standard_normal(npts)
    
    #Set bin numbers in both axes
    nxbins = 100
    nybins = 100
    
    #Set the cutoff for resolving the individual points
    minperbin = 1
    
    #Make the density histogram
    H, yedges, xedges = np.histogram2d(y,x,bins=(nybins,nxbins))
    #Reorient the axes
    H =  H[::-1]
    
    extent = [xedges[0],xedges[-1],yedges[0],yedges[-1]]
    
    #Figure out which bin each x,y point is in
    xbinsize = xedges[1]-xedges[0]
    ybinsize = yedges[1]-yedges[0]
    xi = ((x-xedges[0])/xbinsize).astype(int)
    yi = ((y-yedges[0])/ybinsize).astype(int)
    
    #Points exactly on the right and upper edges of the region belong to the last bin
    xi = np.clip(xi, 0, nxbins-1)
    yi = np.clip(yi, 0, nybins-1)
    #H was flipped vertically above, so flip the row index to match
    yi = nybins-1-yi
    
    #Get all points with density below the threshold
    lowdensityx = x[H[yi,xi] <= minperbin]
    lowdensityy = y[H[yi,xi] <= minperbin]
    
    #Plot
    fig1 = plt.figure()
    ax1 = fig1.add_subplot(111)
    ax1.plot(lowdensityx,lowdensityy,linestyle='none',marker='o',mfc='k',mec='k',ms=3)
    cp1 = ax1.imshow(H,interpolation='nearest',extent=extent,cmap=halfpurplecmap,vmin=minperbin)
    fig1.colorbar(cp1)
    
    fig1.savefig('contourtest.eps')
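
    The bin-lookup trick can also be packaged as a small helper. This is only a sketch: the function name `low_density_mask` and the five sample points are made up for illustration, and it uses `np.searchsorted` on the inner bin edges instead of the division used above, so points on the outer right and upper edges land in the last bin without a separate correction.

```python
import numpy as np

def low_density_mask(x, y, bins, min_count):
    """Boolean mask of the points whose histogram bin holds <= min_count points."""
    H, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Counting how many inner edges are <= each value gives a zero-based bin index;
    # values on the outermost edge fall into the last bin, matching histogram2d.
    xi = np.searchsorted(xedges[1:-1], x, side='right')
    yi = np.searchsorted(yedges[1:-1], y, side='right')
    return H[xi, yi] <= min_count

# Three points clustered in one bin, two isolated points
x = np.array([0.1, 0.2, 0.3, 0.9, 1.0])
y = np.array([0.1, 0.2, 0.3, 0.9, 0.0])
mask = low_density_mask(x, y, bins=2, min_count=1)
print(mask)
```

    The points selected by `mask` would be drawn individually, and the rest left to the density image.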
    
  • 2020-12-13 22:39

    For the record, here is the result of a new attempt using scipy.stats.gaussian_kde rather than a 2D histogram. One could envision different combinations of color meshing and contouring depending on the purpose.

    import numpy as np
    from matplotlib import pyplot as plt
    from scipy.stats import gaussian_kde
    
    # parameters
    npts = 5000         # number of sample points
    bins = 100          # number of bins in density maps
    threshold = 0.01    # density threshold for scatter plot
    
    # initialize figure
    fig, ax = plt.subplots()
    
    # create a random dataset
    x1, y1 = np.random.multivariate_normal([0, 0], [[1, 0], [0, 1]], npts//2).T
    x2, y2 = np.random.multivariate_normal([4, 4], [[4, 0], [0, 1]], npts//2).T
    x = np.hstack((x1, x2))
    y = np.hstack((y1, y2))
    points = np.vstack([x, y])
    
    # perform kernel density estimate
    kde = gaussian_kde(points)
    z = kde(points)
    
    # mask points above density threshold
    x = np.ma.masked_where(z > threshold, x)
    y = np.ma.masked_where(z > threshold, y)
    
    # plot unmasked points
    ax.scatter(x, y, c='black', marker='.')
    
    # get bounds from axes
    xmin, xmax = ax.get_xlim()
    ymin, ymax = ax.get_ylim()
    
    # prepare grid for density map
    xedges = np.linspace(xmin, xmax, bins)
    yedges = np.linspace(ymin, ymax, bins)
    xx, yy = np.meshgrid(xedges, yedges)
    gridpoints = np.array([xx.ravel(), yy.ravel()])
    
    # compute density map
    zz = np.reshape(kde(gridpoints), xx.shape)
    
    # plot density map
    im = ax.imshow(zz, cmap='CMRmap_r', interpolation='nearest',
                   origin='lower', extent=[xmin, xmax, ymin, ymax])
    
    # plot threshold contour
    cs = ax.contour(xx, yy, zz, levels=[threshold], colors='black')
    
    # show
    fig.colorbar(im)
    plt.show()
    

    Smooth scatter plot
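
    The fixed `threshold = 0.01` above depends on the scale and spread of the data. One possible alternative, which is an assumption here and not part of the original answer, is to derive the threshold from a percentile of the estimated densities, so that a chosen fraction of the points is drawn individually:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
points = rng.standard_normal((2, 1000))   # gaussian_kde expects shape (n_dims, n_points)

kde = gaussian_kde(points)
z = kde(points)                           # one density estimate per input point

# Hypothetical choice: treat the 10% of points with the lowest density as sparse
threshold = np.percentile(z, 10)
sparse = z <= threshold

print(z.shape, sparse.sum())
```

    The same `threshold` value could then be passed to `levels=[threshold]` in the contour call so that the outline matches the masked points.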

  • 2020-12-13 22:47

    Your problem is quadratic: for npts = 1000, the array size reaches 10^6 points, and then you iterate over these lists with list comprehensions.
    Now, this is a matter of taste of course, but I find that list comprehensions can yield code that is very hard to follow, and they are only slightly faster sometimes ... but that's not my point.
    My point is that for large array operations you have numpy functions like:

    np.where, np.choose etc.
    

    See whether you can achieve the functionality of the list comprehensions with NumPy; your code should then run faster.
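
    As a rough illustration of that suggestion (the array and the cutoff of 1.0 are made up, not taken from the question), a filtering list comprehension can usually be replaced by a boolean mask, and an element-wise choice by `np.where`:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(1000)

# List comprehension: a Python-level loop over every element
slow = np.array([v for v in values if v > 1.0])

# Boolean mask: the comparison and the selection both run inside NumPy
fast = values[values > 1.0]

# np.where with three arguments picks element-wise between two alternatives
clipped = np.where(values > 1.0, 1.0, values)

print(np.array_equal(slow, fast))
```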

    Do I understand your comment correctly?

    #Find all points that lie in these regions
    

    Are you testing for a point inside a polygon? If so, consider the point-in-polygon test in matplotlib.
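
    One way to run such a test in matplotlib is `matplotlib.path.Path.contains_points`, which checks many points against a polygon in a single call. A minimal sketch with a made-up square and four sample points:

```python
import numpy as np
from matplotlib.path import Path

# Square with corners (0,0) and (2,2); the first vertex is repeated to close it
polygon = Path([(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)])

pts = np.array([[1.0, 1.0],
                [3.0, 1.0],
                [0.5, 1.5],
                [-1.0, -1.0]])

inside = polygon.contains_points(pts)   # boolean array, one entry per point
print(inside)
```

    Points lying exactly on the boundary are best avoided in a quick check like this, since their classification can depend on the `radius` argument.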
