Scatter plot with a huge amount of data

前端 未结 3 2183
后悔当初
后悔当初 2020-12-14 09:10

I would like to use Matplotlib to generate a scatter plot with a huge amount of data (about 3 million points). Actually I\'ve 3 vectors with the same dimension and I use to

3条回答
  •  误落风尘
    2020-12-14 09:41

    Unless your graphic is huge, many of those 3 million points are going to overlap. (A 400x600 image only has 240K dots...)

    So the easiest thing to do would be to take a sample of say, 1000 points, from your data:

    import random
    delta_sample=random.sample(delta,1000)
    

    and just plot that.

    For example:

    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    import numpy as np
    import random
    
    fig = plt.figure()
    fig.subplots_adjust(bottom=0.2)
    ax = fig.add_subplot(111)
    
    N=3*10**6
    delta=np.random.normal(size=N)
    vf=np.random.normal(size=N)
    dS=np.random.normal(size=N)
    
    idx=random.sample(range(N),1000)
    
    plt.scatter(delta[idx],vf[idx],c=dS[idx],alpha=0.7,cmap=cm.Paired)
    plt.show()
    

    alt text

    Or, if you need to pay more attention to outliers, then perhaps you could bin your data using np.histogram, and then compose a delta_sample which has representatives from each bin.

    Unfortunately, when using np.histogram I don't think there is any easy way to associate bins with individual data points. A simple, but approximate solution is to use the location of a point in or on the bin edge itself as a proxy for the points in it:

    xedges=np.linspace(-10,10,100)
    yedges=np.linspace(-10,10,100)
    zedges=np.linspace(-10,10,10)
    hist,edges=np.histogramdd((delta,vf,dS), (xedges,yedges,zedges))
    xidx,yidx,zidx=np.where(hist>0)
    plt.scatter(xedges[xidx],yedges[yidx],c=zedges[zidx],alpha=0.7,cmap=cm.Paired)
    plt.show()
    

    alt text

提交回复
热议问题