Efficiently get indices of histogram bins in Python

清歌不尽 2020-12-13 19:12

Short Question

I have a large image of 10000x10000 elements, which I bin into a few hundred different sectors/bins. I then need to perform some iterative calculation

5 Answers
  •  盖世英雄少女心
    2020-12-13 20:16

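    I assume that the binning, done in the example with digitize, cannot be changed. One way to go is to sort the values by bin index once, up front, so that every bin becomes a contiguous slice of the sorted array and no later step has to search for its members again.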

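    import numpy as np
    import matplotlib.pyplot as plt
    
    vals = np.random.random(10000)  # the size must be an int, not the float 1e4
    nbins = 100
    bins = np.linspace(0, 1, nbins + 1)
    ind = np.digitize(vals, bins)  # bin labels 1..nbins for values in [0, 1)
    
    # sort once by bin label so that each bin becomes a contiguous slice
    new_order = np.argsort(ind)
    ind = ind[new_order]
    ordered_vals = vals[new_order]
    # slower way of calculating first_hit (first version of this post)
    # _, first_hit = np.unique(ind, return_index=True)
    # faster way: start offset of every bin, plus one final end offset
    first_hit = np.searchsorted(ind, np.arange(1, nbins + 2))
    
    # example of using the data:
    for j in range(nbins):
        # I am using a plotting function for your f, to show that they cluster
        plt.plot(ordered_vals[first_hit[j]:first_hit[j + 1]], 'o')
    plt.show()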
    

    The figure shows that the bins are actually clusters, as expected.
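
    The same sort-once idea can be carried over to a 2D image like the one in the question. What follows is only a minimal sketch under stated assumptions: the helper name group_by_bin, the small random image, and the use of a mean as the per-bin function are all hypothetical stand-ins, not part of the original post.

```python
import numpy as np

def group_by_bin(labels, nbins):
    """Return (order, starts) such that order[starts[k]:starts[k + 1]] holds
    the flat indices of all elements labelled k + 1. Labels are assumed to
    run 1..nbins, as np.digitize gives for in-range values."""
    flat = np.asarray(labels).ravel()
    order = np.argsort(flat, kind="stable")
    # start offset of every bin, plus one final end offset
    starts = np.searchsorted(flat[order], np.arange(1, nbins + 2))
    return order, starts

# Hypothetical small image; the one in the question is 10000x10000.
rng = np.random.default_rng(0)
image = rng.random((200, 300))
nbins = 50
labels = np.digitize(image, np.linspace(0, 1, nbins + 1))

order, starts = group_by_bin(labels, nbins)  # the expensive sort, done once
sorted_pixels = image.ravel()[order]

# Every later pass only slices. The mean stands in for the real per-bin
# function; empty bins keep NaN.
bin_means = np.full(nbins, np.nan)
for k in range(nbins):
    chunk = sorted_pixels[starts[k]:starts[k + 1]]
    if chunk.size:
        bin_means[k] = chunk.mean()

# If pixel positions are needed, flat indices map back to 2D coordinates.
rows, cols = np.unravel_index(order[starts[0]:starts[1]], image.shape)
```

    If the image values change between iterations while the bin labels stay fixed, order and starts can be reused and only image.ravel()[order] has to be recomputed.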
