Efficiently get indices of histogram bins in Python

前端 未结 5 1198
清歌不尽
清歌不尽 2020-12-13 19:12

Short Question

I have a large 10000x10000 elements image, which I bin into a few hundred different sectors/bins. I then need to perform some iterative calculation

5条回答
  •  独厮守ぢ
    2020-12-13 20:15

    I found that a particular sparse matrix constructor can achieve the desired result very efficiently. It's a bit obscure but we can abuse it for this purpose. The function below can be used in nearly the same way as scipy.stats.binned_statistic but can be orders of magnitude faster

    import numpy as np
    from scipy.sparse import csr_matrix
    
    def binned_statistic(x, values, func, nbins, range):
        '''The usage is nearly the same as scipy.stats.binned_statistic''' 
    
        N = len(values)
        r0, r1 = range
    
        digitized = (float(nbins)/(r1 - r0)*(x - r0)).astype(int)
        S = csr_matrix((values, [digitized, np.arange(N)]), shape=(nbins, N))
    
        return [func(group) for group in np.split(S.data, S.indptr[1:-1])]
    

    I avoided np.digitize because it doesn't use the fact that all bins are equal width and hence is slow, but the method I used instead may not handle all edge cases perfectly.

提交回复
热议问题