Fast, python-ish way of ranking chunks of 1's in numpy array?

忘了有多久 2021-01-05 03:29

I have a NumPy array consisting of 0's and 1's. Each sequence of 1's within the array stands for the occurrence of one event. I want to …

2 Answers
  •  佛祖请我去吃肉
    2021-01-05 03:55

    You want to label the chunks, and luckily there's a function for that in SciPy, scipy.ndimage.label:

    In [43]: from scipy.ndimage import label
    
    In [47]: out = label(arr)[0]
    
    In [48]: np.where(arr==0,np.nan,out-1)
    Out[48]: 
    array([nan, nan, nan,  0.,  0.,  0., nan, nan, nan,  1.,  1., nan, nan,
           nan,  2.,  2.,  2.,  2.])
    

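For readers without SciPy, here is a minimal NumPy-only sketch of what `label` does on 1-D input, assuming the default connectivity (adjacent 1's belong to the same feature, numbered 1, 2, ..., with background left at 0). The helper name `label_1d` is hypothetical and not part of SciPy. The question's own input array is not shown, so the sample below is reconstructed from the `Out[48]` result above.

```python
import numpy as np

# Sample input reconstructed from the Out[48] result (assumption: the
# question's original array is not shown in this excerpt).
arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])


def label_1d(a):
    """Hypothetical NumPy-only stand-in for scipy.ndimage.label on 1-D input.

    Returns (labels, n): each run of nonzero values gets 1, 2, ..., n and
    zeros stay 0, mirroring label's (labeled_array, num_features) pair.
    """
    b = np.asarray(a).astype(bool)
    # A run starts where the value is True and the previous value is False.
    starts = b & ~np.r_[False, b[:-1]]
    # Counting starts so far gives the run number; multiplying by b zeroes the gaps.
    labels = np.cumsum(starts) * b
    return labels, int(starts.sum())


labels, n = label_1d(arr)
# Same post-processing as the answer: shift to 0-based ranks, NaN for the gaps.
ranked = np.where(arr == 0, np.nan, labels - 1)
print(n)  # 3
print(ranked)
```

The `np.where(arr==0, np.nan, out-1)` step is what turns SciPy's 1-based labels into the 0-based ranks with NaN gaps shown in `Out[48]`.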
    Another approach, using plain NumPy:

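    import numpy as np

    def rank_chunks(arr):
        # Prepend False so that m[:-1] < m[1:] marks each 0 -> 1 rise,
        # i.e. the start index of every chunk of 1's.
        m = np.r_[False, arr.astype(bool)]
        idx = np.flatnonzero(m[:-1] < m[1:])
        # +1 at every chunk start except the first; the cumsum then counts chunks.
        id_ar = np.zeros(len(arr), dtype=float)
        id_ar[idx[1:]] = 1
        out = id_ar.cumsum()
        # Positions holding 0 are not part of any chunk.
        out[arr == 0] = np.nan
        return out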
    

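To make each step of `rank_chunks` concrete, here is a sketch that runs the same operations on a sample array and keeps the intermediates. The array is an assumption, reconstructed from the `Out[48]` result of the first approach, since the question's input is not shown.

```python
import numpy as np

# Sample input reconstructed from Out[48] (assumption, see above).
arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])

# Step 1: after prepending False, m[1:][i] is arr[i] and m[:-1][i] is
# arr[i-1]; a chunk starts where the value rises from False to True.
m = np.r_[False, arr.astype(bool)]
idx = np.flatnonzero(m[:-1] < m[1:])
print(idx)  # chunk start indices: 3, 9 and 14

# Step 2: a 1 at every chunk start except the first, so the running sum
# is 0 inside chunk 0, 1 inside chunk 1, 2 inside chunk 2.
id_ar = np.zeros(len(arr), dtype=float)
id_ar[idx[1:]] = 1
out = id_ar.cumsum()

# Step 3: the running sum also carries over into the zero gaps, so mask them.
out[arr == 0] = np.nan
print(out)
```

Skipping `idx[0]` is what makes the ranks start at 0 rather than 1.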
    Another approach, using masking and np.repeat:

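    def rank_chunks_v2(arr):
        # Pad with False on both sides so every chunk has a rising and a
        # falling edge; the edge indices then alternate start, stop, ...
        m = np.r_[False, arr.astype(bool), False]
        idx = np.flatnonzero(m[:-1] != m[1:])
        # Chunk lengths = stop - start (stop is exclusive).
        l = idx[1::2] - idx[::2]
        # Repeat rank k once per element of chunk k and scatter into the 1's.
        out = np.full(len(arr), np.nan, dtype=float)
        out[arr != 0] = np.repeat(np.arange(len(l)), l)
        return out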
    

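A short trace of `rank_chunks_v2` on a sample array shows why the edge indices can be split with `[::2]` and `[1::2]`. As before, the array is an assumption reconstructed from `Out[48]`.

```python
import numpy as np

# Sample input reconstructed from Out[48] (assumption, see above).
arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])

# Padding with False on both sides guarantees every chunk has both a
# rising and a falling edge, so the edges come in (start, stop) pairs.
m = np.r_[False, arr.astype(bool), False]
idx = np.flatnonzero(m[:-1] != m[1:])
starts, stops = idx[::2], idx[1::2]  # stop is exclusive
lengths = stops - starts
print(starts, stops, lengths)  # starts 3, 9, 14; stops 6, 11, 18; lengths 3, 2, 4

# Rank k is repeated once per element of chunk k, then written into the
# nonzero positions in order; every other slot keeps the NaN fill value.
out = np.full(len(arr), np.nan, dtype=float)
out[arr != 0] = np.repeat(np.arange(len(lengths)), lengths)
print(out)
```

Without the trailing `False`, a chunk that runs to the end of the array (like the last one here) would have no falling edge and the start/stop pairing would break.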
    Timings (with the given input tiled 1,000,000 times):

    In [153]: arr_big = np.tile(arr,1000000)
    
    In [154]: %timeit np.where(arr_big==0,np.nan,label(arr_big)[0]-1)
         ...: %timeit rank_chunks(arr_big)
         ...: %timeit rank_chunks_v2(arr_big)
    1 loop, best of 3: 312 ms per loop
    1 loop, best of 3: 263 ms per loop
    1 loop, best of 3: 229 ms per loop
    

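The timings above come from an IPython session. A rough standalone equivalent using the standard `timeit` module is sketched below; it covers only the two NumPy versions (no SciPy), checks that they agree before timing, and uses a hypothetical smaller tile count `REPS` so it runs quickly. Absolute numbers will depend on the machine and NumPy version, so no results are claimed here.

```python
import timeit

import numpy as np


def rank_chunks(arr):
    # Cumulative-sum approach from the answer above.
    m = np.r_[False, arr.astype(bool)]
    idx = np.flatnonzero(m[:-1] < m[1:])
    id_ar = np.zeros(len(arr), dtype=float)
    id_ar[idx[1:]] = 1
    out = id_ar.cumsum()
    out[arr == 0] = np.nan
    return out


def rank_chunks_v2(arr):
    # Masking + np.repeat approach from the answer above.
    m = np.r_[False, arr.astype(bool), False]
    idx = np.flatnonzero(m[:-1] != m[1:])
    l = idx[1::2] - idx[::2]
    out = np.full(len(arr), np.nan, dtype=float)
    out[arr != 0] = np.repeat(np.arange(len(l)), l)
    return out


# Sample input reconstructed from Out[48] (assumption, see above).
arr = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])
# Hypothetical tile count; the answer used 1,000,000.
REPS = 100_000
arr_big = np.tile(arr, REPS)

# Both versions should agree, NaN positions included, before timing them.
a = rank_chunks(arr_big)
b = rank_chunks_v2(arr_big)
assert np.allclose(a, b, equal_nan=True)

for fn in (rank_chunks, rank_chunks_v2):
    # Best of 3 single runs, similar in spirit to IPython's %timeit report.
    t = min(timeit.repeat(lambda: fn(arr_big), number=1, repeat=3))
    print(f"{fn.__name__}: {t * 1e3:.1f} ms (best of 3)")
```

Because the sample starts with 0 and ends with 1, tiling never merges chunks across copies, so the tiled array has exactly `3 * REPS` chunks.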