How can I count the number of consecutive TRUEs in a DataFrame?

佛祖请我去吃肉 2021-01-11 14:58

I have a dataset made up of True and False values.

Sample Table:
       A      B      C
0  False   True  False
1  False  False  False
2   True   True  False
3   True           


        
  •  天命终不由人
    2021-01-11 15:43

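    The approach relies on two ideas: catching the shifts (comparing the array with a copy of itself shifted by one row, so the positions where the values change mark where each run of True starts and ends), and offsetting each column's results so that all columns can be processed together in one vectorized pass.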
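    As a minimal sketch of the "catching shifts" idea on a single column (island_bounds is a hypothetical helper name, not part of the answer's code): pad the column with False at both ends and compare it with itself shifted by one; the positions where the value changes alternate between run starts and run ends. The column used here is column 0 of the sample array from the sample run further down.

```python
import numpy as np


def island_bounds(col):
    """Hypothetical helper: (starts, lengths) of the runs of True in a 1-D array."""
    col = np.asarray(col, dtype=bool)
    # Pad with False on both ends so every run has a rising and a falling edge
    padded = np.concatenate(([False], col, [False]))
    # Positions where a value differs from its predecessor: start, end, start, end...
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts = edges[::2]               # rising edges = first row of each run
    lengths = edges[1::2] - starts    # falling edge minus rising edge
    return starts, lengths


col = np.array([False, False, True, True, False, True, True, True, False, True])
starts, lengths = island_bounds(col)
print(starts)   # [2 5 9]
print(lengths)  # [2 3 1]
```

    The longest run here starts at row 5 and has length 3, which is what the full function reports for column 0.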

    With that in mind, here is one way to get, for every column, the start index and the length of its longest run of True:

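    import numpy as np

    def maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0):
        # a is a 2-D boolean array; each column is searched for its longest
        # run ("island") of True. Returns (start row, length) per column, using
        # fillna_index / fillna_len for columns that contain no True at all.

        # Pad a False row above and below so every island has both edges
        pad = np.zeros(a.shape[1], dtype=bool)
        mask = np.vstack((pad, a, pad))

        # Catch the shifts: True wherever a value differs from the row above
        mask_step = mask[1:] != mask[:-1]
        # Scan column by column (hence .T); edges alternate start, end, ...
        idx = np.flatnonzero(mask_step.T)
        # No True anywhere: nothing to rank, so every column gets the fill values
        if idx.size == 0:
            return (np.full(a.shape[1], fillna_index),
                    np.full(a.shape[1], fillna_len, dtype=int))
        island_starts = idx[::2]
        island_lens = idx[1::2] - idx[::2]
        n_islands_percol = mask_step.sum(0)//2

        # Offset each column: key = column id * scale + island length, so one
        # argsort groups islands by column and orders them by length within it
        bins = np.repeat(np.arange(a.shape[1]), n_islands_percol)
        scale = island_lens.max()+1

        scaled_idx = np.argsort(scale*bins + island_lens)
        grp_shift_idx = np.r_[0, n_islands_percol.cumsum()]
        # The last entry of each column's group is that column's longest island
        max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:]-1]]

        # Flat indices come in blocks of a.shape[0]+1 per column; modulo gives the row
        max_island_percol_start = max_island_starts%(a.shape[0]+1)

        # Only columns with at least one island get real values
        valid = n_islands_percol!=0
        cut_idx = grp_shift_idx[:-1][valid]
        max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)

        out_len = np.full(a.shape[1], fillna_len, dtype=int)
        out_len[valid] = max_island_percol_len
        out_index = np.where(valid, max_island_percol_start, fillna_index)
        return out_index, out_len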
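    The offsetting step can be looked at in isolation. The per-island data below is written out by hand as an illustration: it is what the sample array from the sample run produces (island lengths listed column by column, and the number of islands per column, with column 1 having none). Adding column_id * (max length + 1) to each length makes a single argsort order islands by column first and by length second, so the last entry of each column's group, found at cumsum(n_islands_percol) - 1 just like grp_shift_idx[1:]-1 in the function, is that column's longest island.

```python
import numpy as np

# Hand-written island data matching the sample array (illustrative values)
island_lens = np.array([2, 3, 1, 1, 1, 2])   # lengths, column by column
n_islands_percol = np.array([3, 0, 3])        # column 1 has no islands

# Column id of every island: [0, 0, 0, 2, 2, 2]
bins = np.repeat(np.arange(n_islands_percol.size), n_islands_percol)

# scale exceeds every length, so column id dominates the combined sort key
scale = island_lens.max() + 1
key = scale * bins + island_lens
order = np.argsort(key, kind="stable")

# Position of the last (longest) island of each column in the sorted order
group_ends = np.cumsum(n_islands_percol) - 1
valid = n_islands_percol != 0
longest = island_lens[order[group_ends[valid]]]
print(longest)  # [3 2]
```

    This sketch passes kind="stable" so that, among equal-length islands in one column, the later island ends up last. The function above uses the default np.argsort kind, which is not guaranteed to be stable, so when a column has several longest islands of the same length either start may be reported (the reported length is the same either way).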
    

    Sample run of maxisland_start_len_mask:

    # Generic case that also handles an all-False column (column 1 here)
    In [112]: a
    Out[112]: 
    array([[False, False, False],
           [False, False, False],
           [ True, False, False],
           [ True, False,  True],
           [False, False, False],
           [ True, False,  True],
           [ True, False, False],
           [ True, False,  True],
           [False, False,  True],
           [ True, False, False]])
    
    In [117]: starts,lens = maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0)
    
    In [118]: starts
    Out[118]: array([ 5, -1,  7])
    
    In [119]: lens
    Out[119]: array([3, 0, 2])
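    Since maxisland_start_len_mask expects a NumPy boolean array, a boolean DataFrame can be passed in as df.to_numpy(). As an independent cross-check written in plain pandas (a different and slower technique than the vectorized one above; longest_true_run is a hypothetical helper name), the running count of consecutive Trues can be built per column with groupby/cumsum, because (~s).cumsum() stays constant across each run of True and therefore labels the runs. The sketch assumes a default RangeIndex and reuses the sample array above, labelled with the question's column names A, B, C.

```python
import numpy as np
import pandas as pd


def longest_true_run(df):
    """Hypothetical helper: start row and length of the longest True run per column.

    Assumes a default RangeIndex (0, 1, 2, ...). Columns with no True get
    start -1 and length 0, mirroring fillna_index/fillna_len above.
    """
    out = {}
    for name, s in df.items():
        s = s.astype(bool)
        # (~s).cumsum() is constant across each run of True, so it labels runs;
        # a cumulative sum of s within each label counts consecutive Trues
        run_len = s.astype(int).groupby((~s).cumsum()).cumsum()
        length = int(run_len.max())
        if length == 0:
            out[name] = [-1, 0]
        else:
            # argmax gives the end row of the first run of maximal length
            end = int(run_len.to_numpy().argmax())
            out[name] = [end - length + 1, length]
    return pd.DataFrame.from_dict(out, orient="index", columns=["start", "length"])


# The sample array from the sample run, labelled with the question's column names
a = np.array([[False, False, False],
              [False, False, False],
              [ True, False, False],
              [ True, False,  True],
              [False, False, False],
              [ True, False,  True],
              [ True, False, False],
              [ True, False,  True],
              [False, False,  True],
              [ True, False, False]])
df = pd.DataFrame(a, columns=["A", "B", "C"])
res = longest_true_run(df)
print(res)  # A: start 5, length 3; B: -1, 0; C: 7, 2
```

    The values agree with starts and lens from the sample run above.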
    
