Count consecutive ones in a dataframe and get indices where this occurs

I have a pandas.DataFrame with integer column names, which has zeroes and ones. An example of the input:

    12  13  14  15
1   0   0   1   0
2   0   0   1   1
3   1   0   0   1
4   1   1   0   1
5   1   1   1   0
6   0   0   1   0
7   0   0   1   1
8   1   1   0   1
9   0   0   1   1
10  0   0   1   1
11  1   1   0   1
12  1   1   1   1
13  1   1   1   1
14  1   0   1   1
15  0   0   1   1

I need to count all consecutive ones which has a length/sum which is >=2, iterating through columns and returning also indices where an array of the consecutive ones occurs (start, end).

The preferred output would be a 3D DataFrame, where subcolumns "count" and "indices" refer to integer column names from the input.

An example output would look like this one:

12              13              14              15
count   indices count   indices count   indices count   indices
    3     (3,5)     2     (4,5)     2     (1,2)     3     (2,4)
    4   (11,14)     3   (11,13)     3     (5,7)     9    (7,15)
                                    2    (9,10) 
                                    4   (12,15)

I suppose it should be solved with itertools.groupby, but still can't figure out how to apply it to such problem, where both groupby results and its indices are being extracted.

Here is one way to calculate the desired run lengths:

Code:

def min_run_length(series):
    terminal = pd.Series([0])
    diffs = pd.concat([terminal, series, terminal]).diff()
    starts = np.where(diffs == 1)
    ends = np.where(diffs == -1)
    return [(e-s, (s, e-1)) for s, e in zip(starts[0], ends[0])
            if e - s >= 2]

Test Code:

df = pd.read_fwf(StringIO(u"""
    12  13  14  15
    0   0   1   0
    0   0   1   1
    1   0   0   1
    1   1   0   1
    1   1   1   0
    0   0   1   0
    0   0   1   1
    1   1   0   1
    0   0   1   1
    0   0   1   1
    1   1   0   1
    1   1   1   1
    1   1   1   1
    1   0   1   1
    0   0   1   1"""), header=1)
print(df.dtypes)

indices = {cname: min_run_length(df[cname]) for cname in df.columns}
print(indices)

Results:

{
 u'12': [(3, (3, 5)), (4, (11, 14))], 
 u'13': [(2, (4, 5)), (3, (11, 13))], 
 u'14': [(2, (1, 2)), (3, (5, 7)), (2, (9, 10)), (4, (12, 15))]
 u'15': [(3, (2, 4)), (9, (7, 15))], 
}

来源：https://stackoverflow.com/questions/43986045/count-consecutive-ones-in-a-dataframe-and-get-indices-where-this-occurs

标签

python

pandas

itertools