Count consecutive ones in a dataframe and get indices where this occurs

Submitted by 孤者浪人 on 2019-12-06 01:44:30

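Here is one way to calculate the desired run lengths: pad each column with a zero at both ends and take the first difference. A difference of +1 marks the start of a run of ones and -1 marks the first position after the run, so pairing the two gives each run's length and bounds. Runs shorter than 2 are discarded, and the reported (start, end) positions are 1-based and inclusive.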

Code:

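import numpy as np
import pandas as pd

def min_run_length(series):
    # Pad with a zero on each side so runs touching either end are closed
    terminal = pd.Series([0])
    diffs = pd.concat([terminal, series, terminal]).diff()
    # +1 marks the first one of a run, -1 the first zero after it
    starts = np.where(diffs == 1)
    ends = np.where(diffs == -1)
    # Keep runs of length >= 2 as (length, (start, end)), 1-based inclusive
    return [(e-s, (s, e-1)) for s, e in zip(starts[0], ends[0])
            if e - s >= 2]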
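
The padding-and-difference idea behind min_run_length can be traced on a small array. The following is a minimal sketch using plain NumPy rather than pandas; the input `values` is made up for illustration and is not from the question. Note that np.diff drops the leading element, so the positions here come out 0-based, whereas Series.diff keeps a leading NaN and that is why the function above reports 1-based positions.

```python
import numpy as np

# Toy input (an assumption for illustration, not taken from the question).
values = np.array([0, 1, 1, 0, 1, 1, 1])

# Pad with a zero on each side so runs touching either edge are still closed.
padded = np.concatenate(([0], values, [0]))
diffs = np.diff(padded)

# +1 marks the first element of a run, -1 marks the element just past its end.
# Both are 0-based positions in `values` because np.diff drops the first slot.
starts = np.flatnonzero(diffs == 1)
ends = np.flatnonzero(diffs == -1)

# Keep runs of length >= 2 as (length, (first_position, last_position)).
runs = [(int(e - s), (int(s), int(e - 1)))
        for s, e in zip(starts, ends) if e - s >= 2]
print(runs)  # [(2, (1, 2)), (3, (4, 6))]
```

Every start is matched by exactly one end because the padding guarantees the sequence begins and finishes at zero, so zip pairs them up in order.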

Test Code:

from io import StringIO

df = pd.read_fwf(StringIO(u"""
    12  13  14  15
    0   0   1   0
    0   0   1   1
    1   0   0   1
    1   1   0   1
    1   1   1   0
    0   0   1   0
    0   0   1   1
    1   1   0   1
    0   0   1   1
    0   0   1   1
    1   1   0   1
    1   1   1   1
    1   1   1   1
    1   0   1   1
    0   0   1   1"""), header=1)
print(df.dtypes)

indices = {cname: min_run_length(df[cname]) for cname in df.columns}
print(indices)

Results:

{
 u'12': [(3, (3, 5)), (4, (11, 14))], 
 u'13': [(2, (4, 5)), (3, (11, 13))], 
 u'14': [(2, (1, 2)), (3, (5, 7)), (2, (9, 10)), (4, (12, 15))],
 u'15': [(3, (2, 4)), (9, (7, 15))]
}
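
The (start, end) pairs in these results are 1-based positions, not index labels. If the actual labels of the DataFrame index are wanted, one possible conversion is sketched below. The helper name `to_index_labels` is hypothetical, and the default 0..14 index is an assumption about the test frame; the run list is copied from the '12' column of the results.

```python
import pandas as pd

def to_index_labels(runs, index):
    """Map 1-based inclusive (start, end) positions to labels of `index`."""
    return [(length, (index[s - 1], index[e - 1]))
            for length, (s, e) in runs]

# Assumed index: the default RangeIndex of a 15-row frame.
index = pd.RangeIndex(15)

# Runs reported for column '12' in the results.
runs_12 = [(3, (3, 5)), (4, (11, 14))]

labels = to_index_labels(runs_12, index)
print(labels)  # [(3, (2, 4)), (4, (10, 13))]
```

With a non-default index (dates, for example) the same lookup would return the corresponding labels instead of integers.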