I have the following dataframe df:
data={\'id\':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
\'value\':[2,2,3,2,2,2,3,3,3,3,1,4,1,1,1,4,4,1,1,1,1,1]}
You can try this; 1) Create an extra group variable with df.value.diff().ne(0).cumsum()
to denote the value changes; 2) use transform('size')
to calculate the group size and compare with three, then you get the flag
column you need:
df['flag'] = df.value.groupby([df.id, df.value.diff().ne(0).cumsum()]).transform('size').ge(3).astype(int)
df
Break downs:
1) diff
is not equal to zero (which is literally what df.value.diff().ne(0)
means) gives a condition True
whenever there is a value change:
df.value.diff().ne(0)
#0 True
#1 False
#2 True
#3 True
#4 False
#5 False
#6 True
#7 False
#8 False
#9 False
#10 True
#11 True
#12 True
#13 False
#14 False
#15 True
#16 False
#17 True
#18 False
#19 False
#20 False
#21 False
#Name: value, dtype: bool
2) Then cumsum
gives a non descending sequence of ids where each id denotes a consecutive chunk with same values, note when summing boolean values, True
is considered as one while False
is considered as zero:
df.value.diff().ne(0).cumsum()
#0 1
#1 1
#2 2
#3 3
#4 3
#5 3
#6 4
#7 4
#8 4
#9 4
#10 5
#11 6
#12 7
#13 7
#14 7
#15 8
#16 8
#17 9
#18 9
#19 9
#20 9
#21 9
#Name: value, dtype: int64
3) combined with id
column, you can group the data frame, calculate the group size and get the flag
column.