Question
Given this dataframe:
        C
index
0       9
1       0
2       1
3       5
4       0
5       1
6       2
7      20
8       0
How can I split this into the following groups?
- Group 1 has [9, 0]
- Group 2 has [1, 5, 0]
- Group 3 has [1, 2, 20, 0]
The idea is to find every sequence that ends with 0 and group its elements together. The sequences can vary in length, and the last sequence may not end with 0. The first element will never be 0.
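The splitting rule itself can be sketched in plain Python, independent of pandas. This is a minimal illustration, assuming the column values are available as a list; `split_on_zero` is a hypothetical helper name. Every value joins the current group, a 0 closes that group, and any leftover run at the end is kept as its own group.

```python
from typing import List


def split_on_zero(values: List[int]) -> List[List[int]]:
    """Hypothetical helper: split values into runs that each end with 0."""
    groups: List[List[int]] = []
    current: List[int] = []
    for v in values:
        current.append(v)
        if v == 0:
            # A 0 terminates the current group
            groups.append(current)
            current = []
    if current:
        # The last sequence may not terminate with 0; keep it anyway
        groups.append(current)
    return groups


print(split_on_zero([9, 0, 1, 5, 0, 1, 2, 20, 0]))
# [[9, 0], [1, 5, 0], [1, 2, 20, 0]]
```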
My desired end result looks something like this:
C_new
    9
    6
   23
That is, I want to find these groups and then sum each one.
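The target sums can be checked with a single pass that keeps a running total. This is a plain-Python sketch, not a pandas solution; `group_sums` is a hypothetical name. It works out as 9 + 0 = 9, 1 + 5 + 0 = 6 and 1 + 2 + 20 + 0 = 23.

```python
from typing import List


def group_sums(values: List[int]) -> List[int]:
    """Hypothetical helper: sum each run of values that ends with 0."""
    sums: List[int] = []
    total = 0
    in_group = False
    for v in values:
        total += v
        in_group = True
        if v == 0:
            # A 0 closes the group: record its sum and reset
            sums.append(total)
            total = 0
            in_group = False
    if in_group:
        # Trailing run that did not end with 0
        sums.append(total)
    return sums


print(group_sums([9, 0, 1, 5, 0, 1, 2, 20, 0]))  # [9, 6, 23]
```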
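Answer 1:
Use groupby with a Series of group labels. Shift the column down by one row, mark the rows whose previous value is 0 (each such row starts a new group), and take the cumulative sum of those marks: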
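import pandas as pd

df = pd.DataFrame({'C': [9, 0, 1, 5, 0, 1, 2, 20, 0]})

# Group label: increases by 1 on every row whose previous value is 0
print(df['C'].shift(1).eq(0).cumsum())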
0 0
1 0
2 1
3 1
4 1
5 2
6 2
7 2
8 2
Name: C, dtype: int32
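To see why this produces the labels above, the intermediate steps can be laid out side by side. This is an illustrative sketch; the column names `prev`, `starts_group` and `group` are hypothetical and chosen only for readability. The first row's shifted value is NaN, and `NaN == 0` is False, so the first group gets label 0.

```python
import pandas as pd

df = pd.DataFrame({'C': [9, 0, 1, 5, 0, 1, 2, 20, 0]})

steps = pd.DataFrame({
    'C': df['C'],
    # Value from the previous row (NaN for the first row)
    'prev': df['C'].shift(1),
    # True where the previous row was 0, i.e. a new group starts here
    'starts_group': df['C'].shift(1).eq(0),
})
# Counting group starts cumulatively gives every row its group label
steps['group'] = steps['starts_group'].cumsum()
print(steps)
```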
# Sum C within each group (stored in a new name so df is not overwritten)
out = df['C'].groupby(df['C'].shift(1).eq(0).cumsum()).sum()
print(out)
C
0 9
1 6
2 23
Name: C, dtype: int64
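To get exactly the `C_new` layout from the question, the grouped sums can be turned back into a one-column DataFrame. The sketch below assumes a hypothetical input with a trailing run `[4, 3]` that does not end with 0, to show that such a run still forms its own group; `result` is an illustrative name.

```python
import pandas as pd

# Hypothetical data: the question's values plus a trailing run without a 0
df = pd.DataFrame({'C': [9, 0, 1, 5, 0, 1, 2, 20, 0, 4, 3]})

labels = df['C'].shift(1).eq(0).cumsum()
result = (
    df['C']
    .groupby(labels)
    .sum()
    .reset_index(drop=True)  # drop the group labels, keep a 0..n-1 index
    .to_frame('C_new')       # one-column DataFrame named C_new
)
print(result)
```

Because a label only advances on the row after a 0, the trailing `[4, 3]` becomes the last group, with sum 7.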
Source: https://stackoverflow.com/questions/45959750/custom-groupby-based-on-column-values