Question
Given the following DataFrame:
import pandas as pd
df = pd.DataFrame({'bool': [True, True, True, False, True, True, True],
                   'foo': [1, 3, 2, 6, 2, 4, 7]})
which looks like this:
    bool  foo
0   True    1
1   True    3
2   True    2
3  False    6
4   True    2
5   True    4
6   True    7
How can I group the consecutive True rows into two groups, so that indexes [0:2] end up in group 1 and indexes [4:6] in group 2?
The desired output:
group1:
   bool  foo
0  True    1
1  True    3
2  True    2
group2:
4  True    2
5  True    4
6  True    7
Thank you!
Answer 1:
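You could do:
import numpy as np
# Positions of the True rows (positional, so this also works with a non-default index)
x = np.flatnonzero(df["bool"].to_numpy())
# Split wherever two neighbouring positions are not adjacent
groups = np.split(x, np.where(np.diff(x) > 1)[0] + 1)
df_groups = [df.iloc[gr, :] for gr in groups]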
The output looks like:
df_groups[0]
Out[56]:
   bool  foo
0  True    1
1  True    3
2  True    2
df_groups[1]
Out[57]:
   bool  foo
4  True    2
5  True    4
6  True    7
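To make the splitting step easier to follow, here is a minimal sketch that spells out the intermediate arrays for the example data. The names `gaps` and `breaks` are introduced only for illustration, and the values in the comments assume the seven-row DataFrame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [True, True, True, False, True, True, True],
                   'foo': [1, 3, 2, 6, 2, 4, 7]})

# Positions of the True rows: [0 1 2 4 5 6]
x = np.flatnonzero(df['bool'].to_numpy())

# Distance between neighbouring positions: [1 1 2 1 1]
gaps = np.diff(x)

# A gap larger than 1 means a False row sits in between; adding 1 turns
# the gap index into the index in x where the next run starts: [3]
breaks = np.where(gaps > 1)[0] + 1

# Cut x at those indices: [array([0, 1, 2]), array([4, 5, 6])]
groups = np.split(x, breaks)

# One sub-DataFrame per run of consecutive True rows
df_groups = [df.iloc[gr] for gr in groups]
```

If the column contains no False rows, `breaks` is empty and `np.split` returns the whole array as a single group.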
Answer 2:
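Here is a simple way to do it:
# Build a grouping key with `cumsum`: the running count of False rows is
# constant within each run of True rows; `where` masks the False rows with NaN
g = (~df['bool']).cumsum().where(df['bool'])
# groupby skips NaN keys, so only the True runs become groups
dfs = {'group_' + str(i + 1): v for i, (k, v) in enumerate(df[['foo']].groupby(g))}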
You can access each DataFrame through the keys 'group_' + str(i + 1), such as group_1, group_2, etc.:
print(dfs['group_1'])
   foo
0    1
1    3
2    2
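The grouping key is the part of this answer that is easiest to misread, so the following sketch shows its two stages on the example data. The variable names `key` and `groups` are illustrative, and this version groups the full DataFrame rather than only the `foo` column, which is a small variation on the answer.

```python
import pandas as pd

df = pd.DataFrame({'bool': [True, True, True, False, True, True, True],
                   'foo': [1, 3, 2, 6, 2, 4, 7]})

# Running count of False rows seen so far: 0 0 0 1 1 1 1
key = (~df['bool']).cumsum()

# Keep the count only on True rows; False rows become NaN:
# 0.0 0.0 0.0 NaN 1.0 1.0 1.0
key = key.where(df['bool'])

# groupby drops NaN keys by default, so the False row belongs to no group
groups = {f'group_{i + 1}': part
          for i, (_, part) in enumerate(df.groupby(key))}

print(groups['group_2'])
```

Because the key only changes when a False row is passed, every run of consecutive True rows gets its own label without any explicit loop.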
Source: https://stackoverflow.com/questions/57132096/pandas-how-to-groupby-based-on-series-pattern