Pandas: how to groupby based on series pattern

爱⌒轻易说出口 submitted on 2019-12-11 02:32:12

Question


Given the following DataFrame:

import pandas as pd
df = pd.DataFrame({'bool':[True,True,True, False,True,True,True],
                   'foo':[1,3,2,6,2,4,7]})

which results in:

    bool    foo
0   True    1
1   True    3
2   True    2
3   False   6
4   True    2
5   True    4
6   True    7

How can I group the True rows into two groups, so that indexes 0 to 2 end up in group 1 and indexes 4 to 6 in group 2?

The desired output:

group1:

    bool    foo
0   True    1
1   True    3
2   True    2

group2:

    bool    foo
4   True    2
5   True    4
6   True    7

Thank you!


Answer 1:


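You could do:

import numpy as np

# Index labels of the rows where `bool` is True
x = df[df["bool"]].index.values
# Split the labels wherever two neighbouring labels differ by more than 1
groups = np.split(x, np.where(np.diff(x) > 1)[0] + 1)
# `x` holds index labels, so select with .loc rather than .iloc
df_groups = [df.loc[gr, :] for gr in groups]

The output looks like: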


df_groups[0]
Out[56]: 
   bool  foo
0  True    1
1  True    3
2  True    2

df_groups[1]
Out[57]: 
   bool  foo
4  True    2
5  True    4
6  True    7
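To see why the split lands between rows 2 and 4, it can help to look at the intermediate arrays. The following is a minimal sketch on the question's data, assuming an integer index in which each run of True rows has consecutive labels; the names `gaps` and `cut_points` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [True, True, True, False, True, True, True],
                   'foo': [1, 3, 2, 6, 2, 4, 7]})

# Index labels of the True rows: [0 1 2 4 5 6]
x = df.index[df['bool']].to_numpy()

# Distance between neighbouring labels: [1 1 2 1 1]
gaps = np.diff(x)

# A gap larger than 1 means a False row sat in between;
# +1 turns the gap position into the position where the next run starts: [3]
cut_points = np.where(gaps > 1)[0] + 1

# Cut the label array at those positions: [0 1 2] and [4 5 6]
groups = np.split(x, cut_points)

# Select each run by label
df_groups = [df.loc[g] for g in groups]
```

Because the runs are detected from gaps in the labels, this approach would not work as written on a non-integer or non-consecutive index.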




Answer 2:


Here is a simple way to do it:

# Label each run of True rows: the cumsum of the negated column increases at every False,
# and `where` masks the False rows with NaN so that groupby leaves them out
g = (~df['bool']).cumsum().where(df['bool'])

dfs = {'group_' + str(i + 1): v for i, (k, v) in enumerate(df[['foo']].groupby(g))}

You can access each DataFrame through its key 'group_' + str(i + 1), such as group_1, group_2, etc.:

print(dfs['group_1'])

   foo
0    1
1    3
2    2
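The grouping key is easier to follow when built in two steps. This is a minimal sketch on the question's data, not necessarily how the answer's author would write it; the name `breaks` is introduced here for illustration, and the whole DataFrame is grouped (rather than only `foo`) so that the result matches the question's desired output.

```python
import pandas as pd

df = pd.DataFrame({'bool': [True, True, True, False, True, True, True],
                   'foo': [1, 3, 2, 6, 2, 4, 7]})

# Running count of False rows seen so far: 0 0 0 1 1 1 1
breaks = (~df['bool']).cumsum()

# Keep the counter only on True rows; False rows become NaN:
# 0.0 0.0 0.0 NaN 1.0 1.0 1.0
g = breaks.where(df['bool'])

# groupby skips NaN keys by default, so the False row belongs to no group
dfs = {f'group_{i + 1}': part for i, (_, part) in enumerate(df.groupby(g))}

print(dfs['group_2'])
```

Unlike the index-gap approach, this key depends only on the values of the `bool` column, so it should not rely on the index being consecutive integers.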


Source: https://stackoverflow.com/questions/57132096/pandas-how-to-groupby-based-on-series-pattern
