Custom pandas groupby on a list of intervals

Submitted by 倖福魔咒の on 2019-12-10 17:41:58

Question


I have a dataframe df:

     A    B
0   28  abc
1   29  def
2   30  hij
3   31  hij
4   32  abc
5   28  abc
6   28  abc
7   29  def
8   30  hij
9   28  abc
10  29  klm
11  30  nop
12  28  abc
13  29  xyz

df.dtypes

A    object        # A is a string column as well
B    object
dtype: object
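To make the snippets in this thread reproducible, here is a minimal sketch that rebuilds the sample frame. The values are copied from the table above. Passing dtype=object is an assumption on my part, chosen so that both columns hold plain strings (matching the df.dtypes output) regardless of pandas version.

```python
import pandas as pd

# Sample data from the question; dtype=object keeps A and B as plain
# Python strings, matching the df.dtypes output shown above
df = pd.DataFrame(
    {
        'A': ['28', '29', '30', '31', '32', '28', '28',
              '29', '30', '28', '29', '30', '28', '29'],
        'B': ['abc', 'def', 'hij', 'hij', 'abc', 'abc', 'abc',
              'def', 'hij', 'abc', 'klm', 'nop', 'abc', 'xyz'],
    },
    dtype=object,
)
print(df.dtypes)
```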

I want to use the values from this array to group the rows:

i = np.array([ 3,  5,  6,  9, 12, 14])

Basically, each value is the exclusive end of a group: rows with index 0, 1, 2 are in the first group, rows with index 3, 4 are in the second group, the row with index 5 is in the third group, and so on.
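Put another way, each entry of i marks where a group ends. A small sketch of that mapping from row position to group number, using np.searchsorted (a technique neither answer below relies on; i is copied from above and the row count of 14 comes from the sample frame):

```python
import numpy as np

# Exclusive end positions of each group, as in the question
i = np.array([3, 5, 6, 9, 12, 14])
positions = np.arange(14)

# side='right' counts how many boundaries are <= each position,
# which is the 0-based number of the group that row belongs to
labels = np.searchsorted(i, positions, side='right')
print(labels.tolist())
# [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5]
```

An array like labels has one entry per row, so it could also be passed directly to df.groupby.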

My end goal is this:

A              B
28,29,30       abc,def,hij
31,32          hij,abc
28             abc
28,29,30       abc,def,hij
28,29,30       abc,klm,nop
28,29          abc,xyz

Solution so far using groupby + pd.cut:

df.groupby(pd.cut(df.index, bins=np.append([0], i)), as_index=False).agg(','.join)

          A            B
0  29,30,31  def,hij,hij
1     32,28      abc,abc
2        28          abc
3  29,30,28  def,hij,abc
4  29,30,28  klm,nop,abc
5        29          xyz

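The result is incorrect :-( Row 0 is dropped, and every group is shifted by one row (the first group holds rows 1, 2, 3 instead of 0, 1, 2).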
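The bin codes that pd.cut assigns show where this comes from. A sketch using the same bins as the attempt above, with np.arange(14) standing in for df.index (a default RangeIndex in the sample data):

```python
import numpy as np
import pandas as pd

i = np.array([3, 5, 6, 9, 12, 14])
bins = np.append([0], i)  # [0, 3, 5, 6, 9, 12, 14]

# By default pd.cut builds right-closed bins: (0, 3], (3, 5], (5, 6], ...
cut = pd.cut(np.arange(14), bins=bins)

# Position 0 sits on the open left edge of the first bin, so it gets code -1
# (NaN) and groupby drops it; each boundary position (3, 5, 6, ...) is
# assigned to the bin that ends there instead of starting a new one
print(cut.codes.tolist())
# [-1, 0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5]
```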

How can I do this properly?


Answer 1:


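You are very close, but use right=False (and include_lowest=True) in pd.cut. With right=False the bins are closed on the left, [0, 3), [3, 5), [5, 6), ..., so index 0 falls into the first bin and each boundary index starts a new group instead of closing the previous one. (With right=False the first bin already contains 0, so include_lowest=True is not strictly needed, but it does no harm.) i.e.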

import numpy as np
import pandas as pd
idx = pd.cut(df.index, bins=np.append([0], i),
             include_lowest=True, right=False)
df.groupby(idx, as_index=False).agg(','.join)
A              B
28,29,30       abc,def,hij
31,32          hij,abc
28             abc
28,29,30       abc,def,hij
28,29,30       abc,klm,nop
28,29          abc,xyz
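A self-contained version of this approach, as a sketch that assumes the sample data from the question with A stored as strings. include_lowest is omitted because right=False already puts 0 in the first bin. observed=True and reset_index(drop=True) are my own additions: the first avoids the deprecation warning recent pandas versions emit for categorical groupers without an explicit observed, the second gives a plain 0..5 index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': ['28', '29', '30', '31', '32', '28', '28',
              '29', '30', '28', '29', '30', '28', '29'],
        'B': ['abc', 'def', 'hij', 'hij', 'abc', 'abc', 'abc',
              'def', 'hij', 'abc', 'klm', 'nop', 'abc', 'xyz'],
    },
    dtype=object,
)
i = np.array([3, 5, 6, 9, 12, 14])

# Left-closed bins [0, 3), [3, 5), [5, 6), ...: row 0 is kept and every
# boundary position starts a new group
idx = pd.cut(df.index, bins=np.append([0], i), right=False)
print(idx.codes.tolist())
# [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5]

# Join the string values of each bin; groups come out in bin order
out = df.groupby(idx, observed=True).agg(','.join).reset_index(drop=True)
print(out)
```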



Answer 2:


I think this could be fast:

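df['G'] = 0
# Flag the last row of each group (np.put on a Series fails in pandas >= 1.0; .loc labels equal positions with a default RangeIndex)
df.loc[i - 1, 'G'] = 1
# A reversed cumsum gives all rows of a group the same label (first group = largest)
df.groupby(df.G.iloc[::-1].cumsum())[['A', 'B']].agg(lambda x: ','.join(x.astype(str))).sort_index(ascending=False)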
Out[772]: 
          A            B
G                       
6  28,29,30  abc,def,hij
5     31,32      hij,abc
4        28          abc
3  28,29,30  abc,def,hij
2  28,29,30  abc,klm,nop
1     28,29      abc,xyz
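The same reverse-cumulative-sum idea, sketched with a standalone NumPy flag array instead of a helper column G so the intermediate group labels can be inspected (flags and labels are illustrative names, and the sample data is assumed to have A stored as strings):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': ['28', '29', '30', '31', '32', '28', '28',
              '29', '30', '28', '29', '30', '28', '29'],
        'B': ['abc', 'def', 'hij', 'hij', 'abc', 'abc', 'abc',
              'def', 'hij', 'abc', 'klm', 'nop', 'abc', 'xyz'],
    },
    dtype=object,
)
i = np.array([3, 5, 6, 9, 12, 14])

# Flag the last row of every group (i holds exclusive end positions)
flags = np.zeros(len(df), dtype=int)
flags[i - 1] = 1

# Summing the flags from the bottom up gives every row of a group the
# same label; the first group ends up with the largest label
labels = pd.Series(flags[::-1].cumsum()[::-1], index=df.index)
print(labels.tolist())
# [6, 6, 6, 5, 5, 4, 3, 3, 3, 2, 2, 2, 1, 1]

# Labels count down, so sort descending to restore the original group order
out = df.groupby(labels).agg(','.join).sort_index(ascending=False)
print(out)
```

This relies on the last value in i being equal to len(df); otherwise the rows after the last boundary would all receive label 0.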


Source: https://stackoverflow.com/questions/47304847/custom-pandas-groupby-on-a-list-of-intervals
