Groupby of multiple columns and assigning values to each by considering start and end of each (Pandas)

Submitted by 岁酱吖の on 2020-05-17 07:04:41

Question


I've got a DataFrame that looks like this:

df1
    v   w   x   y                               
4   0   1   a   b
5   0   1   a   a
_________________
6   0   2   a   b
_________________
2   0   3   a   b 
- - - - - - - - -   
3   1   2   a   b
_________________
15  1   3   a   b
12  1   3   b   b
_________________
13  1   1   a   b
- - - - - - - - - 
15  3   1   a   b
14  3   1   b   a
8   3   1   a   b
9   3   1   a   a

df1 was grouped by v and w (solid lines) and merged with another DataFrame that contained x and y. I need a new column z that, for each group, takes its values from either x or y according to the following conditions:

  1. Within every subgroup of 'v' (separated by dotted lines), the first group should be taken from 'x' (within a group, x always starts with 'a' and y always starts with 'b').
  2. Depending on the last letter of each group (a or b), the next group should start with b (column 'y') or a (column 'x').
  3. If both groups end with the same letter, pick the next group from 'x'.

The result should look like this:

df1
    v   w   x   y   z                            
4   0   1   a   b   a
5   0   1   a   a   a
_____________________
6   0   2   a   b   b
_____________________
2   0   3   a   b   a
- - - - - - - - - - -
3   1   2   a   b   a
_____________________
15  1   3   a   b   b
12  1   3   b   b   b
_____________________
13  1   1   a   b   a
- - - - - - - - - - -
15  3   1   a   b   a
14  3   1   b   a   b
8   3   1   a   b   a
9   3   1   a   a   a

So basically, within each subgroup of 'v', the last letter of a group and the first letter of the next group should be different. Is that understandable, and could anyone help me?


Answer 1:


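IIUC (if I understand correctly), you can number the groups like this:

import numpy as np
import pandas as pd

# df is assumed to be the question's df1; reset to a clean 0..n-1 index
# because the original index contains a duplicate label (15)
df = df.reset_index(drop=True)
# Sort x/y within each row and keep the larger letter: 'b' unless both are 'a'.
# Reading the rows bottom-up, the counter increases at every row where both are 'a'.
s = pd.DataFrame(np.sort(df[['x', 'y']], axis=1), index=df.index)[1].iloc[::-1].ne('b').cumsum()
# Assign one integer label per (v, w, s) combination
df.groupby([df.v, df.w, s]).ngroup()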
0     0
1     0
2     1
3     2
4     4
5     5
6     5
7     3
8     6
9     6
10    6
11    6
dtype: int64
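
The output above numbers the groups but does not build the z column itself. What follows is a minimal sketch of one way to fill z directly from the rules in the question, not the answerer's method. The helper name pick_z and the hand-built sample frame are illustrative assumptions. It also assumes the rows are already in the desired order, that each w value forms one contiguous block within its v, and that the index is unique (as after reset_index). Only rules 1 and 2 are implemented: the first group of each v takes x, and each later group takes y if the previously chosen group ended in 'a', otherwise x. Rule 3 is treated as covered by that logic, which is an assumption about its intent.

```python
import pandas as pd

# Sample data copied from the question, with a fresh 0..n-1 index
df = pd.DataFrame({
    "v": [0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 3],
    "w": [1, 1, 2, 3, 2, 3, 3, 1, 1, 1, 1, 1],
    "x": list("aaaaaabaabaa"),
    "y": list("babbbbbbbaba"),
})


def pick_z(frame):
    """Hypothetical helper: choose x or y per (v, w) group (rules 1 and 2 only)."""
    z = pd.Series(index=frame.index, dtype=object)
    # sort=False keeps the groups in order of first appearance
    for _, sub in frame.groupby("v", sort=False):
        prev_end = None  # no previous group yet, so rule 1 applies: start with x
        for _, grp in sub.groupby("w", sort=False):
            # Rule 2: after a group ending in 'a', take y (which starts with 'b');
            # otherwise take x (which starts with 'a')
            col = "y" if prev_end == "a" else "x"
            z.loc[grp.index] = grp[col].to_numpy()
            prev_end = grp[col].iloc[-1]
    return z


df["z"] = pick_z(df)
print(df)
```

On the sample data this gives the same z column as the expected result in the question. The explicit loop is slower than a vectorized approach, but the choice for each group depends on the choice made for the previous one, and a loop keeps that dependency easy to follow.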


Source: https://stackoverflow.com/questions/61521190/groupby-of-multiple-columns-and-assigning-values-to-each-by-considering-start-an
