Question
I have a DataFrame that looks like this:
df1
v w x y
4 0 1 a b
5 0 1 a a
_________________
6 0 2 a b
_________________
2 0 3 a b
- - - - - - - - -
3 1 2 a b
_________________
15 1 3 a b
12 1 3 b b
_________________
13 1 1 a b
- - - - - - - - -
15 3 1 a b
14 3 1 b a
8 3 1 a b
9 3 1 a a
df1 was grouped (solid lines) by v and w and merged with another DataFrame that contained x and y. I need a new column z that picks the right values out of x or y for each group, under the following conditions:
- In every subgroup of 'v' (dashed line), the first group should be taken from 'x' (within groups, x always starts with 'a' and y always starts with 'b').
- Based on the last letter of each group (a or b), the next group should start with the other letter: b (column 'y') or a (column 'x').
- If both groups end with the same letter, pick the next group from 'x'.
It should look like this:
df1
v w x y z
4 0 1 a b a
5 0 1 a a a
_____________________
6 0 2 a b b
_____________________
2 0 3 a b a
- - - - - - - - - -
3 1 2 a b a
_____________________
15 1 3 a b b
12 1 3 b b b
_____________________
13 1 1 a b a
- - - - - - - - - -
15 3 1 a b a
14 3 1 b a b
8 3 1 a b a
9 3 1 a a a
So basically, within each subgroup of 'v', the last letter of a group and the first letter of the next group should be different. Is that understandable, and could anyone help me?
Answer 1:
If I understand correctly:
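import numpy as np
import pandas as pd
df = df.reset_index(drop=True)
# s: sort x/y row-wise, take the larger letter, count non-'b' rows cumulatively from the bottom
s = pd.DataFrame(np.sort(df[['x', 'y']], axis=1), index=df.index)[1].iloc[::-1].ne('b').cumsum()
df.groupby([df.v, df.w, s]).ngroup()  # integer label per (v, w, s) combination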
0 0
1 0
2 1
3 2
4 4
5 5
6 5
7 3
8 6
9 6
10 6
11 6
dtype: int64
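The output above only labels the groups. As a complement, here is a minimal sketch of how the z column itself could be filled in. It assumes one reading of the rules: rows with the same (v, w) are contiguous, the first group of each v block takes 'x', and each later group takes 'y' if the previously chosen letter was 'a' and 'x' otherwise. The function name assign_z and the rebuilt df1 are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (the first column there is the index).
df1 = pd.DataFrame(
    {
        "v": [0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 3],
        "w": [1, 1, 2, 3, 2, 3, 3, 1, 1, 1, 1, 1],
        "x": ["a", "a", "a", "a", "a", "a", "b", "a", "a", "b", "a", "a"],
        "y": ["b", "a", "b", "b", "b", "b", "b", "b", "b", "a", "b", "a"],
    },
    index=[4, 5, 6, 2, 3, 15, 12, 13, 15, 14, 8, 9],
)


def assign_z(df):
    """Return a copy of df with column z chosen from x or y per (v, w) group.

    Assumption: each (v, w) group is a contiguous run of rows.
    """
    z = []
    prev_v = None
    prev_key = None
    source = "x"
    for v, w, x, y in zip(df["v"], df["w"], df["x"], df["y"]):
        key = (v, w)
        if v != prev_v:
            # First group of a new v block always comes from x.
            source = "x"
        elif key != prev_key:
            # New group inside the same v block: start with the letter that
            # differs from the last chosen one ('a' -> take y, 'b' -> take x).
            source = "y" if z[-1] == "a" else "x"
        z.append(x if source == "x" else y)
        prev_v, prev_key = v, key
    # A plain list is assigned by position, so the duplicated index label is harmless.
    return df.assign(z=z)


result = assign_z(df1)
print(result)
```

For the sample data this reproduces the z column shown in the question. The loop works by position rather than by index label because the sample index contains a duplicate (15).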
Source: https://stackoverflow.com/questions/61521190/groupby-of-multiple-columns-and-assigning-values-to-each-by-considering-start-an