Question
I have a DataFrame with three numeric columns. The first two columns, A and B, together form an unordered pair, so a row with (1080, 1920) and a row with (1920, 1080) should count as the same key. I want to group by that pair and sum the counter column:
df.groupby(['A', 'B']).sum()  # won't work here: it treats (1080, 1920) and (1920, 1080) as different groups
Example:
A B counter
750 1334 10
1080 1920 15
1080 1920 10
1920 1080 10
1125 2436 20
Expected result:
A B counter
750 1334 10
1080 1920 35
1125 2436 20
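Answer 1:
The idea is to sort the values within each row of the two columns with numpy.sort (axis=1) and assign them back, so that every pair is stored in the same order: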
import numpy as np
# Sort each row's (A, B) pair, so (1920, 1080) becomes (1080, 1920)
df[['A','B']] = np.sort(df[['A','B']], axis=1)
df = df.groupby(['A','B'], as_index=False)['counter'].sum()
print(df)
A B counter
0 750 1334 10
1 1080 1920 35
2 1125 2436 20
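For reference, here is a self-contained sketch that rebuilds the sample data from the question and applies this approach, assuming pandas and NumPy are installed. The names `naive` and `result` are only illustrative. The copy is made so the original frame stays available for comparison with the plain groupby.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [750, 1080, 1080, 1920, 1125],
    'B': [1334, 1920, 1920, 1080, 2436],
    'counter': [10, 15, 10, 10, 20],
})

# Plain groupby keeps (1080, 1920) and (1920, 1080) as separate keys
naive = df.groupby(['A', 'B'], as_index=False)['counter'].sum()

# Sort each row's pair first, then group (illustrative names)
result = df.copy()
result[['A', 'B']] = np.sort(result[['A', 'B']], axis=1)
result = result.groupby(['A', 'B'], as_index=False)['counter'].sum()
print(result)
```

With the sample data, `naive` ends up with four groups while `result` has three, because the reversed pair is merged only after sorting.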
Or pass the sorted arrays directly to groupby, which leaves the original columns unchanged:
arr = np.sort(df[['A','B']], axis=1)
df = df.groupby([arr[:, 0],arr[:, 1]])['counter'].sum().rename_axis(('A','B')).reset_index()
print(df)
A B counter
0 750 1334 10
1 1080 1920 35
2 1125 2436 20
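A minimal sketch of this second variant on the question's sample data, assuming pandas and NumPy are installed. It writes the result to a separate, illustratively named variable `out` instead of overwriting `df`, to show that the input frame is not modified.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [750, 1080, 1080, 1920, 1125],
    'B': [1334, 1920, 1920, 1080, 2436],
    'counter': [10, 15, 10, 10, 20],
})

# Row-wise sorted copy of the two key columns; df is not touched
arr = np.sort(df[['A', 'B']].to_numpy(), axis=1)

# Group by the two sorted arrays, then name the index levels and flatten
out = (df.groupby([arr[:, 0], arr[:, 1]])['counter']
         .sum()
         .rename_axis(['A', 'B'])
         .reset_index())
print(out)
```

This form may be preferable when the original orientation of each pair is still needed later, since the (1920, 1080) row remains in `df`.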
Source: https://stackoverflow.com/questions/56650372/group-by-and-sum-over-rows-with-same-contents