问题
Similar to below
Buyer Seller Amount
John Mary 3
Mary John 2
David Bosco 2
Where I want to sum John and Mary rows into one
Expected out come
Trade1 Trade2 Amount
John Mary 5
David Bosco 2
My dataframe has around 6000 rows. Thank you for your help
回答1:
First sort values by numpy.sort and create boolean mask by DataFrame.duplicated and then aggregate sum:
df[['Buyer','Seller']] = pd.DataFrame(np.sort(df[['Buyer','Seller']], axis=1))
df2 = df.groupby(['Buyer','Seller'], as_index=False)['Amount'].sum()
df2.columns = ['Trade1','Trade2','Amount']
print (df2)
Trade1 Trade2 Amount
0 Bosco David 2
1 John Mary 5
If dont want modify original columns use syntactic sugar - groupby with Series:
df1 = pd.DataFrame(np.sort(df[['Buyer','Seller']], axis=1))
df1.columns = ['Trade1','Trade2']
df2 = df['Amount'].groupby([df1['Trade1'],df1['Trade2']]).sum().reset_index()
print (df2)
Trade1 Trade2 Amount
0 Bosco David 2
1 John Mary 5
来源:https://stackoverflow.com/questions/51874962/sum-two-rows-if-two-cells-are-the-same-but-in-different-order