问题
I have a dataframe like this:
Name id col1 col2 col3 cl4
PL 252 0 747 3 53
PL2 252 1 24 2 35
PL3 252 4 75 24 13
AD 889 53 24 0 95
AD2 889 23 2 0 13
AD3 889 0 24 3 6
BG 024 12 89 53 66
BG1 024 43 16 13 0
BG2 024 5 32 101 4
And now I need to group by ID, and for columns col1 and col4 find the sum for each id and put that into a new column near to parent column (example: col3(sum)) But for col2 and col3 find max value. Desired output:
Name id col1 col1(sum) col2 col2(max) col3 col(max) col4 col4(sum)
PL 252 0 5 747 747 3 24 6 18
PL2 252 1 5 24 747 2 24 12 18
PL3 252 4 5 75 747 24 24 0 18
AD 889 53 76 24 24 95 95 23 33
AD2 889 23 76 2 24 13 95 5 33
AD3 889 0 76 24 24 6 95 5 33
BG 024 12 60 89 89 66 66 0 67
BG1 024 43 60 16 89 0 66 63 67
BG2 024 5 60 32 89 4 66 4 67
What is the easiest and fastest way to calculate this?
回答1:
You can use groupby/transform to creat the required columns
df[['col1_sum', 'col4_sum']]=df.groupby('id')['col1', 'cl4'].transform('sum')
df[['col2_max', 'col3_max']]=df.groupby('id')['col1', 'cl4'].transform('max')
Name id col1 col2 col3 cl4 col1_sum col4_sum col2_max col3_max
0 PL 252 0 747 3 53 5 101 4 53
1 PL2 252 1 24 2 35 5 101 4 53
2 PL3 252 4 75 24 13 5 101 4 53
3 AD 889 53 24 0 95 76 114 53 95
4 AD2 889 23 2 0 13 76 114 53 95
5 AD3 889 0 24 3 6 76 114 53 95
6 BG 24 12 89 53 66 60 70 43 66
7 BG1 24 43 16 13 0 60 70 43 66
8 BG2 24 5 32 101 4 60 70 43 66
回答2:
The most (pandas) native way to do this, is to use the .agg() method that allows you to specify the aggregation function you want to apply per column (just like you would do in SQL).
Sample from the documentation:
df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
回答3:
You can use merge when you have groupby and sum on id :
pd.merge(df,df.groupby("id").sum().reset_index(), on='id',how='outer')
output
回答4:
I know this is messy but I like chaining so you can do something like this:
df = df.groupby('id').
apply(lambda g: g.assign(
col1_sum=g.col1.sum(),
col2_max=g.col2.max()))
Basically, this is applying a group based assign command to each group and then combining into a single DataFrame.
See https://pandas.pydata.org/pandas-docs/stable/api.html for details on each method.
来源:https://stackoverflow.com/questions/44724480/group-by-one-columns-and-find-sum-and-max-value-for-another-in-pandas