Question
I have a Pandas dataframe with some numerical data about some people. I need to find the people that appear more than once in the dataframe and replace all the rows for each such person with a single row, where some columns hold the sum of the values from the original rows and other columns hold the minimum of those values. I know how to compute the sum using groupby() and sum(), but not how to apply a different operation to different columns.
Example:
Names  Column1  Column2  Column3
John         1        2     2016
Bob          2        3     2011
Pier         1        1     2003
John         3        3     2005
Bob          1        0     2018
It should become:
Names  Column1  Column2  Column3
John         4        5     2005
Bob          3        3     2011
Pier         1        1     2003
How can I do this?
Answer 1:
Use groupby + agg and define a specific aggregation function for each column as a dict, like:
df.groupby('Names').agg({'Column1':'sum', 'Column2':'sum','Column3':'min'})
       Column1  Column2  Column3
Names
Bob          3        3     2011
John         4        5     2005
Pier         1        1     2003
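For completeness, here is a minimal runnable sketch of the same groupby + agg approach on the example data from the question. The dataframe construction and the variable names `df` and `result` are assumptions added for illustration. The `as_index=False` and `sort=False` arguments are optional additions, not part of the answer above.

```python
import pandas as pd

# Example data from the question (rebuilt by hand for illustration)
df = pd.DataFrame({
    "Names": ["John", "Bob", "Pier", "John", "Bob"],
    "Column1": [1, 2, 1, 3, 1],
    "Column2": [2, 3, 1, 3, 0],
    "Column3": [2016, 2011, 2003, 2005, 2018],
})

# One aggregation per column: sum Column1 and Column2, take the minimum of Column3.
# as_index=False keeps Names as a regular column;
# sort=False keeps the groups in order of first appearance.
result = (
    df.groupby("Names", as_index=False, sort=False)
      .agg({"Column1": "sum", "Column2": "sum", "Column3": "min"})
)

print(result)
```

By default groupby sorts the group keys and moves Names into the index. Passing sort=False and as_index=False should instead give the John, Bob, Pier layout asked for in the question.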
Source: https://stackoverflow.com/questions/53133174/pandas-duplicates-groupby