Pandas - dataframe groupby - how to get sum of multiple columns

别跟我提以往 2020-12-05 02:34

This should be an easy one, but somehow I couldn't find a solution that works.

I have a pandas dataframe which looks like this:

index col1   col2            


        
5 Answers
  • 2020-12-05 03:01

    Another generic solution is

    df.groupby(['col1','col2']).agg({'col3':'sum','col4':'sum'}).reset_index()
    

    This will give you the required output.
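As a runnable illustration of the dict-based `agg` call, here is a minimal sketch. The sample frame is made up for this example (the question's own table is cut off), and `col5` is a hypothetical extra column added only to show that unlisted columns are left out.

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Sum col3 and col4 within each (col1, col2) group.
# col5 is not listed in the dict, so it does not appear in the result.
result = (
    df.groupby(["col1", "col2"])
      .agg({"col3": "sum", "col4": "sum"})
      .reset_index()  # turn the group keys back into ordinary columns
)
print(result)
```

The `reset_index()` call is optional: without it, `col1` and `col2` stay in the index as a MultiIndex.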

    UPDATED (June 2020): Pandas 0.25.0 introduced “named aggregation”, which takes keyword arguments whose values are (column, aggregation function) tuples. The keyword becomes the name of the output column, which is convenient when applying aggregation functions to specific columns.

    df.groupby(
        ['col1', 'col2']
    ).agg(
        sum_col3=('col3', 'sum'),
        sum_col4=('col4', 'sum'),
    ).reset_index()
    

    Refer to Link for detailed description.
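A self-contained sketch of named aggregation, assuming pandas 0.25.0 or later. The sample frame is hypothetical and only serves to make the snippet runnable.

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
})

# Each keyword names an output column; each tuple is
# (input column, aggregation function).
result = (
    df.groupby(["col1", "col2"])
      .agg(
          sum_col3=("col3", "sum"),
          sum_col4=("col4", "sum"),
      )
      .reset_index()
)
print(result.columns.tolist())
```

Compared with the dict form, this lets the output names differ from the input names, which helps when the same column is aggregated more than one way.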

  • 2020-12-05 03:02

    The issue is likely that df.col3.dtype is not an int or other numeric dtype. Try df.col3 = df.col3.astype(int) before doing your groupby.

    Additionally, select your columns after the groupby to see if the columns are even being aggregated:

    df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
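To make the dtype check concrete, here is a small sketch. It assumes the numbers arrived as strings (a guess at the cause, as the answer itself says), and the sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical data where the numeric columns were read in as strings.
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["c", "c", "d"],
    "col3": ["1", "1", "1"],
    "col4": ["2", "2", "2"],
})

# Inspect the dtypes first: col3 and col4 are not numeric here.
print(df.dtypes)

# Convert the columns to integers before grouping.
for col in ["col3", "col4"]:
    df[col] = df[col].astype(int)

df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```

If some values cannot be parsed as integers, `astype(int)` raises an error; `pd.to_numeric(df[col], errors="coerce")` is one alternative that turns such values into NaN instead.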
    
  • 2020-12-05 03:06

    By using apply

    df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x : x.astype(int).sum())
    Out[1257]: 
               col3  col4
    col1 col2            
    a    c        2     4
         d        1     2
    b    d        1     2
         e        2     4
    

    If you want to use agg:

    df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
    
  • 2020-12-05 03:10

    I think it would be more efficient to do the following:

    df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'}).sum(axis=1)
    

    or:

    df.groupby(['col1', 'col2'])[['col3', 'col4']].sum().sum(axis=1)
    

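    This assumes col3 and col4 already have numeric dtypes. Note that the trailing .sum(axis=1) adds the two per-group sums together, so the result is a single total per group rather than one column per input.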
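The difference between the per-column sums and the combined total can be seen in a short sketch. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
})

# One sum per column per group: a DataFrame with columns col3 and col4.
per_column = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# Adding across the columns collapses that to one total per group: a Series.
combined = per_column.sum(axis=1)

print(per_column)
print(combined)
```

For the group `('a', 'c')` the per-column sums are 2 and 4, and the combined total is 6.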

  • 2020-12-05 03:27

    The above answer didn't work for me.

    df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
    

    I was grouping by a single column and summing a single column.

    Here is what worked for me:

    D1.groupby(['col1'])['col2'].sum()  # the sum goes at the end, not in the middle
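A minimal sketch of the select-then-sum pattern for a single grouping column. The frame and its values are hypothetical; `col5` is included only to show that a non-numeric column never enters the aggregation when the selection comes first.

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Select the column to aggregate first, then call sum() last.
# Only col3 is summed; col5 is never touched.
totals = df.groupby("col1")["col3"].sum()
print(totals)
```

Selecting a single column with single brackets returns a Series indexed by the group key; double brackets (`[["col3"]]`) would return a one-column DataFrame instead.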
    