Pandas sum by groupby, but exclude certain columns

孤城傲影 2020-11-27 11:02

What is the best way to do a groupby on a Pandas dataframe but exclude some columns from the aggregation? For example, I have the following dataframe:



        
3 Answers
  • 2020-11-27 11:56

    You can select the columns to aggregate by indexing the groupby object:

    In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
    Out[11]:
                           Y1961  Y1962  Y1963
    Country     Item_Code
    Afghanistan 15            10     20     30
                25            10     20     30
    Angola      15            30     40     50
                25            30     40     50
    

    Note that every column in the list must exist in the dataframe; otherwise you'll get a KeyError.
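    A minimal, self-contained sketch of this approach. The data is hypothetical: the column names are taken from this thread and the values are invented so the totals match the output above. `Code`, `Item`, `Ele_Code` and `Unit` stand in for the columns you want to leave out of the sum. The last lines show the KeyError raised when a selected column does not exist.

```python
import pandas as pd

# Hypothetical data: column names borrowed from this thread, values invented.
df = pd.DataFrame({
    "Code": [2, 2, 2, 7, 7],
    "Country": ["Afghanistan", "Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 15, 25, 15, 25],
    "Item": ["Wheat", "Wheat", "Rice", "Wheat", "Rice"],
    "Ele_Code": [5142] * 5,
    "Unit": ["1000 tonnes"] * 5,
    "Y1961": [4, 6, 10, 30, 30],
    "Y1962": [8, 12, 20, 40, 40],
    "Y1963": [12, 18, 30, 50, 50],
})

year_cols = ["Y1961", "Y1962", "Y1963"]

# Only the selected columns are summed; Code, Item, Ele_Code and Unit
# do not appear in the result at all.
totals = df.groupby(["Country", "Item_Code"])[year_cols].sum()
print(totals)

# Selecting a column that is not in the dataframe raises a KeyError.
try:
    df.groupby(["Country", "Item_Code"])[["Y1961", "Y2099"]].sum()
except KeyError as exc:
    print("KeyError:", exc)
```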

  • 2020-11-27 12:05

    The agg function will do this for you. Pass a dict that maps each column to the function (or list of functions) to apply to it:

    df.groupby(['Country', 'Item_Code']).agg({'Y1961': 'sum', 'Y1962': ['sum', 'mean']})  # two output columns from a single input column
    

    This returns only the groupby columns (as the index) and the specified aggregate columns. In this example I applied two aggregation functions to 'Y1962'.
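    A rough sketch of what the dict form returns, on hypothetical data with invented values. Because one entry maps to a list of functions, the result gets two-level column labels such as ('Y1962', 'mean'). If flat column names are preferable, named aggregation (keyword arguments of the form `new_name=(column, function)`, available since pandas 0.25) is an alternative that the answer itself does not use; the output names `Y1961_sum` etc. are hypothetical.

```python
import pandas as pd

# Hypothetical data; values invented for illustration.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 15, 25, 15, 25],
    "Item": ["Wheat", "Wheat", "Rice", "Wheat", "Rice"],
    "Y1961": [4, 6, 10, 30, 30],
    "Y1962": [8, 12, 20, 40, 40],
})

# Dict form: columns not named in the dict (here "Item") are left out,
# and the list for Y1962 makes the result columns a MultiIndex.
by_dict = df.groupby(["Country", "Item_Code"]).agg(
    {"Y1961": "sum", "Y1962": ["sum", "mean"]}
)
print(by_dict.columns.tolist())

# Named aggregation: one keyword per output column, flat column names.
named = df.groupby(["Country", "Item_Code"]).agg(
    Y1961_sum=("Y1961", "sum"),
    Y1962_sum=("Y1962", "sum"),
    Y1962_mean=("Y1962", "mean"),
)
print(named)
```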

    To get exactly what you hoped to see, include the other columns in the groupby and apply sums to the Y columns:

    df.groupby(['Code', 'Country', 'Item_Code', 'Item', 'Ele_Code', 'Unit']).agg({'Y1961': 'sum', 'Y1962': 'sum', 'Y1963': 'sum'})
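    Grouping by the extra descriptive columns only reproduces the Country/Item_Code totals when those columns are constant within each Country/Item_Code pair; otherwise the groups split further. A small sketch on hypothetical data that checks this first and then passes `as_index=False`, so the key columns come back as ordinary columns rather than a MultiIndex. Also note that groupby drops rows whose key is NaN by default (`dropna=True`, a parameter added in pandas 1.1), which matters when many descriptive columns are used as keys.

```python
import pandas as pd

# Hypothetical data; values invented for illustration.
df = pd.DataFrame({
    "Code": [2, 2, 2, 7, 7],
    "Country": ["Afghanistan", "Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 15, 25, 15, 25],
    "Item": ["Wheat", "Wheat", "Rice", "Wheat", "Rice"],
    "Ele_Code": [5142] * 5,
    "Unit": ["1000 tonnes"] * 5,
    "Y1961": [4, 6, 10, 30, 30],
    "Y1962": [8, 12, 20, 40, 40],
    "Y1963": [12, 18, 30, 50, 50],
})

keys = ["Code", "Country", "Item_Code", "Item", "Ele_Code", "Unit"]
year_cols = ["Y1961", "Y1962", "Y1963"]

# Check that each extra key has exactly one value per Country/Item_Code pair;
# if not, the wide groupby will return more rows than the two-key one.
extra = [k for k in keys if k not in ("Country", "Item_Code")]
consistent = df.groupby(["Country", "Item_Code"])[extra].nunique().eq(1).all().all()
print("extra keys constant within each pair:", consistent)

# as_index=False returns the keys as regular columns next to the sums.
wide = df.groupby(keys, as_index=False)[year_cols].sum()
print(wide)
```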
    
  • 2020-11-27 12:09

    If you are looking for a more general way to apply this to many columns, build a list of column names and use it to select columns from the grouped dataframe. In your case, for example:

    columns = ['Y' + str(year) for year in range(1967, 2011)]  # Y1967 ... Y2010; adjust the range to your columns
    
    df.groupby('Country')[columns].agg('sum')
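    A sketch of the same idea that builds the list from the dataframe's own columns instead of a hard-coded year range, so it never asks for a year that is absent (which would raise a KeyError). The regex `^Y\d{4}$` assumes every year column is named `Y` followed by four digits, as in this thread; adjust it to your data. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data; values invented for illustration.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola"],
    "Item_Code": [15, 25, 15],
    "Unit": ["1000 tonnes"] * 3,
    "Y1961": [10, 10, 30],
    "Y1962": [20, 20, 40],
    "Y2010": [5, 5, 7],
})

# Pick every column named "Y" plus four digits (assumed naming scheme).
year_cols = df.filter(regex=r"^Y\d{4}$").columns.tolist()

# Item_Code is numeric, so a plain .sum() would add it up as well;
# selecting year_cols keeps it (and Unit) out of the result.
by_country = df.groupby("Country")[year_cols].agg("sum")
print(by_country)
```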
    