Apply multiple functions to multiple groupby columns

春和景丽 2020-11-22 03:16

The docs show how to apply multiple functions on a groupby object at a time using a dict with the output column names as the keys:

    In [563]: grouped['D'].a

7 Answers
  •  忘掉有多难
    2020-11-22 04:19

    New in version 0.25.0.

    To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as “named aggregation”, where

    • The keywords are the output column names
    • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
        In [79]: animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
           ....:                         'height': [9.1, 6.0, 9.5, 34.0],
           ....:                         'weight': [7.9, 7.5, 9.9, 198.0]})
           ....: 
    
        In [80]: animals
        Out[80]: 
          kind  height  weight
        0  cat     9.1     7.9
        1  dog     6.0     7.5
        2  cat     9.5     9.9
        3  dog    34.0   198.0
    
        In [81]: animals.groupby("kind").agg(
           ....:     min_height=pd.NamedAgg(column='height', aggfunc='min'),
           ....:     max_height=pd.NamedAgg(column='height', aggfunc='max'),
           ....:     average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
           ....: )
           ....: 
        Out[81]: 
              min_height  max_height  average_weight
        kind                                        
        cat          9.1         9.5            8.90
        dog          6.0        34.0          102.75
    

    pandas.NamedAgg is just a namedtuple. Plain tuples are allowed as well.

        In [82]: animals.groupby("kind").agg(
           ....:     min_height=('height', 'min'),
           ....:     max_height=('height', 'max'),
           ....:     average_weight=('weight', np.mean),
           ....: )
           ....: 
        Out[82]: 
              min_height  max_height  average_weight
        kind                                        
        cat          9.1         9.5            8.90
        dog          6.0        34.0          102.75
    

    Additional keyword arguments are not passed through to the aggregation functions. Only pairs of (column, aggfunc) should be passed as **kwargs. If your aggregation functions require additional arguments, partially apply them with functools.partial().
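
The functools.partial() advice above can be sketched roughly as follows. This is a minimal illustration, not part of the quoted docs: the helper `height_quantile` and the output column name `q75_height` are hypothetical names chosen for the example, and the data is the same `animals` frame used above.

```python
from functools import partial

import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})


def height_quantile(series, q):
    """Hypothetical helper: return the q-th quantile of one group's values."""
    return series.quantile(q)


# agg() calls each aggregation with the group's values only, so the extra
# argument q is frozen in advance with functools.partial.
q75 = partial(height_quantile, q=0.75)

result = animals.groupby("kind").agg(
    min_height=("height", "min"),
    q75_height=("height", q75),
)
print(result)
```

Binding the argument up front keeps every keyword passed to agg() a plain (column, aggfunc) pair, which is the only form named aggregation accepts.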

    Named aggregation is also valid for Series groupby aggregations. In this case there’s no column selection, so the values are just the functions.

        In [84]: animals.groupby("kind").height.agg(
           ....:     min_height='min',
           ....:     max_height='max',
           ....: )
           ....: 
        Out[84]: 
              min_height  max_height
        kind                        
        cat          9.1         9.5
        dog          6.0        34.0
    
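
Since the aggregation can be a callable as well as a string alias, the Series form above should also accept a user-defined function. The sketch below assumes a hypothetical helper `value_range` and a hypothetical output name `height_range`; neither appears in the quoted docs.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})


def value_range(series):
    """Hypothetical helper: spread between the largest and smallest value."""
    return series.max() - series.min()


# On a Series groupby there is no column to select, so each keyword maps an
# output column name directly to a string alias or a callable.
heights = animals.groupby("kind")["height"].agg(
    min_height="min",
    height_range=value_range,
)
print(heights)
```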
