What is the difference between pandas agg and apply function?

闹比i 2020-11-29 03:55

I can't figure out the difference between the pandas .aggregate and .apply functions.
Take the following as an example: I load a dataset, do a

4 Answers
  •  误落风尘
    2020-11-29 04:30

    (Note: These comparisons apply to DataFrameGroupBy objects.)

    Some plausible advantages of using .agg() over .apply() for DataFrameGroupBy objects:

    1. .agg() can apply several functions at once, or take a list of functions for each column.

    2. It can also apply different functions to different columns of the dataframe in a single call.

    This gives you fine-grained control over which operation runs on which column.

    Here is the link for more details: http://pandas.pydata.org/pandas-docs/version/0.13.1/groupby.html
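    As a side note, newer pandas releases (0.25 and later, as far as I know) also accept keyword arguments of the form new_name=(column, function), often called named aggregation, which gives flat output column names. A minimal sketch; the output names score_2_mean, score_3_total and score_3_max are made up for illustration:

```python
import pandas as pd

# Same data as the example dataframe used in this answer.
df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})

# Each keyword is an output column name (hypothetical names here);
# each value is a (source column, aggregation function) pair.
result = df.groupby("name").agg(
    score_2_mean=("score_2", "mean"),
    score_3_total=("score_3", "sum"),
    score_3_max=("score_3", "max"),
)
print(result)
```

    The result has one row per name and three plain columns, rather than the two-level column header that a dict of lists produces.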


    However, .apply() takes a single function per call. So, to run several different operations on the same column, you might have to call .apply() repeatedly.

    Here are some example comparisons of .apply() vs .agg() for DataFrameGroupBy objects:
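    One caveat: the single function passed to .apply() may itself return several values, for example as a Series, in which case one call still yields several statistics per group. A rough sketch, where summarize is a hypothetical helper and the column is selected explicitly before calling .apply():

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_2": [10, 15, 10, 25],
})

def summarize(group):
    # Hypothetical helper: receives one group's rows as a DataFrame
    # and returns several statistics of score_2 as a Series.
    s = group["score_2"]
    return pd.Series({"sum": s.sum(), "min": s.min(), "mean": s.mean()})

# One .apply() call, one function, three statistics per group.
stats = df.groupby("name")[["score_2"]].apply(summarize)
print(stats)
```

    This works, but it is more verbose than passing a list of functions to .agg().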

    Given the following dataframe:

    In [260]: import numpy as np; import pandas as pd

    In [261]: df = pd.DataFrame({"name": ["Foo", "Baar", "Foo", "Baar"], "score_1": [5, 10, 15, 10], "score_2": [10, 15, 10, 25], "score_3": [10, 20, 30, 40]})
    
    In [262]: df
    Out[262]: 
       name  score_1  score_2  score_3
    0   Foo        5       10       10
    1  Baar       10       15       20
    2   Foo       15       10       30
    3  Baar       10       25       40
    

    Let's first look at the operations using .apply():

    In [263]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.sum())
    Out[263]: 
    name  score_1
    Baar  10         40
    Foo   5          10
          15         10
    Name: score_2, dtype: int64
    
    In [264]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.min())
    Out[264]: 
    name  score_1
    Baar  10         15
    Foo   5          10
          15         10
    Name: score_2, dtype: int64
    
    In [265]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.mean())
    Out[265]: 
    name  score_1
    Baar  10         20.0
    Foo   5          10.0
          15         10.0
    Name: score_2, dtype: float64
    

    Now, see how .agg() handles the same kind of operations in a single call:

    In [276]: df.groupby(["name", "score_1"]).agg({"score_3" :[np.sum, np.min, np.mean, np.max], "score_2":lambda x : x.mean()})
    Out[276]: 
                  score_2 score_3               
                 <lambda>     sum amin mean amax
    name score_1                                
    Baar 10            20      60   20   30   40
    Foo  5             10      10   10   10   10
         15            10      30   30   30   30
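
    For a like-for-like comparison, the three .apply() calls shown earlier (sum, min and mean of score_2) can be written as one .agg() call. This sketch passes function names as strings, which pandas also accepts; recent pandas versions may emit a FutureWarning when NumPy callables such as np.sum are passed instead:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})

# One call computes all three statistics of score_2 per (name, score_1) group.
summary = df.groupby(["name", "score_1"])["score_2"].agg(["sum", "min", "mean"])
print(summary)
```

    The result is a DataFrame with one column per function, indexed by the group keys.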
    

    So, .agg() can be really handy for DataFrameGroupBy objects, compared to .apply(). But if you are working with plain DataFrame objects rather than DataFrameGroupBy objects, .apply() can be very useful, since it can apply a function along either axis of the dataframe.

    (For example, with plain DataFrame objects, axis = 0, the default, passes each column to the function, while axis = 1 passes each row.)
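
    To illustrate the axis argument on a plain DataFrame, here is a small sketch using only the numeric columns from the example above; col_range and row_total are names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})

# axis=0 (the default): the function receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series.
row_total = df.apply(lambda row: row.sum(), axis=1)

print(col_range)
print(row_total)
```

    Here col_range has one entry per column and row_total has one entry per row.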
