What is the difference between pandas agg and apply function?

Submitted by 我怕爱的太早我们不能终老 on 2019-11-27 04:04:31
TomAugspurger

apply applies the function to each group (your Species). Your function returns 1, so you end up with 1 value for each of 3 groups.

agg aggregates each column (feature) for each group, so you end up with one value per column per group.
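
A minimal sketch of that difference, assuming a small made-up iris-like frame (the column names and values are illustrative, not the asker's data). The feature columns are selected explicitly so that both calls see the same two columns:

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: 3 species, 2 feature columns.
df = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "PetalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})
grouped = df.groupby("Species")[["SepalLength", "PetalLength"]]

# apply: the function is called once per group, with that group's sub-DataFrame.
per_group = grouped.apply(lambda g: 1)

# agg: the function is called once per column within each group.
per_column = grouped.agg(lambda col: 1)

print(per_group.shape)   # one value per group
print(per_column.shape)  # one value per column per group
```

With 3 groups and 2 feature columns, the first result holds 3 values and the second holds 3 x 2.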

Do read the groupby docs, they're quite helpful. There are also a bunch of tutorials floating around the web.

(Note: these comparisons apply to DataFrameGroupBy objects.)

Some plausible advantages of using .agg() over .apply() for DataFrameGroupBy objects:

1) .agg() lets you apply multiple functions at once by passing a list of functions, which is then applied to each column.

2) It also lets you apply different functions to different columns in a single call.

That means you have fine-grained control over which operations run on each column.
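
A short sketch of point 1, passing a list of functions to .agg(). The frame mirrors the one used in this answer's examples. Grouping by name alone and selecting score_2 and score_3 are my own choices for illustration, and string names such as "sum" stand in for function objects:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})

# A list of functions is applied to every selected column.
summary = df.groupby("name")[["score_2", "score_3"]].agg(["sum", "mean"])
print(summary)

# The result's columns are (column, function) pairs.
print(summary.loc["Baar", ("score_3", "sum")])
```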

Here is the link for more details: http://pandas.pydata.org/pandas-docs/version/0.13.1/groupby.html

However, .apply() takes a single function per call, so you might have to call it repeatedly to run different operations on the same column.
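
That said, a single .apply() call can still produce several statistics if the function itself returns a Series, at the cost of writing that function by hand. A rough sketch, where `stats` is a hypothetical helper name and the frame mirrors the one used in this answer's examples:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})

def stats(group):
    # Hypothetical helper: bundle several statistics of score_2 into one Series.
    s = group["score_2"]
    return pd.Series({"sum": s.sum(), "min": s.min(), "mean": s.mean()})

# Selecting [["score_2"]] means each group passed to `stats` is a one-column DataFrame.
result = df.groupby("name")[["score_2"]].apply(stats)
print(result)
```

Each key of the returned Series becomes a column of the result, with one row per group.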

Here are some examples comparing .apply() and .agg() for DataFrameGroupBy objects:

First, let's see the operations using .apply():

In [261]: df = pd.DataFrame({"name":["Foo", "Baar", "Foo", "Baar"], "score_1":[5,10,15,10], "score_2" :[10,15,10,25], "score_3" : [10,20,30,40]})

In [262]: df
Out[262]: 
   name  score_1  score_2  score_3
0   Foo        5       10       10
1  Baar       10       15       20
2   Foo       15       10       30
3  Baar       10       25       40

In [263]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.sum())
Out[263]: 
name  score_1
Baar  10         40
Foo   5          10
      15         10
Name: score_2, dtype: int64

In [264]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.min())
Out[264]: 
name  score_1
Baar  10         15
Foo   5          10
      15         10
Name: score_2, dtype: int64

In [265]: df.groupby(["name", "score_1"])["score_2"].apply(lambda x : x.mean())
Out[265]: 
name  score_1
Baar  10         20.0
Foo   5          10.0
      15         10.0
Name: score_2, dtype: float64

Now look at similar operations done in a single .agg() call:

In [274]: df = pd.DataFrame({"name":["Foo", "Baar", "Foo", "Baar"], "score_1":[5,10,15,10], "score_2" :[10,15,10,25], "score_3" : [10,20,30,40]})

In [275]: df
Out[275]: 
   name  score_1  score_2  score_3
0   Foo        5       10       10
1  Baar       10       15       20
2   Foo       15       10       30
3  Baar       10       25       40

In [276]: df.groupby(["name", "score_1"]).agg({"score_3" :[np.sum, np.min, np.mean, np.max], "score_2":lambda x : x.mean()})
Out[276]: 
              score_2 score_3               
             <lambda>     sum amin mean amax
name score_1                                
Baar 10            20      60   20   30   40
Foo  5             10      10   10   10   10
     15            10      30   30   30   30
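
For reference, here is a sketch of the same aggregation written with string function names and keyword-style ("named") aggregation, which pandas 0.25 and later support and which yields flat column names. The output column names are my own choice, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})

# Each keyword is an output column: (input column, aggregation function).
result = df.groupby(["name", "score_1"]).agg(
    score_2_mean=("score_2", "mean"),
    score_3_sum=("score_3", "sum"),
    score_3_min=("score_3", "min"),
    score_3_mean=("score_3", "mean"),
    score_3_max=("score_3", "max"),
)
print(result)
```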

So .agg() can be really handy for DataFrameGroupBy objects compared to .apply(). But if you are working with plain DataFrame objects rather than DataFrameGroupBy objects, .apply() can be very useful, since it can apply a function along either axis of the DataFrame.

(For example, with .apply() on a plain DataFrame, axis=0, the default, passes each column to the function, and axis=1 passes each row.)
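
A small sketch of the two axes on a plain DataFrame. The frame reuses the numeric columns from the examples above, and `value_range` is a hypothetical helper name:

```python
import pandas as pd

scores = pd.DataFrame({
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})

def value_range(s):
    # Hypothetical helper: spread between the largest and smallest value.
    return s.max() - s.min()

by_column = scores.apply(value_range, axis=0)  # function receives each column
by_row = scores.apply(value_range, axis=1)     # function receives each row

print(by_column)
print(by_row)
```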

When using apply on a groupby, I have found that .apply will return the grouped columns. There is a note in the documentation (pandas.pydata.org/pandas-docs/stable/groupby.html):

"...Thus the grouped columns(s) may be included in the output as well as set the indices."

.aggregate will not return the grouped columns.
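
A hedged sketch for checking this on your own installation. Whether the grouping column reaches the function given to .apply() depends on the pandas version (pandas 2.2 deprecated passing it), so the .apply() result is only printed here, not asserted:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Baar", "Foo", "Baar"],
    "score_1": [5, 10, 15, 10],
    "score_2": [10, 15, 10, 25],
    "score_3": [10, 20, 30, 40],
})
grouped = df.groupby("name")

# agg: the grouping column becomes the index and is not among the output columns.
aggregated = grouped.agg("sum")
print(aggregated.columns.tolist())

# apply: report which columns each group's sub-DataFrame carries.
# Whether "name" shows up here is version-dependent.
seen = grouped.apply(lambda g: ", ".join(g.columns))
print(seen)
```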
