Python Pandas: Is Order Preserved When Using groupby() and agg()?

前端未结


 5  2065

I've frequently used pandas' agg() function to run summary statistics on every column of a DataFrame. For example, here's how you would produce the mean an…


5 answers

无人共我

2020-12-14 00:36

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

The API accepts a "sort" argument.

The description of the sort argument reads:

sort : bool, default True
    Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

Thus, groupby does preserve the order of rows within each group.
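A minimal sketch to see this in practice, using a small hypothetical DataFrame (not the one from the question) whose groups are interleaved. Collecting each group's values into a list shows the order in which groupby hands the rows to the aggregation:

```python
import pandas as pd

# Hypothetical data: rows of groups g1 and g2 are interleaved
df = pd.DataFrame({
    "A": ["g1", "g2", "g1", "g2", "g1"],
    "B": [3, 1, 2, 5, 4],
})

# Collect the B values of each group in the order groupby passes them in
orders = df.groupby("A")["B"].agg(list)
print(orders)
```

Here g1 comes back as [3, 2, 4] and g2 as [1, 5], the same order in which those rows appear in df.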

上瘾入骨i

2020-12-14 00:37
To preserve the order in which the groups first appear, you'll need to pass .groupby(..., sort=False). In your case the grouping column is already sorted, so it makes no difference, but in general you need the sort=False flag (the order of rows within each group is preserved either way):
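```
import pandas as pd

# sort=False keeps the groups in order of first appearance instead of sorting the keys
df.groupby('A', sort=False).agg(['mean', lambda x: x.iloc[1]])
```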
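A small sketch of what the flag changes, on hypothetical data whose group keys are deliberately not in sorted order. Only the order of the groups in the result differs between the two calls:

```python
import pandas as pd

# Hypothetical data whose group keys are NOT in sorted order
df = pd.DataFrame({
    "A": ["group3", "group1", "group3", "group2", "group1"],
    "B": [10, 20, 30, 40, 50],
})

# Default sort=True: the result is ordered by the sorted group keys
sorted_keys = df.groupby("A")["B"].agg("mean").index.tolist()

# sort=False: groups appear in the order their keys are first seen
unsorted_keys = df.groupby("A", sort=False)["B"].agg("mean").index.tolist()

print(sorted_keys)    # ['group1', 'group2', 'group3']
print(unsorted_keys)  # ['group3', 'group1', 'group2']
```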

独厮守ぢ

2020-12-14 00:42

Even easier:

```
import pandas as pd

pd.pivot_table(df, index='A', aggfunc='mean')
```

output:

```
           B    C
A               
group1  11.0  101
group2  17.5  175
group3  11.0  101
```
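As a rough check that pivot_table with a mean aggregation gives the same numbers as the groupby/agg route, here is a sketch on hypothetical data (the question's full frame is not shown). Values are compared as floats because the result dtypes of the two calls may differ:

```python
import numpy as np
import pandas as pd

# Hypothetical data; not the frame from the original question
df = pd.DataFrame({
    "A": ["group1", "group2", "group1", "group2"],
    "B": [10, 15, 12, 20],
    "C": [100, 150, 102, 200],
})

# pivot_table with a mean aggregation, grouped on column A
pivot = pd.pivot_table(df, index="A", aggfunc="mean")

# The equivalent groupby/agg call
grouped = df.groupby("A").agg("mean")

# Same group labels and the same numbers (compared as floats)
same_index = pivot.index.equals(grouped.index)
same_values = np.allclose(pivot[["B", "C"]].to_numpy(dtype=float),
                          grouped[["B", "C"]].to_numpy(dtype=float))
print(same_index, same_values)
```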


渐次进展

2020-12-14 00:49

The pandas 0.19.1 docs say "groupby preserves the order of rows within each group", so this is guaranteed behavior.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
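One way to observe the documented guarantee is groupby().cumcount(), which numbers each row within its group from 0 in the order the rows appear. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data with two interleaved groups
df = pd.DataFrame({"A": ["x", "y", "x", "x", "y"], "B": [5, 6, 7, 8, 9]})

# cumcount numbers the rows of each group 0..n-1 in their original order,
# which exposes the order groupby keeps inside each group
df["pos_in_group"] = df.groupby("A").cumcount()
print(df)
```

The x rows (positions 0, 2, 3) are numbered 0, 1, 2 and the y rows (positions 1, 4) are numbered 0, 1.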

渐次进展

2020-12-14 00:59
See this enhancement issue

The short answer is yes: groupby preserves the order of the rows as they are passed in. You can prove this with your example:
```
In [20]: df.sort_index(ascending=False).groupby('A').agg([np.mean, lambda x: x.iloc[1] ])
Out[20]: 
           B             C         
        mean <lambda> mean <lambda>
A                                  
group1  11.0       10  101      100
group2  17.5       10  175      100
group3  11.0       10  101      100
```
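The same idea as a self-contained sketch, on hypothetical data where each group has exactly two rows. Reversing the frame before grouping changes which row `x.iloc[1]` picks, which is only possible if groupby keeps the incoming row order:

```python
import pandas as pd

# Hypothetical data: two groups with two rows each
df = pd.DataFrame({
    "A": ["g1", "g1", "g2", "g2"],
    "B": [1, 2, 3, 4],
})

def second_value(x):
    # Second row of each group, as in the question's lambda
    return x.iloc[1]

forward = df.groupby("A")["B"].agg(second_value)
# Reverse the rows first; each group's rows are now seen in reverse order
backward = df.sort_index(ascending=False).groupby("A")["B"].agg(second_value)

print(forward.tolist(), backward.tolist())  # [2, 4] [1, 3]
```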
This is NOT true for resample, however, as it requires a monotonic index (it WILL work with a non-monotonic index, but will sort it first).
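A sketch of that resample behavior, assuming a hypothetical Series with a deliberately shuffled DatetimeIndex. first() returns the value of the earliest timestamp in each bin, not the row that came first in the input, because the index is put in time order before binning:

```python
import pandas as pd

# Hypothetical series whose DatetimeIndex is deliberately out of order
s = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-01-02", "2020-01-01", "2020-01-04", "2020-01-03"]),
)

# first() per 2-day bin, on the shuffled and on the explicitly sorted series
unsorted_first = s.resample("2D").first()
sorted_first = s.sort_index().resample("2D").first()

print(unsorted_first.tolist(), sorted_first.tolist())
```

Both calls give [2, 4], the values at 2020-01-01 and 2020-01-03, even though the 2020-01-02 row (value 1) comes first in the input.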

There is a sort= flag to groupby, but it relates to the sorting of the groups themselves, not the observations within a group.
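A quick sketch of that distinction on hypothetical data: the rows of a given group come back in the same order whether sort is True or False.

```python
import pandas as pd

# Hypothetical data with unsorted keys and interleaved rows
df = pd.DataFrame({"A": ["b", "a", "b", "a", "b"], "B": [5, 4, 3, 2, 1]})

# Rows of group "b", fetched with sort=True and with sort=False
rows_sorted = df.groupby("A", sort=True).get_group("b")["B"].tolist()
rows_unsorted = df.groupby("A", sort=False).get_group("b")["B"].tolist()

# Both keep the original row order of group "b": [5, 3, 1]
print(rows_sorted, rows_unsorted)
```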

FYI: df.groupby('A').nth(1) is a safe way to get the 2nd value of a group (your method above will fail if a group has fewer than 2 elements).
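A sketch contrasting the two on hypothetical data where one group has a single row. The shape of the nth() result differs across versions (since pandas 2.0 it behaves like a filter and keeps the original index and the grouping column), so only the B values are inspected here:

```python
import pandas as pd

# Hypothetical data: group "y" has only one row
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [10, 20, 30]})

# The question's approach: iloc[1] raises IndexError for a one-row group
try:
    df.groupby("A")["B"].agg(lambda x: x.iloc[1])
    iloc_failed = False
except IndexError:
    iloc_failed = True

# nth(1) simply skips groups that have no second row
second_rows = df.groupby("A").nth(1)

print(iloc_failed)
print(second_rows["B"].tolist())  # [20]
```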