Python Pandas: Is Order Preserved When Using groupby() and agg()?

清歌不尽 2020-12-14 00:21

I've frequently used pandas' agg() function to run summary statistics on every column of a DataFrame. For example, here's how you would produce the mean an

5 Answers
  • 2020-12-14 00:36

    Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

    groupby() accepts a sort argument.

    The documentation describes it as follows:

    sort : bool, default True. Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

    So groupby does preserve the order of rows within each group.
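
    To check the documented behaviour yourself, here is a minimal sketch. The DataFrame is hypothetical (it is not the asker's data): the rows of each group are interleaved and unsorted on purpose, and each group's values are collected into a list so the within-group order is visible.

```python
import pandas as pd

# Hypothetical data: rows of each group are interleaved and unsorted.
df = pd.DataFrame({
    "A": ["group2", "group1", "group2", "group1", "group2"],
    "B": [30, 10, 5, 20, 15],
})

# Collect each group's values into a list, in the order groupby passes them.
within = df.groupby("A")["B"].agg(list)

print(within["group1"])  # [10, 20]
print(within["group2"])  # [30, 5, 15]
```

    The group keys come out sorted (group1 before group2) because sort defaults to True, but the values inside each list keep their original row order.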

  • 2020-12-14 00:37

    To preserve the order in which the groups first appear, you need to pass sort=False to groupby(). In your case the grouping column is already sorted, so it makes no difference, but in general you must use the sort=False flag:

     df.groupby('A', sort=False).agg([np.mean, lambda x: x.iloc[1]])
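
    A small sketch of what the flag changes, using a hypothetical DataFrame whose grouping column is not sorted (unlike the one in the question):

```python
import pandas as pd

# Hypothetical data: group keys appear in the order group3, group1, group2.
df = pd.DataFrame({
    "A": ["group3", "group1", "group3", "group2", "group1"],
    "B": [1, 2, 3, 4, 5],
})

# Default sort=True: the result is ordered by sorted group key.
sorted_keys = list(df.groupby("A")["B"].sum().index)

# sort=False: the result is ordered by first appearance of each key.
appearance_keys = list(df.groupby("A", sort=False)["B"].sum().index)

print(sorted_keys)      # ['group1', 'group2', 'group3']
print(appearance_keys)  # ['group3', 'group1', 'group2']
```

    Only the order of the groups in the result differs; the rows inside each group are handled in the same order either way.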
    
  • 2020-12-14 00:42

    Even easier:

      import numpy as np
      import pandas as pd
      pd.pivot_table(df, index='A', aggfunc=np.mean)
    

    output:

                B    C
         A                
       group1  11.0  101
       group2  17.5  175
       group3  11.0  101
    
  • 2020-12-14 00:49

    The pandas 0.19.1 documentation says "groupby preserves the order of rows within each group", so this is guaranteed behavior.

    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html

  • 2020-12-14 00:59

    See this enhancement issue

    The short answer is yes: groupby preserves the row order as passed in. You can prove this with your example by reversing the rows before grouping:

    In [20]: df.sort_index(ascending=False).groupby('A').agg([np.mean, lambda x: x.iloc[1] ])
    Out[20]: 
               B             C         
            mean <lambda> mean <lambda>
    A                                  
    group1  11.0       10  101      100
    group2  17.5       10  175      100
    group3  11.0       10  101      100
    

    This is NOT true for resample, however, as it requires a monotonic index (it WILL work with a non-monotonic index, but will sort it first).

    There is a sort= flag to groupby, but it controls the sorting of the groups themselves, not the observations within a group.

    FYI: df.groupby('A').nth(1) is a safe way to get the 2nd value of each group (your method above will fail if a group has fewer than 2 elements).
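
    A rough sketch of the difference, on hypothetical data where one group has only a single row. The index of the nth() result, as far as I know, differs between pandas versions, so only the values are compared here.

```python
import pandas as pd

# Hypothetical data: group2 has a single row, so it has no 2nd value.
df = pd.DataFrame({
    "A": ["group1", "group1", "group2", "group3", "group3"],
    "B": [10, 11, 20, 30, 31],
})

# nth(1) keeps the 2nd row of each group and skips groups that are too short.
second_values = df.groupby("A")["B"].nth(1).tolist()

# Positional indexing inside agg() raises on the one-row group instead.
try:
    df.groupby("A")["B"].agg(lambda x: x.iloc[1])
    iloc_failed = False
except IndexError:
    iloc_failed = True

print(second_values)  # [11, 31]
print(iloc_failed)    # True
```

    Note that nth(1) drops group2 from the result rather than filling it with NaN, so the output can have fewer rows than there are groups.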
