get first and last values in a groupby

前端 未结 3 626
野性不改
野性不改 2020-12-02 23:34

I have a dataframe df

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [[\'a\', \'a\', \'a\', \'a\', \'b\', \'b\', \'b\', \'c\         


        
3条回答
  •  日久生厌
    2020-12-03 00:04

    Option 1

    def first_last(df):
        return df.ix[[0, -1]]
    
    df.groupby(level=0, group_keys=False).apply(first_last)
    


    Option 2 - only works if index is unique

    idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
    df.loc[idx]
    

    Option 3 - per notes below, this only makes sense when there are no NAs

    I also abused the agg function. The code below works, but is far uglier.

    df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
        .set_index('level_1', append=True).reset_index(1, drop=True) \
        .rename_axis([None, None])
    

    Note

    per @unutbu: agg(['first', 'last']) take the firs non-na values.

    I interpreted this as, it must then be necessary to run this column by column. Further, forcing index level=1 to align may not even make sense.

    Let's include another test

    df = pd.DataFrame(np.arange(20).reshape(10, -1),
                      [list('aaaabbbccd'),
                       list('abcdefghij')],
                      list('XY'))
    
    df.loc[tuple('aa'), 'X'] = np.nan
    

    def first_last(df):
        return df.ix[[0, -1]]
    
    df.groupby(level=0, group_keys=False).apply(first_last)
    

    df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
        .set_index('level_1', append=True).reset_index(1, drop=True) \
        .rename_axis([None, None])
    

    Sure enough! This second solution is taking the first valid value in column X. It is now nonsensical to have forced that value to align with the index a.

提交回复
热议问题