In Pandas, after groupby the grouped column is gone

萌比男神i 2021-01-04 09:48

I have the following dataframe named ttm:

    usersidid   clienthostid    eventSumTotal   LoginDaysSum    score
0       12          1               60                


        
4 answers
  •  死守一世寂寞
    2021-01-04 10:39

    count is a built-in method of the groupby object, so pandas knows what to do with it. Two other things you specify determine what the output looks like: the as_index argument, and whether you select the column with single or double brackets.

    #                         For a built-in method, when
    #                         you don't want the group column
    #                         as the index, pandas keeps it
    #                         as a column.
    #                             |----||||----|
    ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].count()
    
       clienthostid  LoginDaysSum
    0             1             4
    1             3             2
    

    #                         For a built-in method, when
    #                         you do want the group column
    #                         as the index, then...
    #                             |----||||---|
    ttm.groupby(['clienthostid'], as_index=True, sort=False)['LoginDaysSum'].count()
    #                                                       |-----||||-----|
    #                                                 the single brackets tell
    #                                                 pandas to operate on a Series;
    #                                                 in this case, count the Series
    
    clienthostid
    1    4
    3    2
    Name: LoginDaysSum, dtype: int64
    

    ttm.groupby(['clienthostid'], as_index=True, sort=False)[['LoginDaysSum']].count()
    #                                                       |------||||------|
    #                                             the double brackets tell pandas
    #                                                to operate on the DataFrame
    #                                              specified by these columns and
    #                                                return a DataFrame
    
                  LoginDaysSum
    clienthostid              
    1                        4
    3                        2
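
The three count variants above can be reproduced with a small self-contained sketch. The question's table is truncated, so the `ttm` built here is a hypothetical stand-in, chosen only so that the group sizes (4 and 2) match the outputs shown; the variable names `kept_as_column`, `as_series` and `as_frame` are likewise made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for ttm (the original table is truncated in the question).
ttm = pd.DataFrame({
    "clienthostid": [1, 1, 1, 1, 3, 3],
    "LoginDaysSum": [2, 2, 5, 7, 3, 2],
})

# as_index=False: the group column stays as an ordinary column.
kept_as_column = ttm.groupby(["clienthostid"], as_index=False, sort=False)["LoginDaysSum"].count()

# as_index=True with single brackets: a Series indexed by the group column.
as_series = ttm.groupby(["clienthostid"], as_index=True, sort=False)["LoginDaysSum"].count()

# as_index=True with double brackets: a DataFrame indexed by the group column.
as_frame = ttm.groupby(["clienthostid"], as_index=True, sort=False)[["LoginDaysSum"]].count()

print(type(kept_as_column).__name__, list(kept_as_column.columns))
print(type(as_series).__name__, as_series.index.name)
print(type(as_frame).__name__, list(as_frame.columns), as_frame.index.name)
```

Comparing the three printed lines shows where the group column ends up in each case: in the columns, or in the index.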
    

    When you use apply with as_index=False, pandas no longer knows what to do with the group column. It has to trust that apply returns exactly what you told it to return, so it simply throws the group column away. Also, the single brackets around your column tell pandas to operate on a Series. Instead, use as_index=True to keep the grouping column information in the index, then follow it with reset_index to move it from the index back into a column. At that point it no longer matters that you used single brackets, because after reset_index you have a DataFrame again.

    ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].apply(lambda x: x.iloc[0] / x.iloc[1])
    
    0    1.0
    1    1.5
    dtype: float64
    

    ttm.groupby(['clienthostid'], as_index=True, sort=False)['LoginDaysSum'].apply(lambda x: x.iloc[0] / x.iloc[1]).reset_index()
    
       clienthostid  LoginDaysSum
    0             1           1.0
    1             3           1.5
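
As a runnable sketch of the apply-then-reset_index pattern: the `ttm` below is again a hypothetical stand-in (the original data is not shown in full), with values picked so that the per-group ratio of the first two rows comes out to 1.0 and 1.5 as in the output above. The names `ratio` and `result` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for ttm (the original table is truncated in the question).
ttm = pd.DataFrame({
    "clienthostid": [1, 1, 1, 1, 3, 3],
    "LoginDaysSum": [2, 2, 5, 7, 3, 2],
})

# Keep the group column in the index while applying the custom function.
ratio = (
    ttm.groupby(["clienthostid"], as_index=True, sort=False)["LoginDaysSum"]
    .apply(lambda x: x.iloc[0] / x.iloc[1])  # first row divided by second row, per group
)

# Move the group column from the index back into a regular column.
result = ratio.reset_index()
print(result)
```

If a different name for the computed column is preferred, `Series.reset_index` also accepts a `name` argument, for example `ratio.reset_index(name="ratio")`.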
    
