How to GroupBy a Dataframe in Pandas and keep Columns

甜味超标 · 2020-12-07 14:09

Given a DataFrame that logs uses of some books like this:

Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2

How can I group on these columns and get a count of uses for each book, while keeping 'Name', 'Type' and 'ID' as regular columns rather than as the index?

3 Answers
  • 2020-12-07 14:43

    I think as_index=False should do the trick. One note: with the example frame there are no columns left over after grouping on 'Name', 'Type' and 'ID', so call size() rather than count() to get the number of rows per group:

    df.groupby(['Name','Type','ID'], as_index=False).size()
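    A minimal, self-contained sketch of this approach (assuming the example frame from the question is rebuilt by hand, and pandas >= 1.1 so that size() respects as_index=False):

    import pandas as pd

    # Rebuild the example frame from the question
    df = pd.DataFrame({
        'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
        'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
        'ID':   [1, 2, 3, 1, 2],
    })

    # as_index=False keeps the grouping keys as ordinary columns,
    # and size() counts the rows in each group.
    counts = df.groupby(['Name', 'Type', 'ID'], as_index=False).size()
    counts = counts.rename(columns={'size': 'Count'})
    print(counts)
    #     Name   Type  ID  Count
    # 0  Book1  ebook   1      2
    # 1  Book2  paper   2      2
    # 2  Book3  paper   3      1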
    
  • 2020-12-07 14:46

    You want the following:

    In [20]:
    df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
    
    Out[20]:
        Name   Type  ID  Count
    0  Book1  ebook   1      2
    1  Book2  paper   2      2
    2  Book3  paper   3      1
    

    In your case the 'Name', 'Type' and 'ID' values always occur together, so we can group on all three, call size to count the rows in each group, and then reset_index to turn the group keys back into columns.

    An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

    In [25]:
    df['Count'] = df.groupby(['Name'])['ID'].transform('count')
    df.drop_duplicates()
    
    Out[25]:
        Name   Type  ID  Count
    0  Book1  ebook   1      2
    1  Book2  paper   2      2
    2  Book3  paper   3      1
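    To make the transform step concrete, here is a short sketch (again assuming the example frame from the question): transform broadcasts the per-group count back onto every original row, which is why the new column lines up with df before drop_duplicates collapses the repeated log entries.

    # Each row receives the count of its 'Name' group (5 rows stay 5 rows)
    df['Count'] = df.groupby(['Name'])['ID'].transform('count')
    print(df)                    # still one row per logged use
    print(df.drop_duplicates())  # one row per book, carrying its Count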
    
  • 2020-12-07 15:02

    If you have many columns in a df it makes sense to use df.groupby(['foo']).agg(...) (see the pandas GroupBy.agg documentation). The .agg() function lets you choose what to do with the columns you are not otherwise aggregating. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
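    A minimal sketch of that pattern, using named aggregation (pandas >= 0.25) and a hypothetical extra column 'Pages' that we only want to carry along:

    import pandas as pd

    # Example frame from the question plus a hypothetical 'Pages' column
    df = pd.DataFrame({
        'Name':  ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
        'Type':  ['ebook', 'paper', 'paper', 'ebook', 'paper'],
        'ID':    [1, 2, 3, 1, 2],
        'Pages': [100, 200, 300, 100, 200],
    })

    out = df.groupby('Name').agg(
        Type=('Type', 'first'),    # keep the first value seen in each group
        ID=('ID', 'first'),
        Pages=('Pages', 'first'),
        Count=('ID', 'size'),      # number of logged uses per book
    ).reset_index()
    print(out)
    #     Name   Type  ID  Pages  Count
    # 0  Book1  ebook   1    100      2
    # 1  Book2  paper   2    200      2
    # 2  Book3  paper   3    300      1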
