python pandas error when doing groupby counts

情书的邮戳 2020-12-16 02:53

When doing groupby counts over multiple columns I get an error. Here is my dataframe and also an example that simply labels the distinct 'b' and 'c' groups.



        
1 Answer
  • 2020-12-16 03:19

    Evaluate df.groupby(['b', 'c']).count() in an interactive session:

    In [150]: df.groupby(['b', 'c']).count()
    Out[150]: 
         a  b  c  d
    b c            
    0 0  1  1  1  1
      1  1  1  1  1
    1 1  2  2  2  2
    

    The result is a whole DataFrame, with one row per group rather than one row per row of df. It is probably not what you want to assign to a new column of df. In fact, you cannot assign a whole DataFrame to a single column, which is why the (admittedly cryptic) exception is raised.
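
To make the shape mismatch concrete, here is a minimal sketch. The frame below is illustrative data chosen for this example, not the asker's original dataframe, and the variable name `counts` is introduced here only for demonstration.

```python
import pandas as pd

# Illustrative data (an assumption; the asker's frame is not shown in the post).
df = pd.DataFrame({'a': [1, 1, 1, 0],
                   'b': [1, 1, 0, 1],
                   'c': [0, 1, 0, 1],
                   'd': [0, 1, 1, 0]})

counts = df.groupby(['b', 'c']).count()

# The result is a DataFrame indexed by the group keys, not by df's row index.
print(type(counts).__name__)
print(list(counts.index.names))

# One row per distinct (b, c) pair versus one row per original row.
print(len(counts), len(df))
```

Because `counts` has a different index and a different number of rows than `df`, it cannot line up with `df` as a single new column.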


    If you wish to create a new column which counts the number of rows in each group, you could use

    df['gr'] = df.groupby(['b', 'c'])['a'].transform('count')
    

    For example,

    import pandas as pd
    import numpy as np
    np.random.seed(1)
    df = pd.DataFrame(np.random.randint(0, 2, (4, 4)),
                      columns=['a', 'b', 'c', 'd'])
    print(df)
    #    a  b  c  d
    # 0  1  1  0  0
    # 1  1  1  1  1
    # 2  1  0  0  1
    # 3  0  1  1  0
    
    df['gr'] = df.groupby(['b', 'c'])['a'].transform('count')
    
    df['comp_ids'] = df.groupby(['b', 'c']).grouper.group_info[0]
    print(df)
    

    yields

       a  b  c  d  gr  comp_ids
    0  1  1  0  0   1         1
    1  1  1  1  1   2         2
    2  1  0  0  1   1         0
    3  0  1  1  0   2         2
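
One detail worth checking in your own data: `transform('count')` counts only the non-null values of the selected column, so a group containing NaN in `a` reports fewer than its actual number of rows. If every row should be counted, `transform('size')` is one alternative. The sketch below uses made-up data with a deliberate NaN, and the column names `n_non_null` and `n_rows` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in 'a' (an assumption for illustration).
df = pd.DataFrame({'a': [1.0, np.nan, 1.0, 0.0],
                   'b': [1, 1, 0, 1],
                   'c': [0, 1, 0, 1]})

grouped = df.groupby(['b', 'c'])['a']

# 'count' skips NaN values; 'size' counts every row in the group.
n_non_null = grouped.transform('count')
n_rows = grouped.transform('size')

df['n_non_null'] = n_non_null
df['n_rows'] = n_rows
print(df)
```

Both results are Series aligned with the original row index, which is why they can be assigned directly as columns.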
    

    Notice that df.groupby(['b', 'c']).grouper.group_info[0] does not return the number of rows in each group. Instead, it returns an integer label for each row, identifying the group that row belongs to.
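
The `grouper.group_info` attribute is an internal detail of the groupby object and may change between pandas versions. As a hedged alternative, the public `GroupBy.ngroup()` method also numbers the groups; the sketch below assumes the default `sort=True`, so groups are numbered in sorted key order. The data reuses the values of the example frame printed above.

```python
import pandas as pd

# Same values as the example frame in the answer, written out explicitly.
df = pd.DataFrame({'a': [1, 1, 1, 0],
                   'b': [1, 1, 0, 1],
                   'c': [0, 1, 0, 1],
                   'd': [0, 1, 1, 0]})

# ngroup() assigns each row the number of its group:
# (0, 0) -> 0, (1, 0) -> 1, (1, 1) -> 2 when groups are sorted by key.
df['comp_ids'] = df.groupby(['b', 'c']).ngroup()
print(df)
```

For this frame the labels match the `comp_ids` column shown in the output above.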
