When is it appropriate to use df.value_counts() vs df.groupby('…').count()?

情书的邮戳 2020-11-29 04:34

I've heard that in Pandas there are often multiple ways to do the same thing, but I was wondering:

If I'm trying to group data by a value within a specific column and

2 answers
  •  我在风中等你
    2020-11-29 05:13

    There is a difference. According to the documentation, value_counts returns its result sorted by frequency:

    The resulting object will be in descending order so that the first element is the most frequently-occurring element.

    count does not do this. Its output is sorted by the index, which is built from the column passed to groupby('col').


    df.groupby('colA').count() 
    

    This aggregates all other columns of df with the count function, so it counts the values in each column, excluding NaNs.

    So if you only need the count for one column, select that column first:

    df.groupby('colA')['colA'].count() 
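
To make the NaN point concrete, here is a minimal sketch on a hypothetical toy frame (the names `toy`, `key` and `score` are made up for illustration). It also uses `size()`, which the answer does not mention, as a contrast: it counts rows per group instead of non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: group "b" has three rows, one with a missing score.
toy = pd.DataFrame({
    "key": ["a", "b", "b", "b"],
    "score": [1.0, 2.0, np.nan, 4.0],
})

# count() tallies the non-NaN values of the selected column in each group.
non_null = toy.groupby("key")["score"].count()

# size() tallies the rows in each group, missing values included.
rows = toy.groupby("key").size()

print(non_null.to_dict())  # {'a': 1, 'b': 2}
print(rows.to_dict())      # {'a': 1, 'b': 3}
```

If the two results differ for a group, that group contains missing values in the counted column.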
    

    Sample:

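    import numpy as np
    import pandas as pd

    # colA is listed first so the column order matches the printed frame
    df = pd.DataFrame({'colA':['c','c','b','a',np.nan,'b','b'],
                       'colB':list('abcdefg'),
                       'colC':[1,3,5,7,np.nan,np.nan,4],
                       'colD':[np.nan,3,6,9,2,4,np.nan]})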
    
    print (df)
      colA colB  colC  colD
    0    c    a   1.0   NaN
    1    c    b   3.0   3.0
    2    b    c   5.0   6.0
    3    a    d   7.0   9.0
    4  NaN    e   NaN   2.0
    5    b    f   NaN   4.0
    6    b    g   4.0   NaN
    
    print (df['colA'].value_counts())
    b    3
    c    2
    a    1
    Name: colA, dtype: int64
    
    print (df.groupby('colA').count())
          colB  colC  colD
    colA                  
    a        1     1     1
    b        3     2     2
    c        2     2     1
    
    print (df.groupby('colA')['colA'].count())
    colA
    a    1
    b    3
    c    2
    Name: colA, dtype: int64
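
As a rough check of the ordering difference described in this answer, the sketch below rebuilds only the colA column from the sample and compares the two results. The variable names are illustrative, and `dropna=False` is an extra option of `value_counts` that the answer does not cover.

```python
import numpy as np
import pandas as pd

# Same colA values as in the sample; the other columns are not needed here.
df = pd.DataFrame({"colA": ["c", "c", "b", "a", np.nan, "b", "b"]})

by_frequency = df["colA"].value_counts()          # most frequent value first
by_label = df.groupby("colA")["colA"].count()     # ordered by group label

print(list(by_frequency.index))  # ['b', 'c', 'a']
print(list(by_label.index))      # ['a', 'b', 'c']

# Once sorted by label, both results hold the same counts.
same_counts = by_frequency.sort_index().to_dict() == by_label.to_dict()
print(same_counts)  # True

# Both drop the NaN key by default; value_counts can keep it on request.
with_nan = df["colA"].value_counts(dropna=False)
print(int(with_nan.sum()))  # 7
```

So for a single column the choice is mostly about the ordering you want: by frequency with value_counts, or by group label with groupby and count.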
    
