When is it appropriate to use df.value_counts() vs df.groupby('…').count()?

前端未结

关注

 2  616

情书的邮戳 2020-11-29 04:34

I\'ve heard in Pandas there\'s often multiple ways to do the same thing, but I was wondering –

If I\'m trying to group data by a value within a specific column and

2条回答

我在风中等你 (楼主)

2020-11-29 05:13

There is difference value_counts return:

The resulting object will be in descending order so that the first element is the most frequently-occurring element.

but count not, it sort output by index (created by column in groupby('col')).

df.groupby('colA').count()

is for aggregate all columns of df by function count. So it count values excluding NaNs.

So if need count only one column need:

df.groupby('colA')['colA'].count()

Sample:

df = pd.DataFrame({'colB':list('abcdefg'),
                   'colC':[1,3,5,7,np.nan,np.nan,4],
                   'colD':[np.nan,3,6,9,2,4,np.nan],
                   'colA':['c','c','b','a',np.nan,'b','b']})

print (df)
  colA colB  colC  colD
0    c    a   1.0   NaN
1    c    b   3.0   3.0
2    b    c   5.0   6.0
3    a    d   7.0   9.0
4  NaN    e   NaN   2.0
5    b    f   NaN   4.0
6    b    g   4.0   NaN

print (df['colA'].value_counts())
b    3
c    2
a    1
Name: colA, dtype: int64

print (df.groupby('colA').count())
      colB  colC  colD
colA                  
a        1     1     1
b        3     2     2
c        2     2     1

print (df.groupby('colA')['colA'].count())
colA
a    1
b    3
c    2
Name: colA, dtype: int64

0 讨论(0)

查看其它2个回答