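I've heard in Pandas there's often multiple ways to do the same thing, but I was wondering –
If I'm trying to group data by a value within a specific column and count the number of items with each value, when does it make sense to use df.groupby('colA').count() and when df['colA'].value_counts()?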
There is a difference in ordering. The result of value_counts is documented as:
The resulting object will be in descending order so that the first element is the most frequently-occurring element.
count does not do this; it sorts its output by the index, i.e. by the group keys created from the column passed to groupby('col').
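A minimal sketch of the ordering difference, assuming a recent pandas and using only the colA column from the sample further down (in it 'c' appears first, but 'b' is the most frequent):

```python
import numpy as np
import pandas as pd

# Only the grouping column of the sample data
df = pd.DataFrame({'colA': ['c', 'c', 'b', 'a', np.nan, 'b', 'b']})

# value_counts: ordered by frequency, most frequent first
by_freq = df['colA'].value_counts()

# groupby(...).count(): ordered by the group keys (groupby sorts keys by default)
by_key = df.groupby('colA')['colA'].count()

print(by_freq.index.tolist())  # ['b', 'c', 'a']
print(by_key.index.tolist())   # ['a', 'b', 'c']
```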
df.groupby('colA').count()
aggregates all columns of df with the count function, so for each group it counts the values in every column, excluding NaNs.
So if you need to count only one column, select it after the groupby:
df.groupby('colA')['colA'].count()
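Counting just the grouping column gives the same numbers as value_counts, only in key order. A small sketch, assuming a recent pandas; the names are compared by values only because the Series name of the value_counts result differs between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'colA': ['c', 'c', 'b', 'a', np.nan, 'b', 'b']})

# Count only the grouping column
per_key = df.groupby('colA')['colA'].count()

# value_counts re-sorted by key instead of by frequency
from_vc = df['colA'].value_counts().sort_index()

# Compare values only; Series names are not part of the comparison
print(per_key.to_dict() == from_vc.to_dict())  # True
```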
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'colA':['c','c','b','a',np.nan,'b','b'],
                   'colB':list('abcdefg'),
                   'colC':[1,3,5,7,np.nan,np.nan,4],
                   'colD':[np.nan,3,6,9,2,4,np.nan]})
print(df)
  colA colB  colC  colD
0    c    a   1.0   NaN
1    c    b   3.0   3.0
2    b    c   5.0   6.0
3    a    d   7.0   9.0
4  NaN    e   NaN   2.0
5    b    f   NaN   4.0
6    b    g   4.0   NaN
print(df['colA'].value_counts())
b 3
c 2
a 1
Name: colA, dtype: int64
print(df.groupby('colA').count())
      colB  colC  colD
colA
a        1     1     1
b        3     2     2
c        2     2     1
print(df.groupby('colA')['colA'].count())
colA
a 1
b 3
c 2
Name: colA, dtype: int64
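In the count output, group b has 3 rows but only 2 non-NaN values in colC and colD, which is the "excluding NaNs" behaviour. For contrast, here is a sketch using groupby(...).size(), which is not part of the answer above: it counts rows per group regardless of NaNs in the other columns (assuming a recent pandas; rows whose colA is NaN are dropped from the groups by default):

```python
import numpy as np
import pandas as pd

# Same sample frame as above
df = pd.DataFrame({'colA': ['c', 'c', 'b', 'a', np.nan, 'b', 'b'],
                   'colB': list('abcdefg'),
                   'colC': [1, 3, 5, 7, np.nan, np.nan, 4],
                   'colD': [np.nan, 3, 6, 9, 2, 4, np.nan]})

counts = df.groupby('colA').count()  # non-NaN values per column and group
sizes = df.groupby('colA').size()    # rows per group, NaNs included

print(counts.loc['b'].tolist())  # [3, 2, 2]
print(sizes.tolist())            # [1, 3, 2]
```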