How can I fill NaNs in a categorical column of a pandas DataFrame with the most frequent level?
In R's randomForest package there is the na.roughfix option : A
Most of the time, you wouldn't want the same imputation strategy for every column. You may want the column mode for categorical variables and the column mean or median for numeric columns.
For example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'num': [1., 2., 4., np.nan], 'cate1': ['a', 'a', 'b', np.nan], 'cate2': ['a', 'b', 'b', np.nan]})
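# numeric columns: pass the per-column means as a Series so fillna matches them by column name
# (a scalar such as .mean().iloc[0] would also be written into the NaNs of the categorical columns)
>>> df.fillna(df.select_dtypes(include='number').mean(), inplace=True)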
# categorical columns
>>> df.fillna(df.select_dtypes(include='object').mode().iloc[0], inplace=True)
>>> print(df)
     num cate1 cate2
0  1.000     a     a
1  2.000     a     b
2  4.000     b     b
3  2.333     a     b
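If you need this in more than one place, the two steps can be wrapped in a small helper. The following is only a sketch: the function name `impute_by_dtype` and its `numeric_strategy` parameter are made up for illustration and are not part of pandas. It assumes that every non-numeric column should be treated as categorical, and that taking the first mode is an acceptable way to break ties.

```python
import numpy as np
import pandas as pd


def impute_by_dtype(df, numeric_strategy="median"):
    """Return a copy of df with NaNs filled column by column.

    Hypothetical helper (not a pandas API): numeric columns get their
    median or mean, all other columns get their most frequent value.
    """
    numeric = df.select_dtypes(include="number")
    other = df.select_dtypes(exclude="number")

    # One fill value per numeric column.
    if numeric_strategy == "median":
        fill_values = numeric.median().to_dict()
    elif numeric_strategy == "mean":
        fill_values = numeric.mean().to_dict()
    else:
        raise ValueError("numeric_strategy must be 'median' or 'mean'")

    # One fill value per non-numeric column. mode() returns a DataFrame
    # with one row per tied mode; the first row is used here. Columns
    # that contain only NaN have no mode and are left unfilled.
    if other.shape[1] > 0:
        modes = other.mode()
        if len(modes) > 0:
            fill_values.update(modes.iloc[0].dropna().to_dict())

    # fillna with a dict fills each column with its own value and,
    # without inplace=True, returns a new DataFrame.
    return df.fillna(fill_values)


df = pd.DataFrame({
    "num": [1.0, 2.0, 4.0, np.nan],
    "cate1": ["a", "a", "b", np.nan],
    "cate2": ["a", "b", "b", np.nan],
})
filled = impute_by_dtype(df)  # median for 'num', mode for the rest
filled_mean = impute_by_dtype(df, numeric_strategy="mean")
```

Returning a new DataFrame instead of using `inplace=True` keeps the original data available, which is handy if you want to compare strategies.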