The question is: how do I fill NaNs with the most frequent level in a categorical column of a pandas DataFrame?
In the R randomForest package there is the
na.roughfix option : A
In more recent versions of scikit-learn you can use SimpleImputer to impute both numeric and categorical columns:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Small mixed-type example: one numeric column, one string column
arr = [[1., 'x'], [np.nan, 'y'], [7., 'z'], [7., 'y'], [4., np.nan]]
df1 = pd.DataFrame({'x1': [x[0] for x in arr],
                    'x2': [x[1] for x in arr]},
                   index=list('abcde'))

# 'most_frequent' replaces each NaN with the mode of its column
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
print(pd.DataFrame(imp.fit_transform(df1),
                   columns=df1.columns,
                   index=df1.index))
# x1 x2
# a 1 x
# b 7 y
# c 7 z
# d 7 y
# e 4 y
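If you would rather not depend on scikit-learn, the same idea can be sketched with pandas alone: compute the mode of each column and pass the results to fillna. The helper name fill_with_mode below is hypothetical (it is not part of pandas), and it assumes that taking the first mode is acceptable when several values tie.

```python
import numpy as np
import pandas as pd


def fill_with_mode(df):
    """Return a copy of df with NaNs replaced by each column's most frequent value.

    Hypothetical helper, not a pandas API. Ties are broken by taking the
    first value returned by Series.mode(); all-NaN columns are left as-is.
    """
    modes = {}
    for col in df.columns:
        col_modes = df[col].mode()  # NaNs are ignored by default (dropna=True)
        if not col_modes.empty:
            modes[col] = col_modes.iloc[0]
    # fillna accepts a dict mapping column name -> fill value
    return df.fillna(value=modes)


arr = [[1., 'x'], [np.nan, 'y'], [7., 'z'], [7., 'y'], [4., np.nan]]
df1 = pd.DataFrame({'x1': [x[0] for x in arr],
                    'x2': [x[1] for x in arr]},
                   index=list('abcde'))

filled = fill_with_mode(df1)
print(filled)
```

Unlike the SimpleImputer route, this keeps the original column dtypes (x1 stays float64), and it should also work for a column of category dtype, since the mode is always one of the existing categories.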