I\'ve got pandas data with some columns of text type. There are some NaN values along with these text columns. What I\'m trying to do is to impute those NaN\'s by skle
This code fills in a series with the most frequent category:
import pandas as pd
import numpy as np
# create fake data
m = pd.Series(list('abca'))
m.iloc[1] = np.nan #artificially introduce nan
print('m = ')
print(m)
#make dummy variables, count and sort descending:
most_common = pd.get_dummies(m).sum().sort_values(ascending=False).index[0]
def replace_most_common(x):
if pd.isnull(x):
return most_common
else:
return x
new_m = m.map(replace_most_common) #apply function to original data
print('new_m = ')
print(new_m)
Outputs:
m =
0 a
1 NaN
2 c
3 a
dtype: object
new_m =
0 a
1 a
2 c
3 a
dtype: object