I\'ve got pandas data with some columns of text type. There are some NaN values along with these text columns. What I\'m trying to do is to impute those NaN\'s by skle
Similar. Modify Imputer for strategy='most_frequent':
class GeneralImputer(Imputer):
def __init__(self, **kwargs):
Imputer.__init__(self, **kwargs)
def fit(self, X, y=None):
if self.strategy == 'most_frequent':
self.fills = pd.DataFrame(X).mode(axis=0).squeeze()
self.statistics_ = self.fills.values
return self
else:
return Imputer.fit(self, X, y=y)
def transform(self, X):
if hasattr(self, 'fills'):
return pd.DataFrame(X).fillna(self.fills).values.astype(str)
else:
return Imputer.transform(self, X)
where pandas.DataFrame.mode() finds the most frequent value for each column and then pandas.DataFrame.fillna() fills missing values with these. Other strategy values are still handled the same way by Imputer.