Impute categorical missing values in scikit-learn

清歌不尽 2020-11-30 16:55

I've got pandas data with some columns of text type, and these text columns contain some NaN values. What I'm trying to do is to impute those NaNs by skle

10 answers
  •  难免孤独
    2020-11-30 17:43

    A similar approach: subclass Imputer and override its behavior for strategy='most_frequent':

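    import pandas as pd
    # Imputer was removed in scikit-learn 0.22 (SimpleImputer replaces it),
    # so this import needs an older release.
    from sklearn.preprocessing import Imputer

    class GeneralImputer(Imputer):
        def __init__(self, **kwargs):
            Imputer.__init__(self, **kwargs)

        def fit(self, X, y=None):
            if self.strategy == 'most_frequent':
                # mode() returns one row per tied value; keep the first row
                # so there is exactly one fill value per column.
                self.fills = pd.DataFrame(X).mode(axis=0).iloc[0]
                self.statistics_ = self.fills.values
                return self
            else:
                return Imputer.fit(self, X, y=y)

        def transform(self, X):
            if hasattr(self, 'fills'):
                # Fill NaNs column by column, then return a string array.
                return pd.DataFrame(X).fillna(self.fills).values.astype(str)
            else:
                return Imputer.transform(self, X)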

    Here pandas.DataFrame.mode() finds the most frequent value in each column, and pandas.DataFrame.fillna() then fills the missing values with them. Note that transform() returns the filled data as an array of strings. Other strategy values are still handled by Imputer as before.
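
    To see the mode() and fillna() step in isolation, here is a minimal sketch using pandas only. The column names and values are made up for illustration, and taking the first row of mode() is just one way to break ties between equally frequent values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two text columns with missing entries.
df = pd.DataFrame({
    "color": ["red", "blue", np.nan, "red"],
    "size": ["S", np.nan, "M", "M"],
})

# mode() ignores NaN by default and returns one row per tied mode;
# the first row holds one most frequent value per column.
fills = df.mode(axis=0).iloc[0]

# fillna() with a Series matches fill values to columns by label.
imputed = df.fillna(fills)
print(imputed)
```

    The same fills Series can be reused on new data with the same columns, which is what the fit/transform split in the class above does.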
