Predicting missing values with scikit-learn's Imputer module

那年仲夏 提交于 2020-01-01 01:39:09

问题


I am writing a very basic program to predict missing values in a dataset using scikit-learn's Imputer class.

I have made a NumPy array, created an Imputer object with strategy='mean' and performed fit_transform() on the NumPy array.

When I print the array after performing fit_transform(), the 'Nan's remain, and I dont get any prediction.

What am I doing wrong here? How do I go about predicting the missing values?

import numpy as np
from sklearn.preprocessing import Imputer

X = np.array([[23.56],[53.45],['NaN'],[44.44],[77.78],['NaN'],[234.44],[11.33],[79.87]])

print X

imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit_transform(X)

print X

回答1:


Per the documentation, sklearn.preprocessing.Imputer.fit_transform returns a new array, it doesn't alter the argument array. The minimal fix is therefore:

X = imp.fit_transform(X)



回答2:


As the new array is returned from the transform function, therefore, I have to store it in the same array (X) to alter the values

 from sklearn.preprocessing import Imputer
 imputer = Imputer(missing_values='NaN',strategy='mean',axis=0)  
 imputer = imputer.fit(X[:,1:3])
 X[:,1:3]= imputer.transform(X[:,1:3])



回答3:


After scikit-learn version 0.20 impute module using changed. So now we use imputer like;

from sklearn.impute import SimpleImputer
impute = SimpleImputer(missing_values=np.nan, strategy='mean')
impute.fit(X)
X=impute.transform(X)

Pay attention:

Instead of 'NaN', np.nan is used

Don't need to use axis parameter

We can use imp or imputer instead of my impute variable



来源:https://stackoverflow.com/questions/25017626/predicting-missing-values-with-scikit-learns-imputer-module

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!