Scikit NaN or infinity error message

两盒软妹~` submitted on 2019-12-03 21:41:18

scikit-learn's decision trees cast their input to float32 for efficiency, but your values won't fit in that type:

>>> import numpy as np
>>> np.float32(8.9932064170227995e+41)
inf
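
To find which columns trigger this, one option is to compare the data against the float32 limits before fitting. This is only a sketch: the array `X` below is made-up illustration data, and `f32_max`, `too_large` and `has_nan` are names chosen here, not part of scikit-learn.

```python
import numpy as np

# Largest finite value representable as float32 (about 3.4e38).
f32_max = np.finfo(np.float32).max

# Hypothetical data: the second column holds the value from the example above.
X = np.array([[1.0, 8.9932064170227995e+41],
              [2.0, 3.0]])

# True wherever a value would become +/-inf after a cast to float32.
too_large = np.abs(X) > f32_max
# True wherever a value is missing.
has_nan = np.isnan(X)

print("columns with values too large for float32:", too_large.any(axis=0))
print("columns with NaN:", has_nan.any(axis=0))
```

Any column flagged here would have to be rescaled or cleaned before it reaches the tree code.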

The solution is to standardize the features with sklearn.preprocessing.StandardScaler before fitting the model. Don't forget to apply the same fitted scaler to new data before predicting. You can use a sklearn.pipeline.Pipeline to combine standardization and classification in a single object, so both steps are handled for you:

rf = Pipeline([("scale", StandardScaler()),
               ("rf", RandomForestClassifier(n_estimators=100, n_jobs=-1, verbose=2))])

Or, more concisely, with make_pipeline (at the time of writing this was only in the dev version; it is included in later scikit-learn releases):

rf = make_pipeline(StandardScaler(),
                   RandomForestClassifier(n_estimators=100, n_jobs=-1, verbose=2))

(I admit the error message could be improved.)
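
For reference, the pipeline approach above can be tried end to end with something like the following. It is a minimal sketch on synthetic data: the array shapes, the value range, the labelling rule and the small `n_estimators` are assumptions made here to keep the example quick, not values from the original question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Synthetic float64 features far beyond the float32 maximum (~3.4e38).
X = rng.uniform(1e40, 1e42, size=(200, 3))
# Hypothetical labels: class 1 when the first feature is in the upper half.
y = (X[:, 0] > 5e41).astype(int)

# The scaler runs in float64, so the forest only ever sees small values.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
model.fit(X, y)

# predict() applies the fitted scaler automatically before the forest.
preds = model.predict(X)
print(preds[:10])
```

Because the scaler is part of the pipeline, there is no separate transform call to forget at prediction time.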

I came across this problem as well, but in my case the cause was different: the array contained some NaN values.

Here is how to fix it.

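# Note: Imputer was removed in scikit-learn 0.22; on newer versions
# use sklearn.impute.SimpleImputer instead.
from sklearn.preprocessing import Imputer
X = Imputer().fit_transform(X)  # replaces NaN with the column mean by default
RF.fit(X, y)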

Reference here: sklearn.preprocessing.Imputer
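
On recent scikit-learn versions, where Imputer is no longer available, the same idea can be sketched with sklearn.impute.SimpleImputer. The small array below is made-up illustration data, and `X_filled` is a name chosen here.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])

# Replace each NaN with the mean of the observed values in its column.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
print(X_filled)
```

The filled array can then be passed to RF.fit(X_filled, y) as in the snippet above. Mean imputation is only one choice; SimpleImputer also accepts strategy="median" or strategy="most_frequent".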
