replace missing values in categorical data

孤独总比滥情好 2021-01-27 00:00

Let's suppose I have a column with categorical data "red", "green", "blue" and empty cells:

red
green
red
blue
NaN

I'm sure that the NaN b

3 answers

没有蜡笔的小新 (original poster)

2021-01-27 00:44
The simplest strategy for handling missing data is to remove records that contain a missing value.
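
As a rough sketch of that removal strategy, assuming the data lives in a pandas DataFrame (the column name `color` is made up for illustration, and the values are the ones from the question):

```python
import numpy as np
import pandas as pd

# Toy column from the question; "color" is a hypothetical column name.
df = pd.DataFrame({"color": ["red", "green", "red", "blue", np.nan]})

# Drop every row whose "color" value is missing.
cleaned = df.dropna(subset=["color"])
print(cleaned)
```

This is only reasonable when few rows are affected, since every dropped row also discards its other columns.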

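The scikit-learn library provides the SimpleImputer class in sklearn.impute for replacing missing values (older releases exposed it as Imputer in sklearn.preprocessing, which was removed in version 0.22). Since the data is categorical, the mean is not a meaningful replacement value; use the most frequent category instead:
```
import numpy as np
from sklearn.impute import SimpleImputer
# Replace each NaN with the most frequent value of its column
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
```
The imputer returns a NumPy array rather than a DataFrame, so the result must be assigned back to the column.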
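
A minimal sketch of most-frequent imputation applied to the column from the question, assuming a recent scikit-learn where the class is `SimpleImputer` in `sklearn.impute`; the column name `color` is again made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy column from the question, stored as plain Python objects.
colors = pd.Series(["red", "green", "red", "blue", np.nan], dtype=object)
df = pd.DataFrame({"color": colors})

imp = SimpleImputer(missing_values=np.nan, strategy="most_frequent")

# fit_transform expects 2-D input, hence the double brackets;
# it returns a 2-D NumPy array, which is flattened before assigning back.
filled = imp.fit_transform(df[["color"]])
df["color"] = filled.ravel()
print(df)
```

Here "red" is the most frequent value, so it replaces the NaN in the last row.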

Last but not least, not all ML algorithms fail on missing values: some handle them natively, and behavior also differs between implementations of the same algorithm.