replace missing values in categorical data

孤独总比滥情好 2021-01-27 00:00

Let's suppose I have a column with categorical data ("red", "green", "blue") and empty cells:

red
green
red
blue
NaN

I'm sure that the NaN b

3 answers
  •  没有蜡笔的小新
    2021-01-27 00:44

    The simplest strategy for handling missing data is to remove records that contain a missing value.
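Dropping the incomplete records could look like the sketch below. It assumes the column lives in a pandas DataFrame; the names `df` and `color` are hypothetical and only mirror the sample data in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the column from the question
df = pd.DataFrame({"color": ["red", "green", "red", "blue", np.nan]})

# Drop every row whose "color" value is missing
cleaned = df.dropna(subset=["color"])
remaining = cleaned["color"].tolist()
```

This is only reasonable when few rows are affected, since every dropped row also discards the values in its other columns.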

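    The scikit-learn library provides an imputer class that can be used to replace missing values. Older releases expose it as Imputer() in sklearn.preprocessing; since version 0.20 it is SimpleImputer in sklearn.impute, and the old Imputer was removed in 0.22. Since this is categorical data, replacing with the mean does not make sense. Use the most frequent value instead:

    import numpy as np
    from sklearn.impute import SimpleImputer
    imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

    The imputer expects 2-D input and by default returns a NumPy array rather than a DataFrame, so assign the result back to the column yourself.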
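As a minimal sketch of the most-frequent strategy applied to the question's column: it assumes a scikit-learn release that ships `SimpleImputer` in `sklearn.impute`, and the names `df`, `color` and `color_filled` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data mirroring the column from the question;
# dtype=object keeps the strings and NaN in a plain object column
df = pd.DataFrame({"color": ["red", "green", "red", "blue", np.nan]}, dtype=object)

imp = SimpleImputer(missing_values=np.nan, strategy="most_frequent")

# fit_transform expects 2-D input, hence the double brackets
filled = imp.fit_transform(df[["color"]])

# The result is a 2-D NumPy array; flatten it back into a column
df["color_filled"] = filled.ravel()
filled_values = df["color_filled"].tolist()
```

Here the missing cell is filled with "red", the value that occurs most often in the column. The original column is left untouched so the two can be compared.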

    Last but not least, not all ML algorithms fail on missing values: some can handle them natively. Support also differs between implementations of the same algorithm.
