sklearn.LabelEncoder with never seen before values

执笔经年 2020-11-27 10:37

If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set.
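
To make the failure concrete, here is a minimal sketch of the behaviour described above. The animal labels are made-up illustration data and not part of the original question. It assumes scikit-learn is installed and imports the encoder from `sklearn.preprocessing`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical training labels, used only for illustration.
encoder = LabelEncoder()
encoder.fit(["cat", "dog", "fish"])

# Values seen during fit are mapped to their index in the sorted class list.
seen = encoder.transform(["dog", "cat"])

# "bird" was not present during fit, so transform raises a ValueError.
try:
    encoder.transform(["dog", "bird"])
    failed = False
except ValueError as err:
    failed = True
    message = str(err)
    print("transform failed:", message)
```

The encoder only stores the classes it saw in `fit`, so it has no integer to assign to a new value.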

The only solution I c

12 answers
  •  眼角桃花
    2020-11-27 11:09

    I ended up switching to Pandas' get_dummies due to this problem of unseen data.

    • create the dummies on the training data
      dummy_train = pd.get_dummies(train)
    • create the dummies on the new (unseen) data
      dummy_new = pd.get_dummies(new_data)
    • re-index the new data to the columns of the training data, filling the missing columns with 0 (reindex returns a new DataFrame, so assign the result)
      dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)

    Effectively, any categorical values that appear only in the new data are dropped and never reach the classifier, but I think that should not cause problems, as the classifier would not know what to do with them anyway.
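
The three steps above can be sketched end to end as follows. The `color` column and its values are hypothetical toy data, chosen only to show one category that exists in training but not in the new data ("red", "blue") and one that exists only in the new data ("purple").

```python
import pandas as pd

# Hypothetical toy data; the column name "color" is an assumption for illustration.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
new_data = pd.DataFrame({"color": ["green", "purple"]})  # "purple" never appears in train

# One-hot encode each frame independently.
dummy_train = pd.get_dummies(train)
dummy_new = pd.get_dummies(new_data)

# Align the new frame to the training columns:
# columns missing from the new data are added and filled with 0,
# columns unknown to the training data (color_purple) are dropped.
# reindex returns a new DataFrame, so the result must be assigned.
dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)

print(dummy_new)
```

After the reindex, the row for "purple" has a 0 in every training column, which is how the unseen category ends up ignored by the classifier.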
