sklearn.LabelEncoder with never seen before values

执笔经年 2020-11-27 10:37

If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set.
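A minimal sketch of the failure described above, assuming the encoder is imported from sklearn.preprocessing; the labels "a", "b" and "c" are hypothetical and stand in for real training and test values:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["a", "b"])  # hypothetical training labels

try:
    # "c" was not present when the encoder was fitted
    le.transform(["a", "c"])
    failed = False
except ValueError as exc:
    # transform rejects labels it did not see during fit
    failed = True
    print(exc)
```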

The only solution I c

12 answers

眼角桃花 (original poster)

2020-11-27 11:09
I ended up switching to pandas' get_dummies because of this problem with unseen data.
- create the dummies on the training data
  dummy_train = pd.get_dummies(train)
- create the dummies on the new (unseen) data
  dummy_new = pd.get_dummies(new_data)
- re-index the new data to the columns of the training data, filling the missing values with 0
  dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)
Effectively, any category that appears only in the new data is dropped and never reaches the classifier, but I think that should not cause problems, since the classifier would not know what to do with it anyway.
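The steps above can be put together as a small runnable sketch. The "color" column and its values are hypothetical toy data, not taken from the original question. Note that reindex returns a new DataFrame, so its result is assigned back:

```python
import pandas as pd

# Hypothetical toy data; the column name and values are illustrative only.
train = pd.DataFrame({"color": ["red", "green", "red"]})
new_data = pd.DataFrame({"color": ["green", "blue"]})  # "blue" never appears in train

dummy_train = pd.get_dummies(train)
dummy_new = pd.get_dummies(new_data)

# Align the new data to the training columns: columns missing from the new
# data are added and filled with 0, columns unknown to training are dropped.
dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)

print(dummy_new.astype(int))
```

In this sketch the row containing "blue" ends up as all zeros, which is how an unseen category is presented to the classifier.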