sklearn.LabelEncoder with never-seen-before values

执笔经年 2020-11-27 10:37

If a sklearn.LabelEncoder has been fitted on a training set, its transform method raises a ValueError when the test set contains labels that were not seen during fitting.

The only solution I c

12 Answers
  •  余生分开走
    2020-11-27 11:10

    LabelEncoder is basically a dictionary. You can extract and use it for future encoding:

    from sklearn.preprocessing import LabelEncoder
    
    # X is the training column of labels
    le = LabelEncoder()
    le.fit(X)
    
    # Map each known class to its integer code
    le_dict = dict(zip(le.classes_, le.transform(le.classes_)))
    

    Retrieve the label for a single new item; if the item is missing from the dictionary, fall back to a default value that marks it as unknown:

    le_dict.get(new_item, '')
    

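    Retrieve labels for a DataFrame column, mapping unseen values to a fallback code:

    unknown_value = -1  # assumed placeholder; any code outside the fitted classes works
    df[your_col] = df[your_col].apply(lambda x: le_dict.get(x, unknown_value))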
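
The dictionary lookup above can be sketched end to end without a fitted encoder at hand. This is a minimal illustration, not part of the original answer: the helper names build_label_map and encode_with_fallback are hypothetical, known_classes stands in for le.classes_ (which LabelEncoder stores as the sorted unique labels), and -1 is an assumed choice for the unknown code.

```python
def build_label_map(classes):
    """Map each known class to its integer code, in the order given."""
    return {label: code for code, label in enumerate(classes)}


def encode_with_fallback(values, label_map, unknown_value=-1):
    """Encode values with label_map; unseen values get unknown_value."""
    return [label_map.get(value, unknown_value) for value in values]


# Stand-in for le.classes_ from a fitted LabelEncoder (sorted unique labels).
known_classes = ["amsterdam", "paris", "tokyo"]
label_map = build_label_map(known_classes)

# "berlin" was never seen during fitting, so it maps to the fallback code.
codes = encode_with_fallback(["paris", "berlin", "tokyo"], label_map)
print(codes)  # [1, -1, 2]
```

Since LabelEncoder assigns codes from 0 to n_classes - 1, a negative fallback such as -1 cannot collide with a real class. Whether the downstream model treats that code sensibly is a separate question to check.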
    
