If a sklearn.preprocessing.LabelEncoder has been fitted on a training set, it might break (transform raises a ValueError) if it encounters new values when used on a test set.
The only solution I c
LabelEncoder is basically a dictionary. You can extract and use it for future encoding:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(X)
le_dict = dict(zip(le.classes_, le.transform(le.classes_)))
Retrieve the label for a single new item; if the item is missing, set the value to unknown:
le_dict.get(new_item, '<Unknown>')
Retrieve labels for a DataFrame column (unknown_value is a placeholder for whatever default you choose):
df[your_col] = df[your_col].apply(lambda x: le_dict.get(x, unknown_value))
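Putting the pieces together, here is a minimal end-to-end sketch of the dictionary approach. The sample values, the variable names train and test, and the -1 sentinel are all assumptions made for illustration, not part of the original answer; choose whatever default suits your model.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: "bird" appears in the test set but not in training.
train = pd.Series(["cat", "dog", "fish"])
test = pd.Series(["dog", "bird", "cat"])

le = LabelEncoder()
le.fit(train)

# Map each class seen during fitting to its integer code.
le_dict = dict(zip(le.classes_, le.transform(le.classes_)))

# Assumed sentinel for values the encoder never saw.
UNKNOWN = -1

# Unseen values fall back to the sentinel instead of raising an error.
encoded = test.apply(lambda x: le_dict.get(x, UNKNOWN))
print(encoded.tolist())
```

Calling le.transform(test) directly on this data would fail on "bird", which is why the lookup goes through the dictionary with a default instead.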