If a scikit-learn LabelEncoder (sklearn.preprocessing.LabelEncoder) has been fitted on a training set, its transform method raises a ValueError when it encounters values in the test set that were not seen during fitting.
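To make the failure concrete, here is a minimal sketch that reproduces it. The color labels are made-up illustration data, not from any real dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical training labels, used only for illustration.
encoder = LabelEncoder()
encoder.fit(["red", "green", "blue"])

try:
    # "purple" was never seen during fit, so transform cannot map it.
    encoder.transform(["red", "purple"])
    unseen_label_failed = False
except ValueError as err:
    unseen_label_failed = True
    print(f"transform failed: {err}")
```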
The only solution I c
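I ended up switching to pandas' get_dummies due to this problem of unseen data:

    dummy_train = pd.get_dummies(train)
    dummy_new = pd.get_dummies(new_data)
    # reindex returns a new DataFrame, so the result has to be assigned
    dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)

Effectively, any dummy columns for categories that appear only in the new data are dropped and never reach the classifier, while categories seen in training but missing from the new data are filled with 0. I think dropping the new ones should not cause problems, as the classifier would not know what to do with them anyway.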
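For anyone who wants to try this end to end, here is a small self-contained sketch of the same approach. The `train` and `new_data` frames and their `color` and `size` columns are invented for illustration. Note that the result of `reindex` is assigned back, since it does not modify the frame in place.

```python
import pandas as pd

# Hypothetical data: "purple" appears only in the new data,
# while "green" and "blue" appear only in the training data.
train = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1, 2, 3]})
new_data = pd.DataFrame({"color": ["red", "purple"], "size": [4, 5]})

dummy_train = pd.get_dummies(train)
dummy_new = pd.get_dummies(new_data)

# Align the new data to the training columns: unseen categories
# (color_purple) are dropped, missing ones (color_green, color_blue)
# are added and filled with 0.
dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)

print(dummy_new)
```

The row with the unseen category ends up with 0 in every `color_*` column, so the classifier sees it as "none of the known categories". Whether that is acceptable depends on the model and the data.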