If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set.
The only solution I c
I know two devs that are working on building wrappers around transformers and Sklearn pipelines. They have 2 robust encoder transformers (one dummy and one label encoders) that can handle unseen values. Here is the documentation to their skutil library. Search for skutil.preprocessing.OneHotCategoricalEncoder or skutil.preprocessing.SafeLabelEncoder. In their SafeLabelEncoder(), unseen values are auto encoded to 999999.