Using Scikit's LabelEncoder correctly across multiple programs

无人共我 2020-12-02 17:32

The basic task I have at hand is to:

a) Read some tab-separated data.

b) Do some basic preprocessing.

c) For each categorical column use LabelE…
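Step a) can be sketched with pandas, assuming the data fits in memory. The sample text, the column names, and the rule for spotting categorical columns (any non-numeric column) are hypothetical choices for illustration, not part of the original question.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; in practice you would pass a file path.
TSV_TEXT = "color\tsize\tprice\nred\tS\t10\nblue\tM\t12\nred\tL\t15\n"

# sep="\t" tells read_csv to split fields on tabs instead of commas.
df = pd.read_csv(io.StringIO(TSV_TEXT), sep="\t")

# Assumption: treat every non-numeric column as categorical.
categorical_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
print(categorical_cols)
```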

5 answers
  •  温柔的废话
    2020-12-02 17:43

    What works for me is calling LabelEncoder().fit(X_train[col]) for each categorical column col, pickling each fitted encoder, and then reusing that same object to transform column col in the validation dataset. In other words, you keep one label encoder object per categorical column.
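    A minimal in-memory sketch of the one-encoder-per-column idea, assuming pandas DataFrames. The frames X_train and X_cv and the columns color and size are hypothetical sample data, and keeping the encoders in a dict keyed by column name is this sketch's own choice.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training and validation frames with two categorical columns.
X_train = pd.DataFrame({"color": ["red", "blue", "green", "red"],
                        "size": ["S", "M", "L", "M"]})
X_cv = pd.DataFrame({"color": ["green", "red"], "size": ["L", "S"]})

categorical_cols = ["color", "size"]

# One LabelEncoder per column, fitted on the training data only.
encoders = {col: LabelEncoder().fit(X_train[col]) for col in categorical_cols}

# Reuse each column's fitted encoder so codes match those learned in training.
X_cv_encoded = X_cv.copy()
for col in categorical_cols:
    X_cv_encoded[col] = encoders[col].transform(X_cv[col])

print(X_cv_encoded)
```

    Because LabelEncoder sorts the classes it sees during fit(), "blue", "green", "red" map to 0, 1, 2, so the two validation rows get codes 1 and 2 in the color column.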

    1. Call fit() on the training data and pickle the fitted encoder for each categorical column of the training DataFrame X_train.
    2. For each col among the columns of the validation set X_cv, load the corresponding pickled encoder and apply it with transform(X_cv[col]).
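    The two steps above can be sketched as two functions standing in for the separate training and validation programs. This is a sketch under assumptions: the function names train_program and validation_program, the temporary file path, and storing all encoders in one dict inside a single pickle file (rather than one file per column) are hypothetical choices. pickle.dump and pickle.load are from the Python standard library.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def train_program(X_train, categorical_cols, path):
    """Step 1: fit one encoder per column and pickle them all to `path`."""
    encoders = {col: LabelEncoder().fit(X_train[col]) for col in categorical_cols}
    with open(path, "wb") as f:
        pickle.dump(encoders, f)


def validation_program(X_cv, path):
    """Step 2: load the pickled encoders and transform the matching columns."""
    with open(path, "rb") as f:
        encoders = pickle.load(f)
    X_out = X_cv.copy()
    for col, enc in encoders.items():
        # transform() raises ValueError for labels not seen during fit().
        X_out[col] = enc.transform(X_cv[col])
    return X_out


# Hypothetical sample data.
X_train = pd.DataFrame({"color": ["red", "blue", "green"], "size": ["S", "M", "L"]})
X_cv = pd.DataFrame({"color": ["blue", "red"], "size": ["M", "M"]})

path = os.path.join(tempfile.mkdtemp(), "label_encoders.pkl")
train_program(X_train, ["color", "size"], path)
X_cv_encoded = validation_program(X_cv, path)
print(X_cv_encoded)
```

    In real use the two functions would live in different scripts that agree on the pickle path. Only unpickle files you trust, since pickle.load can execute arbitrary code.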
