The basic task that I have at hand is
a) Read some tab separated data.
b) Do some basic preprocessing
c) For each categorical column use LabelE
What works for me is LabelEncoder().fit(X_train[col]), pickling these objects for each categorical column col, and then reusing the same objects to transform the same categorical column col in the validation dataset. Basically, you have one label encoder object for each of your categorical columns.
Call fit() on the training data and pickle the objects/models corresponding to each column in the training dataframe X_train.
Then, for each col in the columns of the validation set X_cv, load the corresponding object/model and apply the transformation by calling its transform function: transform(X_cv[col]).
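The two steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the toy dataframes, the categorical_cols list, and the one-pickle-file-per-column layout in a temporary directory are all assumptions made for the example, and only X_train, X_cv, and col come from the description above.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data standing in for the real training and validation sets (assumed).
X_train = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                        "size": ["S", "M", "L", "M"]})
X_cv = pd.DataFrame({"color": ["blue", "red"],
                     "size": ["L", "S"]})

# Hypothetical list of categorical columns and a scratch directory for pickles.
categorical_cols = ["color", "size"]
encoder_dir = Path(tempfile.mkdtemp())

# Step 1: fit one LabelEncoder per categorical column on the training data
# and pickle each fitted encoder to its own file.
for col in categorical_cols:
    encoder = LabelEncoder().fit(X_train[col])
    with open(encoder_dir / f"{col}.pkl", "wb") as fh:
        pickle.dump(encoder, fh)

# Step 2: for each column of the validation set, load the matching encoder
# and apply transform() so the integer codes agree with the training data.
X_cv_encoded = X_cv.copy()
for col in categorical_cols:
    with open(encoder_dir / f"{col}.pkl", "rb") as fh:
        encoder = pickle.load(fh)
    X_cv_encoded[col] = encoder.transform(X_cv[col])

print(X_cv_encoded)
```

Note that transform() raises a ValueError if the validation column contains a label that was not present when the encoder was fitted, so unseen categories need to be handled separately.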