How to reverse Label Encoder from sklearn for multiple columns?

再見小時候 2021-01-07 08:49

I would like to use the inverse_transform function for LabelEncoder on multiple columns.

This is the code I use when applying LabelEncoder to more than one column.

2 answers

无人及你 (OP)

2021-01-07 09:08

To inverse-transform the data, you need to keep the encoders that were used to transform each column. One way to do this is to store the fitted LabelEncoders in a dict on your object. It works like this:

- When you call fit, a LabelEncoder is fit for every column and saved.
- When you call transform, the saved encoders are used to transform the data.
- When you call inverse_transform, the same encoders are used to do the inverse transformation.
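The three steps above can be sketched without a class, using a plain dict of encoders. This is a minimal illustration assuming a small example DataFrame with two string columns; the names `encoders`, `encoded` and `decoded` are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L']})

# fit: one LabelEncoder per column, stored in a dict keyed by column name
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

# transform: replace each column with its integer codes
encoded = df.copy()
for col, encoder in encoders.items():
    encoded[col] = encoder.transform(df[col])

# inverse_transform: reuse the same encoder to map codes back to labels
decoded = encoded.copy()
for col, encoder in encoders.items():
    decoded[col] = encoder.inverse_transform(encoded[col])

print(encoded)
print(decoded)
```

The class version below wraps exactly this dict so the encoders travel with the object.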

Example code:

from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:

    def __init__(self, columns=None):
        self.columns = columns  # list of column names to encode; None encodes every column

    def fit(self, X, y=None):
        # Fit one LabelEncoder per column and store it, keyed by column name
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            self.encoders[col] = LabelEncoder().fit(X[col])
        return self

    def transform(self, X):
        # Replace each selected column with its integer codes; other columns are left as-is
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].transform(X[col])
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def inverse_transform(self, X):
        # Map the integer codes back to the original labels with the same stored encoders
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output

You can then use it like this:

import pandas as pd

multi = MultiColumnLabelEncoder(columns=['city','size'])
df = pd.DataFrame({'city':    ['London','Paris','Moscow'],
                   'size':    ['M',     'M',    'L'],
                   'quantity':[12,       1,      4]})
X = multi.fit_transform(df)
print(X)
#    city  size  quantity
# 0     0     1        12
# 1     2     1         1
# 2     1     0         4
inv = multi.inverse_transform(X)
print(inv)
#      city size  quantity
# 0  London    M        12
# 1   Paris    M         1
# 2  Moscow    L         4
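The integer codes in that output come from each encoder's classes_ attribute, which holds the sorted unique labels; a label's code is its position in that array. A quick check on the city column alone (the variable names here are just for illustration):

```python
from sklearn.preprocessing import LabelEncoder

city_encoder = LabelEncoder().fit(['London', 'Paris', 'Moscow'])

# classes_ is sorted, so London -> 0, Moscow -> 1, Paris -> 2
print(city_encoder.classes_.tolist())  # ['London', 'Moscow', 'Paris']

# Build an explicit label -> code mapping from classes_
mapping = {label: code for code, label in enumerate(city_encoder.classes_.tolist())}
print(mapping)  # {'London': 0, 'Moscow': 1, 'Paris': 2}

# This matches the "city" column of the encoded DataFrame: 0, 2, 1
print(city_encoder.transform(['London', 'Paris', 'Moscow']).tolist())  # [0, 2, 1]
```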

You could also implement fit_transform separately so that it calls each LabelEncoder's own fit_transform method. Either way, make sure to keep the encoders around for when you need the inverse transformation.
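A minimal sketch of that alternative, assuming you only need fit_transform and inverse_transform (fit and transform are omitted for brevity). Each column's encoder is created, fit and applied in one call to LabelEncoder.fit_transform, then stored in the same kind of dict so inverse_transform can still find it:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all columns

    def fit_transform(self, X, y=None):
        self.encoders = {}
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            encoder = LabelEncoder()
            # LabelEncoder.fit_transform fits the encoder and returns the codes in one step
            output[col] = encoder.fit_transform(X[col])
            self.encoders[col] = encoder  # keep it for inverse_transform
        return output

    def inverse_transform(self, X):
        output = X.copy()
        # Only the columns that were encoded have a stored encoder
        for col, encoder in self.encoders.items():
            output[col] = encoder.inverse_transform(X[col])
        return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
multi = MultiColumnLabelEncoder(columns=['city', 'size'])
X = multi.fit_transform(df)
inv = multi.inverse_transform(X)
print(X)
print(inv)
```

The result is the same as calling fit and then transform; the only difference is that each encoder does its fitting and encoding in a single call.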
