Python sklearn - Determine the encoding order of LabelEncoder

Submitted by 时光怂恿深爱的人放手 on 2019-12-11 16:54:19

Question


I want the labels assigned by sklearn's LabelEncoder (0, 1, 2, 3, ...) to follow a specific order of the possible values of a categorical variable (say ['b', 'a', 'c', 'd']). As far as I can tell, LabelEncoder assigns the labels in lexicographic order, as this example shows:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(['b', 'a', 'c', 'd' ])
le.classes_
#Output:  array(['a', 'b', 'c', 'd'], dtype='<U1')
le.transform(['a', 'b'])
#Output:  array([0, 1])

How can I force the encoder to keep the order in which the values are first encountered in the .fit method (that is, to encode 'b' as 0, 'a' as 1, 'c' as 2, and 'd' as 3)?


Answer 1:


You cannot do that with the stock LabelEncoder.

LabelEncoder.fit() uses numpy.unique, which always returns its result sorted, as the source shows:

def fit(...):
    y = column_or_1d(y, warn=True)
    self.classes_ = np.unique(y)
    return self
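
The sorting behaviour of numpy.unique can be checked on its own, without scikit-learn. The snippet below is only a small illustration; the input array and variable names are made up for the example:

```python
import numpy as np

# Labels in the order they would be passed to fit(), with one duplicate.
labels = np.array(['b', 'a', 'c', 'd', 'b'])

# np.unique drops duplicates and returns the remaining values sorted,
# regardless of the order in which they appear in the input.
classes = np.unique(labels)
print(classes)  # ['a' 'b' 'c' 'd']
```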

So if you want to do that, you need to override the fit() function. Something like this:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import column_or_1d

class MyLabelEncoder(LabelEncoder):

    def fit(self, y):
        y = column_or_1d(y, warn=True)
        self.classes_ = pd.Series(y).unique()
        return self

Then you can do this:

le = MyLabelEncoder()
le.fit(['b', 'a', 'c', 'd' ])
le.classes_
#Output:  array(['b', 'a', 'c', 'd'], dtype=object)

Here, I am using pandas.Series.unique() to get the unique classes in their order of appearance. If you cannot use pandas for any reason, refer to this question, which does the same thing using numpy:

  • numpy unique without sort
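
A minimal numpy-only sketch of that idea, assuming the usual trick of asking np.unique for the index of each value's first occurrence and then restoring the original order. The helper name unique_in_order is hypothetical and is not part of numpy or scikit-learn:

```python
import numpy as np

def unique_in_order(values):
    """Return the unique values in first-seen order (hypothetical helper)."""
    arr = np.asarray(values)
    # return_index=True also yields the index of the first occurrence
    # of each (sorted) unique value in the original array.
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting those indices puts the unique values back in input order.
    return arr[np.sort(first_idx)]

classes = unique_in_order(['b', 'a', 'c', 'd', 'a'])
print(classes)  # ['b' 'a' 'c' 'd']
```

Inside the overridden fit(), the result of such a helper could be assigned to self.classes_ in place of pd.Series(y).unique().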



Answer 2:


Vivek Kumar's solution worked for me, but I had to do it this way:

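import numpy as np

class LabelEncoder(LabelEncoder):  # shadows the imported sklearn class

    def fit(self, y):
        y = column_or_1d(y, warn=True)
        # ndarray.sort() sorts in place and returns None, so use np.sort() instead.
        # Note that sorting brings back the default sorted order of the classes.
        self.classes_ = np.sort(pd.Series(y).unique())
        return self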


Source: https://stackoverflow.com/questions/51308994/python-sklearn-determine-the-encoding-order-of-labelencoder
