Label encoding across multiple columns in scikit-learn

礼貌的吻别 2020-11-22 09:02

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to a

22 Answers
  •  星月不相逢
    2020-11-22 09:42

    We don't need a LabelEncoder.

    You can convert the columns to categoricals and then get their codes. I used a dictionary comprehension below to apply this process to every column and wrap the result back into a dataframe of the same shape with identical indices and column names.

    >>> pd.DataFrame({col: df[col].astype('category').cat.codes for col in df}, index=df.index)
       location  owner  pets
    0         1      1     0
    1         0      2     1
    2         0      0     0
    3         1      1     2
    4         1      3     1
    5         0      2     1
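
For anyone who wants to run this end to end, here is a minimal sketch. The answer never shows `df` itself, so the sample frame below is an assumption, reconstructed from the codes printed above and the category names in the mapping further down; the name `encoded` is only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in this answer.
df = pd.DataFrame({
    "location": ["San_Diego", "New_York", "New_York", "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
})

# Encode every column: each distinct string is replaced by its position
# in that column's sorted list of categories.
encoded = pd.DataFrame(
    {col: df[col].astype("category").cat.codes for col in df},
    index=df.index,
)
print(encoded)
```

One thing to keep in mind: `cat.codes` uses -1 for missing values, so a column containing NaN will have -1 entries rather than raising an error.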
    

    To create a mapping dictionary, you can just enumerate the categories using a dictionary comprehension:

    >>> {col: {n: cat for n, cat in enumerate(df[col].astype('category').cat.categories)}
    ...  for col in df}
    {'location': {0: 'New_York', 1: 'San_Diego'},
     'owner': {0: 'Brick', 1: 'Champ', 2: 'Ron', 3: 'Veronica'},
     'pets': {0: 'cat', 1: 'dog', 2: 'monkey'}}
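
The mapping is mostly useful for turning codes back into labels. The sketch below is one possible way to do that round trip, not something from the original answer: the sample `df` is again an assumption reconstructed from the output above, and the names `cats`, `encoded`, `mapping` and `decoded` are only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in this answer.
df = pd.DataFrame({
    "location": ["San_Diego", "New_York", "New_York", "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
})

# Convert each column once, then reuse it for both the codes and the mapping
# so the two are guaranteed to agree.
cats = {col: df[col].astype("category") for col in df}
encoded = pd.DataFrame({col: c.cat.codes for col, c in cats.items()}, index=df.index)
mapping = {col: dict(enumerate(c.cat.categories)) for col, c in cats.items()}

# Decode by looking up every code in its column's mapping.
decoded = pd.DataFrame(
    {col: encoded[col].map(mapping[col]) for col in encoded},
    index=encoded.index,
)
print(decoded)
```

Building the mapping from the same categorical columns as the codes avoids converting each column twice and keeps the codes and the dictionary in sync.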
    
