Label encoding across multiple columns in scikit-learn

礼貌的吻别 2020-11-22 09:02

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to a…

22 answers
  •  萌比男神i
    2020-11-22 09:26

    You can do all of this directly in pandas. The task suits a unique ability of the replace method: it accepts a nested dictionary that gives a separate value mapping for each column.
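
    The answer never shows how df was built. As a minimal sketch, here is a hypothetical DataFrame reconstructed from the decoded (round-trip) output printed at the end of the answer, so the snippets can be run as-is:

```python
import pandas as pd

# Hypothetical reconstruction: values read off the round-trip output in the answer
df = pd.DataFrame({
    "location": ["San_Diego", "New_York", "New_York", "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
})
print(df)
```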

    First, let's build a dictionary of dictionaries: the outer keys are the column names, and each inner dictionary maps that column's values to their new integer codes.

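    import pandas as pd
    
    # For each column, map every distinct value to its position in the sorted categories
    transform_dict = {}
    for col in df.columns:
        cats = pd.Categorical(df[col]).categories
        d = {}
        for i, cat in enumerate(cats):
            d[cat] = i
        transform_dict[col] = d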
    
    transform_dict
    {'location': {'New_York': 0, 'San_Diego': 1},
     'owner': {'Brick': 0, 'Champ': 1, 'Ron': 2, 'Veronica': 3},
     'pets': {'cat': 0, 'dog': 1, 'monkey': 2}}
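
    The same nested mapping can be built with a dict comprehension. This sketch assumes the hypothetical sample df (rebuilt from the answer's output); it relies on pd.Categorical sorting the inferred string categories, which is why Brick gets 0 and Veronica gets 3:

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the answer's printed output
df = pd.DataFrame({
    "location": ["San_Diego", "New_York", "New_York", "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
})

# Same result as the nested loop: column -> {category: position in sorted categories}
transform_dict = {
    col: {cat: i for i, cat in enumerate(pd.Categorical(df[col]).categories)}
    for col in df.columns
}
print(transform_dict)
```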
    

    Since this is always a one-to-one mapping, we can invert each inner dictionary to map the new values back to the originals.

    # Swap keys and values in each column's mapping
    inverse_transform_dict = {}
    for col, d in transform_dict.items():
        inverse_transform_dict[col] = {v:k for k, v in d.items()}
    
    inverse_transform_dict
    {'location': {0: 'New_York', 1: 'San_Diego'},
     'owner': {0: 'Brick', 1: 'Champ', 2: 'Ron', 3: 'Veronica'},
     'pets': {0: 'cat', 1: 'dog', 2: 'monkey'}}
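
    Because each code is just the category's position, the inverse for a column can also be read straight off the categories with dict(enumerate(...)). A sketch, again on the hypothetical sample df, that checks both routes agree:

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the answer's printed output
df = pd.DataFrame({
    "location": ["San_Diego", "New_York", "New_York", "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
})

transform_dict = {
    col: {cat: i for i, cat in enumerate(pd.Categorical(df[col]).categories)}
    for col in df.columns
}

# Invert by swapping keys and values (safe because the mapping is one-to-one)
inverse_transform_dict = {
    col: {v: k for k, v in d.items()} for col, d in transform_dict.items()
}

# Shortcut: code i is the i-th category, so enumerate yields the inverse directly
inverse_from_categories = {
    col: dict(enumerate(pd.Categorical(df[col]).categories)) for col in df.columns
}

print(inverse_transform_dict == inverse_from_categories)
```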
    

    Now we can use that ability of the replace method: it takes a nested dictionary, uses the outer keys as column names, and uses the inner keys as the values to replace in that column.

    df.replace(transform_dict)
       location  owner  pets
    0         1      1     0
    1         0      2     1
    2         0      0     0
    3         1      1     2
    4         1      3     1
    5         0      2     1
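
    The integers produced by replace match each column's categorical codes, so s.astype("category").cat.codes is an alternative route to the same encoding. A sketch on the hypothetical sample df; the explicit astype("int64") is an added assumption so that the comparison does not depend on how a given pandas version handles dtype downcasting inside replace:

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the answer's printed output
df = pd.DataFrame({
    "location": ["San_Diego", "New_York", "New_York", "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
})

transform_dict = {
    col: {cat: i for i, cat in enumerate(pd.Categorical(df[col]).categories)}
    for col in df.columns
}

# Nested-dict replace: outer keys pick the column, inner keys are the values to swap
encoded = df.replace(transform_dict).astype("int64")

# The same integers, taken from each column's categorical codes
codes = df.apply(lambda s: s.astype("category").cat.codes).astype("int64")

print(encoded.equals(codes))
```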
    

    We can easily get back the original values by chaining a second replace call that uses the inverse mapping.

    df.replace(transform_dict).replace(inverse_transform_dict)
        location     owner    pets
    0  San_Diego     Champ     cat
    1   New_York       Ron     dog
    2   New_York     Brick     cat
    3  San_Diego     Champ  monkey
    4  San_Diego  Veronica     dog
    5   New_York       Ron     dog
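
    A sketch of the round trip as a check, on the hypothetical sample df. It also shows one caveat of this approach: replace only changes values that appear in the mapping, so a label absent from the original data (the hypothetical "Boston" below) passes through unencoded instead of raising an error:

```python
import pandas as pd

# Hypothetical sample data, rebuilt from the answer's printed output
df = pd.DataFrame({
    "location": ["San_Diego", "New_York", "New_York", "San_Diego", "San_Diego", "New_York"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
})

transform_dict = {
    col: {cat: i for i, cat in enumerate(pd.Categorical(df[col]).categories)}
    for col in df.columns
}
inverse_transform_dict = {
    col: {v: k for k, v in d.items()} for col, d in transform_dict.items()
}

# Encode, then decode with the inverted mapping
roundtrip = df.replace(transform_dict).replace(inverse_transform_dict)
print(roundtrip.values.tolist() == df.values.tolist())

# Hypothetical new data containing a location the mapping has never seen
new_df = pd.DataFrame({
    "location": ["New_York", "Boston"],
    "owner": ["Ron", "Ron"],
    "pets": ["cat", "dog"],
})
encoded_new = new_df.replace(transform_dict)
print(encoded_new)  # "Boston" is left as a string
```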
    
