Label encoding across multiple columns in scikit-learn

礼貌的吻别 2020-11-22 09:02

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the DataFrame has many (50+) columns, I want to a

22 answers
  •  生来不讨喜
    2020-11-22 09:49

    After a lot of searching and experimenting with answers here and elsewhere, I think this is what you are looking for:

    pd.DataFrame(columns=df.columns, data=LabelEncoder().fit_transform(df.values.flatten()).reshape(df.shape))

    This fits a single encoder on all values at once, so a given label maps to the same integer in every column:

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder
    
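    # Rows of string labels; shorter rows are padded with empty strings.
    df = pd.DataFrame([['A','B','C','D','E','F','G','I','K','H'],
                       ['A','E','H','F','G','I','K','','',''],
                       ['A','C','I','F','H','G','','','','']],
                      columns=['A1', 'A2', 'A3','A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10'])

    # Flatten to 1-D, encode everything with one LabelEncoder, then restore the shape.
    # Note that the empty string is treated as a label too (it sorts first, so it becomes 0).
    pd.DataFrame(columns=df.columns, data=LabelEncoder().fit_transform(df.values.flatten()).reshape(df.shape))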
    
        A1  A2  A3  A4  A5  A6  A7  A8  A9  A10
    0   1   2   3   4   5   6   7   9   10  8
    1   1   5   8   6   7   9   10  0   0   0
    2   1   3   9   6   8   7   0   0   0   0
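
One thing to watch with the one-liner: the fitted encoder is thrown away, so there is no way to turn the integers back into labels later. A minimal sketch of keeping it around, assuming a smaller frame of the same kind (the names `le`, `encoded` and `decoded` are mine, not from the question):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small illustrative frame; empty strings stand in for missing labels.
df = pd.DataFrame([['A', 'B', 'C'],
                   ['A', 'C', ''],
                   ['B', '', '']],
                  columns=['A1', 'A2', 'A3'])

# Fit one encoder on every value so all columns share the same mapping.
le = LabelEncoder()
codes = le.fit_transform(df.to_numpy().ravel())
encoded = pd.DataFrame(codes.reshape(df.shape),
                       columns=df.columns, index=df.index)

# The stored encoder can map the integers back to the original labels.
labels = le.inverse_transform(encoded.to_numpy().ravel())
decoded = pd.DataFrame(labels.reshape(df.shape),
                       columns=df.columns, index=df.index)

print(le.classes_)  # learned labels, in sorted order
print(encoded)
```

`le.classes_` holds the labels in sorted order, and the position of each label in that array is its integer code, so it doubles as a lookup table.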
    
