Sklearn Label Encoding multiple columns pandas dataframe

春和景丽 2020-12-10 03:25

I am trying to encode a number of columns containing categorical data ("Yes" and "No") in a large pandas dataframe. The complete dataframe contains

5 answers
  •  北海茫月
    2020-12-10 03:59

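    As the following code shows, you can encode multiple columns at once by passing LabelEncoder's fit_transform to DataFrame.apply. Note, however, that the same encoder is refit on every column, so afterwards it only holds the classes of the last column; the class information for the other columns is lost. The numeric column A is re-encoded as well (1..4 becomes 0..3).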

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder
    
    df = pd.DataFrame({'A': [1, 2, 3, 4],
                       'B': ["Yes", "No", "Yes", "Yes"],
                       'C': ["Yes", "No", "No", "Yes"],
                       'D': ["No", "Yes", "No", "Yes"]})
    print(df)
    #    A    B    C    D
    # 0  1  Yes  Yes   No
    # 1  2   No   No  Yes
    # 2  3  Yes   No   No
    # 3  4  Yes  Yes  Yes
    
    # LabelEncoder
    le = LabelEncoder()
    
    # apply "le.fit_transform"
    df_encoded = df.apply(le.fit_transform)
    print(df_encoded)
    #    A  B  C  D
    # 0  0  1  1  0
    # 1  1  0  0  1
    # 2  2  1  0  0
    # 3  3  1  1  1
    
    # Note: le was refit on each column, so classes_ only reflects the last column (D).
    print(le.classes_)
    # ['No' 'Yes']
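
If the per-column classes are needed later, one possible workaround is to keep a separate fitted encoder for each column. The sketch below is only an illustration under some assumptions: the names `cat_cols` and `encoders` are hypothetical, and encoding only the string columns (leaving `A` untouched) is a choice made here, not something stated in the original answer.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ["Yes", "No", "Yes", "Yes"],
                   'C': ["Yes", "No", "No", "Yes"],
                   'D': ["No", "Yes", "No", "Yes"]})

# Assumption: only the string columns are encoded; 'A' stays numeric.
cat_cols = ['B', 'C', 'D']

# Keep one fitted encoder per column in a dict keyed by column name.
encoders = {}
df_encoded = df.copy()
for col in cat_cols:
    le = LabelEncoder()
    df_encoded[col] = le.fit_transform(df[col])
    encoders[col] = le

# Each column's classes are still available after encoding.
for col, le in encoders.items():
    print(col, le.classes_)

# A stored encoder can also map the integer codes back to the labels.
restored_b = encoders['B'].inverse_transform(df_encoded['B'])
```

Storing the encoders this way costs a little bookkeeping, but it makes it possible to call `inverse_transform` per column or to reuse the same mapping on new data.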
    
