Label encoding across multiple columns in scikit-learn

礼貌的吻别 2020-11-22 09:02

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the DataFrame has many (50+) columns, I want to a…
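Here is a minimal sketch of the per-column approach the question describes. It assumes every column of the DataFrame holds string labels. The toy data and the `encoders` dictionary are illustrative choices, not part of the original question. The dictionary holds one fitted LabelEncoder per column so each column can be decoded later with `inverse_transform`:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: every column holds string labels
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "fish"],
    "owner": ["Champ", "Ron", "Brick", "Champ"],
    "location": ["San_Diego", "New_York", "New_York", "San_Diego"],
})

# One LabelEncoder per column; keeping them allows decoding later
encoders = {col: LabelEncoder() for col in df.columns}

# Each encoder maps its column's sorted unique labels to 0..n_classes-1
encoded = pd.DataFrame(
    {col: encoders[col].fit_transform(df[col]) for col in df.columns},
    index=df.index,
)

# Reverse the mapping column by column
decoded = pd.DataFrame(
    {col: encoders[col].inverse_transform(encoded[col]) for col in df.columns},
    index=df.index,
)

print(encoded)
```

scikit-learn documents LabelEncoder as intended for target values (y). For feature columns, OrdinalEncoder is the feature-side counterpart and encodes all columns in a single call.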

22 answers
  •  滥情空心
    2020-11-22 09:39

    Since scikit-learn 0.20, you can use sklearn.compose.ColumnTransformer and sklearn.preprocessing.OneHotEncoder:

    If you only have categorical variables, use OneHotEncoder directly:

    from sklearn.preprocessing import OneHotEncoder
    
    # df: a pandas DataFrame whose columns are all categorical
    OneHotEncoder(handle_unknown='ignore').fit_transform(df)
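Here is a self-contained sketch of the all-categorical case. It uses a small made-up DataFrame, and the data and variable names are illustrative. It also shows what `handle_unknown='ignore'` does: a category not seen during `fit` is encoded as all zeros in that feature's block of columns.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical all-categorical training data
train = pd.DataFrame({
    "pets": ["cat", "dog", "cat"],
    "location": ["San_Diego", "New_York", "New_York"],
})

enc = OneHotEncoder(handle_unknown="ignore")
X_train = enc.fit_transform(train)  # SciPy sparse matrix by default

# Output columns follow enc.categories_ per input column:
# pets=cat, pets=dog, location=New_York, location=San_Diego
print(enc.categories_)

# "fish" was never seen during fit, so its pets block is all zeros
test = pd.DataFrame({"pets": ["fish"], "location": ["New_York"]})
X_test = enc.transform(test).toarray()
print(X_test)
```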
    

    If you have heterogeneously typed features:

    from sklearn.compose import make_column_transformer
    from sklearn.preprocessing import OneHotEncoder, RobustScaler
    
    categorical_columns = ['pets', 'owner', 'location']
    numerical_columns = ['age', 'weight', 'height']
    # Recent scikit-learn versions expect (transformer, columns) tuples
    column_trans = make_column_transformer(
        (OneHotEncoder(handle_unknown='ignore'), categorical_columns),
        (RobustScaler(), numerical_columns))
    # df: a pandas DataFrame containing all of the columns listed above
    column_trans.fit_transform(df)
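Here is a runnable sketch of the mixed-type case. It uses a small made-up DataFrame with the column names from the answer; the values are invented for illustration. Each tuple passed to `make_column_transformer` is written as (transformer, columns), the order recent scikit-learn versions expect. Columns not listed are dropped by the default `remainder='drop'`.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical mixed-type data using the answer's column names
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "fish"],
    "owner": ["Champ", "Ron", "Brick", "Champ"],
    "location": ["San_Diego", "New_York", "New_York", "San_Diego"],
    "age": [3, 5, 2, 1],
    "weight": [4.0, 20.0, 3.5, 0.1],
    "height": [25.0, 60.0, 23.0, 5.0],
})

categorical_columns = ["pets", "owner", "location"]
numerical_columns = ["age", "weight", "height"]

# One-hot encode the categorical columns, median/IQR-scale the numeric ones
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    (RobustScaler(), numerical_columns),
)
X = column_trans.fit_transform(df)

# 3 + 3 + 2 one-hot columns, followed by 3 scaled numeric columns
print(X.shape)  # → (4, 11)
```

Passing `remainder='passthrough'` instead keeps any unlisted columns unchanged.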
    

    More options are described in the documentation: http://scikit-learn.org/stable/modules/compose.html#columntransformer-for-heterogeneous-data
