How to do Onehotencoding in Sklearn Pipeline

温柔的废话 2021-02-05 21:19

I am trying to one-hot encode the categorical variables of my Pandas dataframe, which includes both categorical and continuous variables. I realise this can be done easily with the

1 Answer
  • 2021-02-05 21:50

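    In scikit-learn versions before 0.20, OneHotEncoder doesn't support string features, and [(d, OneHotEncoder()) for d in dummies] applies it to every column in dummies. Selecting each column by a plain column name also hands it to the transformer as a 1-D array, while OneHotEncoder expects 2-D input. LabelBinarizer accepts a 1-D column of strings, so use it instead: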

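    from sklearn.preprocessing import LabelBinarizer
    from sklearn_pandas import DataFrameMapper
    # dummies: list of categorical column names (defined in the question)
    mapper = DataFrameMapper(
        [(d, LabelBinarizer()) for d in dummies]
    )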
    
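    To see what the LabelBinarizer mapper produces for each column, here is a small pure-Python sketch that does not need scikit-learn. binarize_column and the sample columns are hypothetical. The sketch assumes string categories with at least two distinct values. The real transformer returns a NumPy array rather than nested lists, and DataFrameMapper places the per-column blocks side by side.

```python
# Pure-Python illustration of what LabelBinarizer outputs for ONE
# categorical column. binarize_column is a hypothetical helper written
# for this sketch; it is not part of scikit-learn or sklearn-pandas.


def binarize_column(values):
    """Return (classes, rows) with one 0/1 indicator per sorted class.

    Assumes at least two distinct values. Like LabelBinarizer, a column
    with exactly two classes collapses to a single 0/1 column, where 1
    marks the second (larger) class.
    """
    classes = sorted(set(values))
    if len(classes) == 2:
        rows = [[1 if v == classes[1] else 0] for v in values]
    else:
        rows = [[1 if v == c else 0 for c in classes] for v in values]
    return classes, rows


# Hypothetical columns standing in for entries of `dummies`.
print(binarize_column(["red", "green", "blue", "green"]))
# (['blue', 'green', 'red'], [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]])
print(binarize_column(["yes", "no", "yes"]))
# (['no', 'yes'], [[1], [0], [1]])
```

    The two-class case explains why a yes/no column contributes only one feature after this mapper.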

    An alternative is to integer-encode each column with LabelEncoder and then add a separate OneHotEncoder step to the pipeline:

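    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    from sklearn2pmml.pipeline import PMMLPipeline
    from sklearn_pandas import DataFrameMapper

    # Step 1: integer-encode each categorical column
    mapper = DataFrameMapper(
        [(d, LabelEncoder()) for d in dummies]
    )

    # Step 2: expand the integer codes into indicator columns, then fit the model
    lm = PMMLPipeline([("mapper", mapper),
                       ("onehot", OneHotEncoder()),
                       ("regressor", LinearRegression())])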
    
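    The two-step route can be illustrated the same way. This is a hypothetical pure-Python sketch, not the scikit-learn implementation. The mapper stacks one integer code per column, and the OneHotEncoder step then expands each integer column into indicator columns. label_encode, one_hot and the sample data are made up for illustration. The sketch assumes the codes run from 0 to the column maximum, which is what LabelEncoder produces.

```python
# Pure-Python sketch of LabelEncoder followed by OneHotEncoder.
# label_encode and one_hot are hypothetical helpers, not library APIs.


def label_encode(values):
    """Step 1 per column (LabelEncoder-like): index into sorted distinct values."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values]


def one_hot(int_rows):
    """Step 2 (OneHotEncoder-like): expand each integer column into one
    0/1 column per code, assuming codes run 0..max as LabelEncoder gives."""
    n_codes = [max(col) + 1 for col in zip(*int_rows)]
    out = []
    for row in int_rows:
        encoded = []
        for code, n in zip(row, n_codes):
            encoded.extend(1 if code == k else 0 for k in range(n))
        out.append(encoded)
    return out


# Hypothetical data standing in for the `dummies` columns of the question.
data = {
    "colour": ["red", "green", "blue", "green"],
    "size": ["S", "L", "S", "M"],
}

# The mapper step places the integer codes of every column side by side.
int_rows = [list(r) for r in zip(*(label_encode(v) for v in data.values()))]
print(int_rows)  # [[2, 2], [1, 0], [0, 2], [1, 1]]
print(one_hot(int_rows))
# [[0, 0, 1, 0, 0, 1], [0, 1, 0, 1, 0, 0], [1, 0, 0, 0, 0, 1], [0, 1, 0, 0, 1, 0]]
```

    One visible difference from the LabelBinarizer route: with OneHotEncoder's default settings, a two-category column keeps two indicator columns instead of collapsing to one.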