How to do one-hot encoding in an sklearn Pipeline

温柔的废话 2021-02-05 21:19

I am trying to one-hot encode the categorical variables of my pandas DataFrame, which includes both categorical and continuous variables. I realise this can be done easily with the

1 answer
  •  Happy的楠姐
    2021-02-05 21:50

    In scikit-learn versions before 0.20, OneHotEncoder doesn't support string features, and with [(d, OneHotEncoder()) for d in dummies] you are applying it to every column in dummies. Use LabelBinarizer instead:

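    from sklearn.preprocessing import LabelBinarizer
    from sklearn_pandas import DataFrameMapper
    # dummies: the list of categorical column names from the question
    mapper = DataFrameMapper(
        [(d, LabelBinarizer()) for d in dummies]
    )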
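To see what each LabelBinarizer in the mapper produces, here is a minimal sketch on a single string column, run outside of DataFrameMapper. It assumes scikit-learn is installed; the color values are hypothetical. It also shows one caveat worth knowing: when a column has exactly two categories, LabelBinarizer returns a single 0/1 column rather than two.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical categorical column with three distinct string values.
colors = ["red", "green", "blue", "green"]

lb = LabelBinarizer()
onehot = lb.fit_transform(colors)  # one output column per class, classes sorted
print(lb.classes_)  # ['blue' 'green' 'red']
print(onehot)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 1 0]]

# Caveat: with exactly two classes, LabelBinarizer emits a single 0/1 column.
binary = LabelBinarizer().fit_transform(["yes", "no", "yes"])
print(binary.shape)  # (3, 1)
```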
    

    An alternative is to use LabelEncoder followed by a separate OneHotEncoder step:

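    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    from sklearn2pmml.pipeline import PMMLPipeline
    from sklearn_pandas import DataFrameMapper

    # Step 1: map each categorical column's strings to integer codes
    mapper = DataFrameMapper(
        [(d, LabelEncoder()) for d in dummies]
    )

    # Step 2: one-hot encode those integer codes, then fit the regressor
    lm = PMMLPipeline([("mapper", mapper),
                       ("onehot", OneHotEncoder()),
                       ("regressor", LinearRegression())])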
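Since scikit-learn 0.20, OneHotEncoder accepts string categories directly, and ColumnTransformer can send categorical and continuous columns to different transformers, so the LabelEncoder step (and sklearn_pandas) is not strictly needed. The sketch below swaps in that built-in approach with a plain sklearn Pipeline instead of PMMLPipeline. It assumes a recent scikit-learn plus pandas; the DataFrame, the column names "color" and "size", and the target are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one string column, one continuous column, and a target.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

categorical = ["color"]  # plays the role of the question's `dummies`
continuous = ["size"]

preprocess = ColumnTransformer(
    [
        # OneHotEncoder encodes the string categories directly.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
        # Continuous columns are passed through unchanged.
        ("num", "passthrough", continuous),
    ],
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
model.fit(df, y)

# Inspect the encoded feature matrix produced by the fitted preprocessing step.
X_enc = model.named_steps["preprocess"].transform(df)
print(X_enc.shape)  # (6, 4): 3 one-hot columns + 1 continuous column
```

handle_unknown="ignore" makes the encoder output all zeros for a category it never saw during fit, instead of raising an error at prediction time.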
    
