I am trying to one-hot encode the categorical variables of my Pandas dataframe, which includes both categorical and continuous variables. I realise this can be done easily with the
OneHotEncoder doesn't support string features, and with [(d, OneHotEncoder()) for d in dummies] you are applying it to every column listed in dummies. Use LabelBinarizer instead:
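from sklearn.preprocessing import LabelBinarizer
from sklearn_pandas import DataFrameMapper

# dummies is the list of categorical column names from the question
mapper = DataFrameMapper(
    [(d, LabelBinarizer()) for d in dummies]
)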
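To see what LabelBinarizer produces for a string column, here is a minimal standalone sketch. It uses only scikit-learn, not the mapper above, and the colors list is made-up illustration data, not from the question.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical string-valued categorical column (illustration only)
colors = ["red", "green", "blue", "green"]

binarizer = LabelBinarizer()
# Learn the distinct labels and emit one indicator column per label
encoded = binarizer.fit_transform(colors)

# classes_ holds the labels in sorted order; column i corresponds to classes_[i]
classes = list(binarizer.classes_)

print(classes)
print(encoded)
```

Note that with exactly two distinct labels LabelBinarizer returns a single 0/1 column rather than two columns, which may or may not be what you want for a regression.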
An alternative would be to use LabelEncoder in the mapper, followed by a OneHotEncoder step in the pipeline:
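from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.pipeline import PMMLPipeline

# Step 1: turn each string column into integer codes
mapper = DataFrameMapper(
    [(d, LabelEncoder()) for d in dummies]
)
# Step 2: one-hot encode the integer codes, then fit the regression
lm = PMMLPipeline([("mapper", mapper),
                   ("onehot", OneHotEncoder()),
                   ("regressor", LinearRegression())])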
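The two-step idea can also be checked outside the pipeline. The following is a rough sketch on a single made-up column (colors is illustration data, not from the question), showing the integer codes LabelEncoder produces and how OneHotEncoder expands them.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical string-valued categorical column (illustration only)
colors = ["red", "green", "blue", "green"]

# Step 1: map each string to an integer code (labels are sorted first)
codes = LabelEncoder().fit_transform(colors)

# Step 2: OneHotEncoder expects a 2-D array, so reshape to one column;
# the default output is a sparse matrix, so convert it for display
onehot = OneHotEncoder().fit_transform(codes.reshape(-1, 1)).toarray()

print(codes)
print(onehot)
```

The result matches what LabelBinarizer gives for a column with three or more labels, just reached in two steps.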