I am trying to one-hot encode the categorical variables of my Pandas dataframe, which includes both categorical and continuous variables. I realise this can be done easily with the
OneHotEncoder doesn't support string features (this was the case in scikit-learn releases before 0.20), and with [(d, OneHotEncoder()) for d in dummies] you are applying it to every column in dummies. Use LabelBinarizer instead:
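from sklearn.preprocessing import LabelBinarizer
from sklearn_pandas import DataFrameMapper
# dummies is the iterable of categorical column names used in the question
mapper = DataFrameMapper(
    [(d, LabelBinarizer()) for d in dummies]
)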
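To make it clearer what the binarizer step produces for a string column, here is a minimal plain-Python sketch of one-hot encoding. The helper name one_hot_column and the sample data are made up for illustration; this is not scikit-learn's implementation, only the same idea written out by hand.

```python
# Hypothetical helper: a plain-Python illustration of one-hot encoding,
# not scikit-learn's implementation.
def one_hot_column(values):
    """Return (classes, rows) with one 0/1 indicator per distinct value."""
    classes = sorted(set(values))            # alphabetical class order
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for v in values:
        row = [0] * len(classes)
        row[index[v]] = 1                    # mark the column for this category
        rows.append(row)
    return classes, rows


colours = ["red", "green", "blue", "green"]  # made-up sample data
classes, encoded = one_hot_column(colours)
print(classes)
print(encoded)
```

Each distinct string becomes its own indicator column, so no integer conversion is needed beforehand. Note that the real LabelBinarizer returns a single column when a feature has only two classes, which this sketch does not imitate.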
An alternative would be to use LabelEncoder, followed by a second OneHotEncoder step:
mapper = DataFrameMapper(
    [(d, LabelEncoder()) for d in dummies]
)
lm = PMMLPipeline([("mapper", mapper),
                   ("onehot", OneHotEncoder()),
                   ("regressor", LinearRegression())])
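The two-step idea (strings to integer codes first, then integer codes to indicator columns) can be sketched in plain Python as follows. The function names label_encode and one_hot_codes and the sample data are hypothetical, chosen only to mirror the two pipeline stages; they are not the library's API.

```python
# Hypothetical helpers mirroring the two stages; not scikit-learn code.
def label_encode(values):
    """Stage 1: map each distinct string to an integer code."""
    classes = sorted(set(values))
    lookup = {c: i for i, c in enumerate(classes)}
    return classes, [lookup[v] for v in values]


def one_hot_codes(codes, n_classes):
    """Stage 2: expand integer codes into 0/1 indicator rows."""
    return [[1 if j == code else 0 for j in range(n_classes)]
            for code in codes]


sizes = ["small", "large", "medium", "small"]  # made-up sample data
classes, codes = label_encode(sizes)
encoded = one_hot_codes(codes, len(classes))
print(codes)
print(encoded)
```

The integer codes on their own imply an ordering between categories that usually does not exist, which is why the second expansion step matters before fitting a linear model.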