I am trying to create an sklearn pipeline with 2 steps:
However, my data has both numeric and categorical features.
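Since you have already converted your categorical features into dummies using pd.get_dummies, you don't need OneHotEncoder. The dummy columns are already numeric 0/1 values and can be passed through as they are; only the numeric columns need scaling. As a result, your pipeline should be: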
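from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, FunctionTransformer

# cat_indices / num_indices: integer column positions of the dummy columns
# and the numeric columns in X. validate=True converts a DataFrame to a NumPy
# array so that data[:, indices] slicing works.
knn = KNeighborsClassifier()
pipeline = Pipeline(steps=[
    ('feature_processing', FeatureUnion(transformer_list=[
        # categorical: dummy columns are already 0/1, pass them through unscaled
        ('categorical', FunctionTransformer(lambda data: data[:, cat_indices], validate=True)),
        # numeric: select the numeric columns, then standardize them
        ('numeric', Pipeline(steps=[
            ('select', FunctionTransformer(lambda data: data[:, num_indices], validate=True)),
            ('scale', StandardScaler())
        ]))
    ])),
    ('clf', knn)
])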
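To see where cat_indices and num_indices come from, here is a minimal end-to-end sketch. The DataFrame, its column names (age, income, city) and the labels are made-up assumptions, not your data. The indices are looked up by position after pd.get_dummies, and the frame is converted to a NumPy array because data[:, indices] slicing only works on arrays.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [40_000, 52_000, 81_000, 90_000, 60_000, 45_000],
    "city": ["A", "B", "A", "C", "B", "C"],
})
y = np.array([0, 0, 1, 1, 1, 0])

numeric_cols = ["age", "income"]
# Replace "city" with dummy columns city_A, city_B, city_C (appended at the end)
X_df = pd.get_dummies(df, columns=["city"], dtype=float)

# Positional indices of each column group in the dummied frame
num_indices = [X_df.columns.get_loc(c) for c in numeric_cols]
cat_indices = [i for i in range(X_df.shape[1]) if i not in num_indices]

# The lambdas below slice positionally, so feed the pipeline a NumPy array
X = X_df.to_numpy()

pipeline = Pipeline(steps=[
    ("feature_processing", FeatureUnion(transformer_list=[
        # dummy columns: passed through unchanged
        ("categorical", FunctionTransformer(lambda data: data[:, cat_indices])),
        # numeric columns: selected, then standardized
        ("numeric", Pipeline(steps=[
            ("select", FunctionTransformer(lambda data: data[:, num_indices])),
            ("scale", StandardScaler()),
        ])),
    ])),
    ("clf", KNeighborsClassifier(n_neighbors=3)),
])

pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Two caveats: a pipeline containing lambdas cannot be saved with pickle or joblib, so use named module-level functions if you need to persist the fitted model. On scikit-learn 0.20 or later, ColumnTransformer can replace the FeatureUnion/FunctionTransformer combination, e.g. ColumnTransformer([('scale', StandardScaler(), num_indices)], remainder='passthrough'), which scales the numeric columns and passes the dummy columns through unchanged.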