How can I standardize only numeric variables in an sklearn pipeline?

后悔当初 2020-12-06 17:21

I am trying to create an sklearn pipeline with 2 steps:

  1. Standardize the data
  2. Fit the data using KNN

However, my data has both numeric a

3 Answers
  •  失恋的感觉
    2020-12-06 17:26

    Since you have already converted your categorical features into dummies with pd.get_dummies, you don't need OneHotEncoder. Your pipeline only has to pass the dummy columns through unchanged and standardize the numeric ones:

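    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import FeatureUnion, Pipeline
    from sklearn.preprocessing import FunctionTransformer, StandardScaler

    # cat_indices and num_indices are lists of column positions for the dummy
    # and numeric columns. X is assumed to be a NumPy array; for a DataFrame,
    # select with data.iloc[:, indices] instead.
    knn = KNeighborsClassifier()

    pipeline = Pipeline(steps=[
        ('feature_processing', FeatureUnion(transformer_list=[
            # dummy (categorical) columns pass through unchanged
            ('categorical', FunctionTransformer(lambda data: data[:, cat_indices])),
            # numeric columns are selected, then standardized
            ('numeric', Pipeline(steps=[
                ('select', FunctionTransformer(lambda data: data[:, num_indices])),
                ('scale', StandardScaler()),
            ])),
        ])),
        ('clf', knn),
    ])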
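
The same idea can also be written with ColumnTransformer from sklearn.compose, which is a different technique from the FeatureUnion approach above. The following is only a minimal sketch: the toy array, the labels, the value of num_indices, and n_neighbors=3 are all made up for illustration and do not come from the question.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: columns 0-1 are numeric, columns 2-3 are dummies.
X = np.array([
    [1.0, 200.0, 1, 0],
    [2.0, 180.0, 0, 1],
    [3.0, 240.0, 1, 0],
    [4.0, 210.0, 0, 1],
    [5.0, 260.0, 1, 0],
    [6.0, 230.0, 0, 1],
])
y = np.array([0, 0, 0, 1, 1, 1])  # hypothetical labels

num_indices = [0, 1]  # assumed positions of the numeric columns

# Scale only the numeric columns; every other column passes through untouched.
preprocess = ColumnTransformer(
    transformers=[("numeric", StandardScaler(), num_indices)],
    remainder="passthrough",
)

pipeline = Pipeline(steps=[
    ("feature_processing", preprocess),
    ("clf", KNeighborsClassifier(n_neighbors=3)),
])

pipeline.fit(X, y)

# Inspect the preprocessed features: scaled numeric columns come first,
# followed by the passthrough (dummy) columns.
X_processed = pipeline.named_steps["feature_processing"].transform(X)
predictions = pipeline.predict(X)
```

Note that the output column order changes: the scaled numeric columns come first and the passthrough columns follow. That does not matter for KNN, but it does if you map features back to their original names.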
    
