sklearn pipeline - how to apply different transformations on different columns

天涯浪人 2020-12-01 12:53

I am pretty new to pipelines in sklearn and I am running into this problem: I have a dataset that has a mixture of text and numbers i.e. certain columns have text only and r

2 Answers
  • 2020-12-01 13:14

    The way I usually do it is with a FeatureUnion, using a FunctionTransformer to pull out the relevant columns.

    Important notes:

    • You have to define your functions with def since annoyingly you can't use lambda or partial in FunctionTransformer if you want to pickle your model

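    • You need to initialize FunctionTransformer with validate=False (the default since v0.22); with validate=True the DataFrame is converted to a NumPy array before your function is called, so selecting columns by name fails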

    Something like this:

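    from sklearn.pipeline import make_union, make_pipeline
    from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, OrdinalEncoder

    # Named functions (not lambdas) so the fitted model can be pickled.
    def get_text_cols(df):
        return df[['name', 'fruit']]

    def get_num_cols(df):
        return df[['height', 'age']]

    # LabelEncoder only encodes a 1-D target, so it cannot be used as a
    # pipeline step on feature columns; OrdinalEncoder is the 2-D counterpart.
    vec = make_union(
        make_pipeline(FunctionTransformer(get_text_cols, validate=False), OrdinalEncoder()),
        make_pipeline(FunctionTransformer(get_num_cols, validate=False), MinMaxScaler()),
    )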
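
The pickling note above can be checked with a small sketch. `can_pickle` is a hypothetical helper written for this illustration, not part of sklearn, and the column names are simply the ones used in this answer. It assumes the code runs as an ordinary script or module, so that the `def` function has an importable name.

```python
import pickle

from sklearn.preprocessing import FunctionTransformer


def get_num_cols(df):
    return df[['height', 'age']]


def can_pickle(obj):
    """Hypothetical helper: report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Same column selection, once through a named function and once through a lambda.
named = FunctionTransformer(get_num_cols, validate=False)
anonymous = FunctionTransformer(lambda df: df[['height', 'age']], validate=False)

print(can_pickle(named))      # a module-level def is pickled by reference to its name
print(can_pickle(anonymous))  # a lambda has no importable name for pickle to look up
```

The standard `pickle` module stores functions by qualified name rather than by value, which is why the anonymous version cannot be serialized.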
    
  • 2020-12-01 13:18

    Since v0.20, you can use ColumnTransformer to accomplish this.
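
A minimal sketch of what that might look like, reusing the column names from the other answer. The toy DataFrame is made up for illustration, and the choice of OneHotEncoder for the text columns and MinMaxScaler for the numeric ones is an assumption, not something the question specifies.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; only the column names come from the thread.
df = pd.DataFrame({
    "name": ["ann", "bob", "cat"],
    "fruit": ["apple", "pear", "apple"],
    "height": [150.0, 180.0, 165.0],
    "age": [20, 40, 30],
})

# Each entry is (label, transformer, columns). Columns not listed are dropped
# by default (remainder="drop").
preprocess = ColumnTransformer(
    [
        ("text", OneHotEncoder(handle_unknown="ignore"), ["name", "fruit"]),
        ("num", MinMaxScaler(), ["height", "age"]),
    ],
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

features = preprocess.fit_transform(df)
print(features.shape)
```

The resulting `preprocess` object can be used as the first step of a `Pipeline`, so the same column-wise transformations are applied at both fit and predict time.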
