Put customized functions in Sklearn pipeline

礼貌的吻别 2020-12-30 14:52

In my classification scheme, there are several steps including:

  1. SMOTE (Synthetic Minority Over-sampling Technique)
  2. Fisher criteria for feature selecti
2 Answers
  •  孤独总比滥情好
    2020-12-30 15:20

    scikit-learn added FunctionTransformer to its preprocessing module in version 0.17. It can be used in much the same way as David's Fisher class in the answer above, but with less flexibility: it wraps a stateless function, so nothing is learned from the training data in fit. As long as the function's input and output are configured properly, the transformer supplies the fit/transform/fit_transform methods for it, which lets the function be used as a step in a scikit-learn pipeline.
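As a minimal sketch of what the wrapper provides, the snippet below wraps a plain function and calls the estimator methods on it. The choice of np.log1p and the small array X are assumptions made purely for illustration; they are not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap a plain, stateless function; np.log1p is only an illustrative choice.
log_transformer = FunctionTransformer(np.log1p)

# Made-up numeric input for the demonstration.
X = np.array([[0.0, 1.0], [3.0, 7.0]])

# fit() learns nothing for a stateless function, but it makes the object
# usable wherever scikit-learn expects an estimator with fit/transform.
X_fit_then_transform = log_transformer.fit(X).transform(X)

# fit_transform() is the shortcut a Pipeline uses on intermediate steps.
X_fit_transform = log_transformer.fit_transform(X)
```

Both calls should give the same result as applying the wrapped function directly, which is the sense in which the transformer is only a thin adapter around it.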

    For example, if the input to a pipeline is a series, the transformer would be as follows:

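    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import FunctionTransformer

    def trans_func(input_series):
        # Apply the custom transformation here; returned unchanged as a placeholder.
        output_series = input_series
        return output_series

    # validate=False skips array validation so a text series passes through as-is.
    transformer = FunctionTransformer(trans_func, validate=False)

    # tf_1k, clf_1k and train are assumed to be defined elsewhere.
    sk_pipe = Pipeline([("trans", transformer), ("vect", tf_1k), ("clf", clf_1k)])
    sk_pipe.fit(train.desc, train.tag)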
    

    Here tf_1k (the "vect" step) is a TF-IDF transformer, clf_1k (the "clf" step) is a classifier, and train is the training dataset. "train.desc" is the text series passed to the pipeline as input, and "train.tag" holds the labels.
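For readers who want something they can run end to end, here is a self-contained sketch of the same pattern. Everything specific in it is an assumption for illustration: the lowercase_and_strip function, the toy train frame, and the choice of TfidfVectorizer and LogisticRegression stand in for the answer's undefined trans_func, train, tf_1k and clf_1k.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def lowercase_and_strip(series):
    """Hypothetical cleaning step: lower-case and trim each document."""
    return series.str.lower().str.strip()


# Toy stand-in for the answer's `train` frame (made-up rows).
train = pd.DataFrame({
    "desc": [
        "  Cheap FLIGHTS to Paris ",
        "Python pipeline error",
        "Book flights now  ",
        " sklearn Pipeline help",
    ],
    "tag": ["travel", "code", "travel", "code"],
})

sk_pipe = Pipeline([
    # validate=False hands the Series to the function without array checks.
    ("trans", FunctionTransformer(lowercase_and_strip, validate=False)),
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
sk_pipe.fit(train["desc"], train["tag"])

# New text goes through the same custom function before vectorizing.
predictions = sk_pipe.predict(pd.Series(["  FLIGHTS to Rome "]))
```

Because the custom function is a pipeline step, it is applied automatically at both fit and predict time, so the cleaning logic cannot drift between training and inference.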
