Question:
I'm using sklearn.pipeline.Pipeline to chain feature extractors and a classifier. Is there a way to combine multiple feature extraction classes (for example the ones from sklearn.feature_extraction.text) in parallel and join their output?
My code right now looks as follows:
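from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier())])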
It results in the following:
vect -> tfidf -> clf
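For reference, here is a minimal sketch that runs this sequential chain end to end. The four documents and their labels are a made-up toy corpus for illustration only, and random_state=0 is added only to make the classifier deterministic:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus and labels, for illustration only.
docs = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ('vect', CountVectorizer()),             # raw text -> token counts
    ('tfidf', TfidfTransformer()),           # counts -> tf-idf weights
    ('clf', SGDClassifier(random_state=0)),  # linear classifier on tf-idf
])
pipeline.fit(docs, labels)

# Each step feeds its output to the next one: vect -> tfidf -> clf.
step_names = [name for name, _ in pipeline.steps]
print(step_names)  # ['vect', 'tfidf', 'clf']
```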
I want to be able to specify a pipeline that looks as follows:
vect1 -> tfidf1 \
-> clf
vect2 -> tfidf2 /
Answer 1:
This has recently been implemented in the master branch of scikit-learn under the name FeatureUnion (it has since shipped in stable releases as sklearn.pipeline.FeatureUnion):
http://scikit-learn.org/dev/modules/pipeline.html#feature-union
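A minimal sketch of the two-branch layout from the question, assuming a scikit-learn release where FeatureUnion is importable from sklearn.pipeline. Each branch is its own vect -> tfidf Pipeline, and FeatureUnion fits both on the same input and stacks their outputs column-wise before the classifier. The toy corpus and the choice of word unigrams for branch 1 and character bigrams for branch 2 are hypothetical, picked only so the two branches produce different features:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical toy corpus and labels, for illustration only.
docs = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]

# Branch 1 (vect1 -> tfidf1): word counts reweighted by tf-idf.
branch1 = Pipeline([
    ('vect1', CountVectorizer(analyzer='word')),
    ('tfidf1', TfidfTransformer()),
])
# Branch 2 (vect2 -> tfidf2): character-bigram counts reweighted by tf-idf.
# This configuration is an illustrative assumption, not from the thread.
branch2 = Pipeline([
    ('vect2', CountVectorizer(analyzer='char', ngram_range=(2, 2))),
    ('tfidf2', TfidfTransformer()),
])

pipeline = Pipeline([
    # FeatureUnion fits both branches on the same input and horizontally
    # stacks their output matrices, so the classifier sees both feature sets.
    ('features', FeatureUnion([
        ('branch1', branch1),
        ('branch2', branch2),
    ])),
    ('clf', SGDClassifier(random_state=0)),
])
pipeline.fit(docs, labels)

# The joined feature matrix has one column per feature from each branch.
union = pipeline.named_steps['features']
X = union.transform(docs)
n1 = len(union.transformer_list[0][1].named_steps['vect1'].vocabulary_)
n2 = len(union.transformer_list[1][1].named_steps['vect2'].vocabulary_)
print(X.shape[1] == n1 + n2)  # True
```

Because each branch is a full Pipeline, its parameters stay addressable through the usual double-underscore names (for example features__branch2__vect2__ngram_range), which keeps the combined model usable with grid search.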
Source: https://stackoverflow.com/questions/12721486/combining-feature-extraction-classes-in-scikit-learn