Pipeline: Multiple classifiers?

前端未结

关注

 3  1546

爱一瞬间的悲伤 2020-12-09 12:14

I read following example on Pipelines and GridSearchCV in Python: http://www.davidsbatista.net/blog/2017/04/01/document_classification/

Logistic Regression:<

3条回答

借酒劲吻你 (楼主)

2020-12-09 12:52

Here is an easy way to optimize over any classifier and for each classifier any settings of parameters.

Create a switcher class that works for any estimator

from sklearn.base import BaseEstimator class ClfSwitcher(BaseEstimator): def __init__( self, estimator = SGDClassifier(), ): """ A Custom BaseEstimator that can switch between classifiers. :param estimator: sklearn object - The classifier """ self.estimator = estimator def fit(self, X, y=None, **kwargs): self.estimator.fit(X, y) return self def predict(self, X, y=None): return self.estimator.predict(X) def predict_proba(self, X): return self.estimator.predict_proba(X) def score(self, X, y): return self.estimator.score(X, y)

Now you can pass in anything for the estimator parameter. And you can optimize any parameter for any estimator you pass in as follows:

Perform hyper-parameter optimization

from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.naive_bayes import MultinomialNB from sklearn.linear_model import SGDClassifier from sklearn.pipeline import Pipeline from sklearn.model_selection import GridSearchCV pipeline = Pipeline([ ('tfidf', TfidfVectorizer()), ('clf', ClfSwitcher()), ]) parameters = [ { 'clf__estimator': [SGDClassifier()], # SVM if hinge loss / logreg if log loss 'tfidf__max_df': (0.25, 0.5, 0.75, 1.0), 'tfidf__stop_words': ['english', None], 'clf__estimator__penalty': ('l2', 'elasticnet', 'l1'), 'clf__estimator__max_iter': [50, 80], 'clf__estimator__tol': [1e-4], 'clf__estimator__loss': ['hinge', 'log', 'modified_huber'], }, { 'clf__estimator': [MultinomialNB()], 'tfidf__max_df': (0.25, 0.5, 0.75, 1.0), 'tfidf__stop_words': [None], 'clf__estimator__alpha': (1e-2, 1e-3, 1e-1), }, ] gscv = GridSearchCV(pipeline, parameters, cv=5, n_jobs=12, return_train_score=False, verbose=3) gscv.fit(train_data, train_labels)

How to interpret clf__estimator__loss

clf__estimator__loss is interpreted as the loss parameter for whatever estimator is, where estimator = SGDClassifier() in the top most example and is itself a parameter of clf which is a ClfSwitcher object.

0 讨论(0)

查看其它3个回答

发布评论:

提交评论

加载中...

验证码

看不清?

提交回复

Pipeline: Multiple classifiers?

Create a switcher class that works for any estimator

Perform hyper-parameter optimization

How to interpret clf__estimator__loss

How to interpret `clfestimatorloss`