Pipeline: Multiple classifiers?


I read the following example on Pipelines and GridSearchCV in Python: http://www.davidsbatista.net/blog/2017/04/01/document_classification/

Logistic Regression:


3 Answers

傲寒

2020-12-09 12:48

This is how I did it without a wrapper function. You can evaluate any number of classifiers. Each one can have multiple parameters for hyperparameter optimization.

The one with the best score is saved to disk as a pickle file via joblib.

    import joblib
    from operator import itemgetter

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # pipeline parameters
    parameters = [
        {
            'clf': [MultinomialNB()],
            'tf-idf__stop_words': ['english', None],
            'clf__alpha': [0.001, 0.1, 1, 10, 100]
        },
        {
            'clf': [SVC()],
            'tf-idf__stop_words': ['english', None],
            'clf__C': [0.001, 0.1, 1, 10, 100, 10e5],
            'clf__kernel': ['linear', 'rbf'],
            'clf__class_weight': ['balanced'],
            'clf__probability': [True]
        },
        {
            'clf': [DecisionTreeClassifier()],
            'tf-idf__stop_words': ['english', None],
            'clf__criterion': ['gini', 'entropy'],
            'clf__splitter': ['best', 'random'],
            'clf__class_weight': ['balanced', None]
        }
    ]

    # evaluating multiple classifiers based on pipeline parameters
    # (features: raw text documents, labels: their classes; supply your own)
    result = []
    for params in parameters:
        # copy the grid so the original dictionary is not mutated
        params = dict(params)

        # classifier, popped out so that only its arguments remain
        clf = params.pop('clf')[0]

        # pipeline
        steps = [('tf-idf', TfidfVectorizer()), ('clf', clf)]

        # cross validation using grid search
        grid = GridSearchCV(Pipeline(steps), param_grid=params, cv=3)
        grid.fit(features, labels)

        # storing result
        result.append({
            'grid': grid,
            'classifier': grid.best_estimator_,
            'best score': grid.best_score_,
            'best params': grid.best_params_,
            'cv': grid.cv
        })

    # sorting result by best score
    result = sorted(result, key=itemgetter('best score'), reverse=True)

    # saving best classifier
    grid = result[0]['grid']
    joblib.dump(grid, 'classifier.pickle')
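As an aside to the loop above: GridSearchCV also accepts a list of grids, and a pipeline step can itself appear as a grid entry, so the same comparison can probably be written as a single search. The following is only a minimal sketch of that variant, not the answer's own code. The toy `docs` and `y` are invented for illustration, and the temporary path stands in for `classifier.pickle`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data, invented purely for illustration.
docs = [
    "cheap pills buy now", "win money fast", "free prize click here",
    "buy cheap watches", "claim your free money", "fast cash offer now",
    "meeting at noon tomorrow", "see you at lunch", "project report attached",
    "notes from the meeting", "lunch tomorrow at noon", "report for the project",
]
y = [1] * 6 + [0] * 6

pipe = Pipeline([("tf-idf", TfidfVectorizer()), ("clf", MultinomialNB())])

# Each dict is searched as its own grid; the "clf" entry replaces the
# whole pipeline step before that classifier's parameters are applied.
param_grid = [
    {"clf": [MultinomialNB()], "clf__alpha": [0.1, 1.0]},
    {"clf": [DecisionTreeClassifier(random_state=0)],
     "clf__criterion": ["gini", "entropy"]},
]

grid = GridSearchCV(pipe, param_grid=param_grid, cv=3)
grid.fit(docs, y)

# Persist the fitted search, then load it back to check the round trip.
path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
joblib.dump(grid, path)
restored = joblib.load(path)

print(type(restored.best_estimator_.named_steps["clf"]).__name__)
print(restored.best_params_)
```

With this layout there is nothing to sort afterwards, since `best_estimator_` already refers to the winning classifier across all grids.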


借酒劲吻你

2020-12-09 12:52

Here is an easy way to optimize over any classifier and, for each classifier, over any parameter settings.

Create a switcher class that works for any estimator

    from sklearn.base import BaseEstimator
    from sklearn.linear_model import SGDClassifier


    class ClfSwitcher(BaseEstimator):

        def __init__(self, estimator=SGDClassifier()):
            """
            A custom BaseEstimator that can switch between classifiers.

            :param estimator: sklearn object - the classifier
            """
            self.estimator = estimator

        def fit(self, X, y=None, **kwargs):
            # delegate fitting to whichever estimator is currently set
            self.estimator.fit(X, y, **kwargs)
            return self

        def predict(self, X, y=None):
            return self.estimator.predict(X)

        def predict_proba(self, X):
            return self.estimator.predict_proba(X)

        def score(self, X, y):
            return self.estimator.score(X, y)

Now you can pass any estimator as the estimator parameter, and you can optimize any parameter of whichever estimator you pass in, as follows:

Perform hyper-parameter optimization

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.linear_model import SGDClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', ClfSwitcher()),
    ])

    parameters = [
        {
            'clf__estimator': [SGDClassifier()],  # SVM if hinge loss / logreg if log loss
            'tfidf__max_df': (0.25, 0.5, 0.75, 1.0),
            'tfidf__stop_words': ['english', None],
            'clf__estimator__penalty': ('l2', 'elasticnet', 'l1'),
            'clf__estimator__max_iter': [50, 80],
            'clf__estimator__tol': [1e-4],
            # the original answer used 'log'; recent scikit-learn releases
            # call it 'log_loss' (use 'log' on versions before 1.1)
            'clf__estimator__loss': ['hinge', 'log_loss', 'modified_huber'],
        },
        {
            'clf__estimator': [MultinomialNB()],
            'tfidf__max_df': (0.25, 0.5, 0.75, 1.0),
            'tfidf__stop_words': [None],
            'clf__estimator__alpha': (1e-2, 1e-3, 1e-1),
        },
    ]

    # train_data: raw text documents, train_labels: their classes (supply your own)
    gscv = GridSearchCV(pipeline, parameters, cv=5, n_jobs=12, return_train_score=False, verbose=3)
    gscv.fit(train_data, train_labels)

How to interpret clf__estimator__loss

clf__estimator__loss is interpreted as the loss parameter of whatever estimator is. In the first grid above, estimator = SGDClassifier(), and estimator is itself a parameter of clf, which is a ClfSwitcher object.
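To make that routing concrete, here is a small sketch that sets the same nested name by hand with `set_params`, which is roughly what the grid search does for each candidate. The `ClfSwitcher` below is a trimmed-down copy written for this example, and the parameter values are arbitrary.

```python
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


class ClfSwitcher(BaseEstimator):
    """Trimmed-down copy of the switcher, enough to show parameter routing."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)


pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", ClfSwitcher(estimator=SGDClassifier())),
])

# "clf"       -> the pipeline step (a ClfSwitcher)
# "estimator" -> the ClfSwitcher parameter holding the real classifier
# "loss"      -> a parameter of that classifier (here SGDClassifier)
pipeline.set_params(clf__estimator__loss="modified_huber")
sgd_loss = pipeline.named_steps["clf"].estimator.loss
print(sgd_loss)

# The full path is also listed among the pipeline's parameters.
has_nested_name = "clf__estimator__loss" in pipeline.get_params()
print(has_nested_name)

# Swapping the estimator and setting one of its parameters in one call,
# as happens for the MultinomialNB grid.
pipeline.set_params(clf__estimator=MultinomialNB(), clf__estimator__alpha=0.5)
nb_alpha = pipeline.named_steps["clf"].estimator.alpha
print(nb_alpha)
```

Each double underscore steps one level deeper, so a name like `clf__estimator__alpha` is only valid while the wrapped estimator actually has an `alpha` parameter.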


予麋鹿

2020-12-09 12:57

Yes, you can do that by building a wrapper function. The idea is to pass it two dictionaries: the models and the parameters.

Then you iterate over the models, running GridSearchCV on each one with all of its parameters to test.

Check this example; it adds extra functionality so that at the end you get a data frame summarizing the different models/parameters and their performance scores.

EDIT: It's too much code to paste here; you can check a full working example here:

http://www.davidsbatista.net/blog/2018/02/23/model_optimization/
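For readers who only want the shape of the idea, the wrapper described above might look roughly like the sketch below. This is not the code from the linked post: the function name `compare_models`, the dictionary keys, and the synthetic dataset are all hypothetical and chosen just for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def compare_models(models, params, X, y, cv=3):
    """Grid-search every model in `models` and return one summary frame."""
    frames = []
    for name, model in models.items():
        search = GridSearchCV(model, params[name], cv=cv)
        search.fit(X, y)
        # One row per parameter combination tried for this model.
        frames.append(pd.DataFrame({
            "model": name,
            "params": search.cv_results_["params"],
            "mean_test_score": search.cv_results_["mean_test_score"],
            "std_test_score": search.cv_results_["std_test_score"],
        }))
    summary = pd.concat(frames, ignore_index=True)
    return summary.sort_values("mean_test_score", ascending=False)


# Synthetic data, used only so the sketch runs end to end.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

# The two dictionaries share keys: one maps names to models,
# the other maps the same names to their parameter grids.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
}
params = {
    "logreg": {"C": [0.1, 1.0, 10.0]},
    "tree": {"max_depth": [2, 4]},
}

summary = compare_models(models, params, X, y)
print(summary)
```

Keeping the models and grids in separate dictionaries keyed by the same names makes it easy to add or drop a candidate without touching the loop.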
