grid-search

scikit-learn GridSearchCV n_jobs != 1 freezing

Submitted by 自作多情 on 2019-11-28 12:38:40
Question: I'm running a grid search on random forests and trying to use a value of n_jobs other than 1, but the kernel freezes and there is no CPU usage. With n_jobs=1 it works fine. I can't even stop the command with Ctrl-C and have to restart the kernel. I'm running on Windows 7. I saw that there is a similar problem on OS X, but the solution is not relevant for Windows 7.

    from sklearn.ensemble import RandomForestClassifier
    rf_tfdidf = Pipeline([('vect', tfidf),
                          ('clf', RandomForestClassifier(n_estimators=50, …
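
The usual cause on Windows (an assumption here, since the excerpt is cut off) is that n_jobs > 1 makes joblib spawn worker processes, and on Windows each worker re-imports the main script; without an import guard this can deadlock. A minimal sketch of the standard fix, using an illustrative pipeline and grid in place of the asker's truncated code:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    if __name__ == '__main__':
        # Everything that triggers parallel work must run under this guard,
        # because Windows workers re-import the module instead of forking.
        pipe = Pipeline([('vect', TfidfVectorizer()),
                         ('clf', RandomForestClassifier(n_estimators=50))])
        params = {'clf__max_depth': [5, 10, None]}  # illustrative grid
        gs = GridSearchCV(pipe, params, n_jobs=-1, cv=3)
        # gs.fit(docs, labels)  # docs/labels: the asker's text data (placeholders)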

Pipeline: Multiple classifiers?

Submitted by 爷,独闯天下 on 2019-11-28 01:34:33
Question: I read the following example on Pipelines and GridSearchCV in Python: http://www.davidsbatista.net/blog/2017/04/01/document_classification/

Logistic Regression:

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(stop_words=stop_words)),
        ('clf', OneVsRestClassifier(LogisticRegression(solver='sag'))),
    ])
    parameters = {
        'tfidf__max_df': (0.25, 0.5, 0.75),
        'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
        "clf__estimator__C": [0.01, 0.1, 1],
        "clf__estimator__class_weight": ['balanced', None],
    }

SVM: …
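
One common way to search over several classifiers in a single run (a general pattern, not taken from the linked post) is to pass GridSearchCV a list of parameter dicts and let the grid swap out the 'clf' step itself. A minimal sketch, with LinearSVC standing in for the truncated SVM block:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', OneVsRestClassifier(LogisticRegression(solver='sag'))),
    ])
    param_grid = [
        {   # candidate 1: keep the logistic regression step
            'clf__estimator__C': [0.01, 0.1, 1],
        },
        {   # candidate 2: replace the whole 'clf' step with an SVM
            'clf': [OneVsRestClassifier(LinearSVC())],
            'clf__estimator__C': [0.01, 0.1, 1],
        },
    ]
    grid = GridSearchCV(pipeline, param_grid, cv=3)
    # grid.fit(docs, labels); grid.best_params_ then shows which step won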

How to perform feature selection with GridSearchCV in sklearn in Python

Submitted by 喜夏-厌秋 on 2019-11-28 01:16:20
I am using recursive feature elimination with cross-validation (RFECV) as a feature selector for a random forest classifier, as follows.

    X = df[[my_features]]  # all my features
    y = df['gold_standard']  # labels
    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    rfecv = RFECV(estimator=clf, step=1, cv=StratifiedKFold(10), scoring='roc_auc')
    rfecv.fit(X, y)
    print("Optimal number of features : %d" % rfecv.n_features_)
    features = list(X.columns[rfecv.support_])

I am also performing GridSearchCV to tune the hyperparameters of RandomForestClassifier, as follows.

    X = df[[my…
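
Running RFECV first and GridSearchCV afterwards on the same data lets information from the feature-selection step leak into the evaluation. One way around this (a sketch, not the asker's code) is to put RFECV inside a Pipeline so the selection is refit within every grid-search split:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.pipeline import Pipeline

    base = RandomForestClassifier(random_state=42, class_weight="balanced")
    pipe = Pipeline([
        ('select', RFECV(estimator=base, step=1, cv=StratifiedKFold(10),
                         scoring='roc_auc')),
        ('model', RandomForestClassifier(random_state=42, class_weight="balanced")),
    ])
    param_grid = {'model__n_estimators': [100, 300],  # illustrative values
                  'model__max_depth': [4, 8, None]}
    search = GridSearchCV(pipe, param_grid, cv=StratifiedKFold(5), scoring='roc_auc')
    # search.fit(X, y)  # X, y as defined in the question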

Model help using Scikit-learn when using GridSearch

Submitted by 纵饮孤独 on 2019-11-27 22:38:06
As part of the Enron project, I built the attached model. Below is a summary of the steps. The model below gives nearly perfect scores:

    cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
    gcv = GridSearchCV(pipe, clf_params, cv=cv)
    gcv.fit(features, labels)  # ---> with the full dataset

    for train_ind, test_ind in cv.split(features, labels):
        x_train, x_test = features[train_ind], features[test_ind]
        y_train, y_test = labels[train_ind], labels[test_ind]
        gcv.best_estimator_.predict(x_test)

The model below gives more reasonable but lower scores:

    cv = StratifiedShuffleSplit(n_splits=100, …
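
The inflated scores in the first variant come from evaluating best_estimator_ on splits of the same data the search was fitted on. A sketch of the leak-free pattern, reusing the names from the question (pipe, clf_params, features, labels):

    from sklearn.model_selection import (GridSearchCV, StratifiedShuffleSplit,
                                         train_test_split)

    # Hold out a test set *before* the search so the final score
    # comes from rows the fitted search has never seen.
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=42)

    cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
    gcv = GridSearchCV(pipe, clf_params, cv=cv)
    gcv.fit(x_train, y_train)                   # tuning sees only the train part
    print(gcv.best_estimator_.score(x_test, y_test))  # honest hold-out estimate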

How to tune parameters of a custom kernel function with a pipeline in scikit-learn

Submitted by 一个人想着一个人 on 2019-11-27 17:45:30
Question: I have successfully defined a custom kernel function (pre-computing the kernel matrix) using a def function, and I am now using GridSearchCV to find the best parameters. In the custom kernel function there are two parameters to be tuned (namely gamma and sea_gamma in the example below), and for the SVR model the cost parameter C has to be tuned as well. But until now, I can only tune the cost parameter C using GridSearchCV -> please refer to the Part…
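
A pattern that makes kernel parameters searchable (a sketch under the assumption that the kernel can be wrapped as a transformer; the single gamma here stands in for the asker's gamma and sea_gamma) is to precompute the Gram matrix in a pipeline step and feed SVR(kernel='precomputed'):

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVR

    class PrecomputedKernel(BaseEstimator, TransformerMixin):
        """Exposes the kernel parameter(s) as tunable pipeline parameters."""
        def __init__(self, gamma=1.0):
            self.gamma = gamma

        def fit(self, X, y=None):
            self.X_train_ = X          # kept to build the Gram matrix later
            return self

        def transform(self, X):
            # Illustrative RBF-style kernel between X and the training rows
            d = ((X[:, None, :] - self.X_train_[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-self.gamma * d)

    pipe = Pipeline([('kernel', PrecomputedKernel()),
                     ('svr', SVR(kernel='precomputed'))])
    grid = GridSearchCV(pipe, {'kernel__gamma': [0.1, 1.0, 10.0],
                               'svr__C': [1, 10, 100]}, cv=3)
    # grid.fit(X, y)  # now C and the kernel parameters are tuned together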

Is there an easy way to grid search without cross validation in Python?

Submitted by ≯℡__Kan透↙ on 2019-11-27 13:53:29
Question: There is the absolutely helpful class GridSearchCV in scikit-learn for doing grid search and cross-validation, but I don't want to do cross-validation. I want to do a grid search without cross-validation and use the whole data set for training. To be more specific, I need to evaluate the model made by RandomForestClassifier with its "oob score" during the grid search. Is there an easy way to do it, or should I write a class myself? The point is, I'd like to do the grid search in an easy way. I don't want to do cross-validation…
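
A simple approach (one common answer to this, sketched with placeholder data X, y) is to skip GridSearchCV entirely and iterate over ParameterGrid, ranking candidates by oob_score_:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import ParameterGrid

    param_grid = {'n_estimators': [50, 100], 'max_depth': [4, 8, None]}
    best_score, best_params = -1.0, None
    for params in ParameterGrid(param_grid):
        # oob_score=True scores each tree on the samples its bootstrap missed,
        # so the whole data set can be used for training.
        clf = RandomForestClassifier(oob_score=True, random_state=42, **params)
        clf.fit(X, y)
        if clf.oob_score_ > best_score:
            best_score, best_params = clf.oob_score_, params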

How can I avoid using estimator_params when using RFECV nested within GridSearchCV?

Submitted by 本秂侑毒 on 2019-11-27 06:26:52
Question: I'm currently working on recursive feature elimination (RFECV) within a grid search (GridSearchCV) for tree-based methods using scikit-learn. To do this, I'm using the current dev version on GitHub (0.17), which allows RFECV to use feature importances from the tree methods to select the features to discard. For clarity, this means:

- loop over hyperparameters for the current tree method
- for each set of parameters, perform recursive feature elimination to obtain the optimal number of features
- report the…
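
In later scikit-learn releases the deprecated estimator_params argument becomes unnecessary, because RFECV exposes its wrapped model's parameters under the estimator__ prefix and GridSearchCV can set them directly. A minimal sketch with illustrative parameter values:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV

    rfecv = RFECV(estimator=RandomForestClassifier(random_state=0), step=1, cv=3)
    param_grid = {'estimator__n_estimators': [100, 200],  # forwarded to the forest
                  'estimator__max_depth': [4, None]}
    search = GridSearchCV(rfecv, param_grid, cv=3)
    # search.fit(X, y)  # each candidate reruns RFE with those forest settings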

Does GridSearchCV perform cross-validation?

Submitted by 本秂侑毒 on 2019-11-26 20:14:22
Question: I'm currently working on a problem that compares the performance of three different machine learning algorithms on the same data set. I divided the data set into 70/30 training/testing sets and then performed a grid search for the best parameters of each algorithm using GridSearchCV with X_train, y_train. First question: am I supposed to perform the grid search on the training set, or should it be on the whole data set? Second question: I know that GridSearchCV uses K-fold in its implementation,…
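
The usual answer (stated here as general practice, not quoted from the thread): yes, GridSearchCV runs K-fold internally on whatever data it is given, and the search should see only the training split so the 30% test set stays untouched for the final comparison. A sketch with placeholder data X, y:

    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5)  # 5-fold CV inside train
    grid.fit(X_train, y_train)           # tuning never touches X_test
    print(grid.score(X_test, y_test))    # refit best model, one final estimate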

scikit-learn GridSearchCV with multiple repetitions

Submitted by 旧城冷巷雨未停 on 2019-11-26 07:34:36
Question: I'm trying to get the best set of parameters for an SVR model. I'd like to use GridSearchCV over different values of C. However, from previous tests I noticed that the split into training/test sets highly influences the overall performance (r2 in this instance). To address this problem, I'd like to implement a repeated 5-fold cross-validation (10 x 5CV). Is there a built-in way of performing it using GridSearchCV? QUICK SOLUTION: Following the idea presented in the scikit-learn official…
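
In modern scikit-learn there is a built-in way: pass a RepeatedKFold splitter as the cv argument, which yields exactly the 10 x 5CV scheme described. A minimal sketch with illustrative C values:

    from sklearn.model_selection import GridSearchCV, RepeatedKFold
    from sklearn.svm import SVR

    rkf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)  # 10 x 5-fold CV
    grid = GridSearchCV(SVR(), {'C': [0.1, 1, 10, 100]}, cv=rkf, scoring='r2')
    # grid.fit(X, y)  # cv_results_ then averages over all 50 folds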