grid-search

scikit-learn GridSearchCV n_jobs != 1 freezing

Submitted by 自作多情 on 2019-11-28 12:38:40
Question: I'm running a grid search on random forests and trying to use a value of n_jobs other than 1, but the kernel freezes and there is no CPU usage. With n_jobs=1 it works fine. I can't even stop the command with Ctrl-C and have to restart the kernel. I'm running on Windows 7. I saw that there is a similar problem on OS X, but the solution is not relevant for Windows 7.

    from sklearn.ensemble import RandomForestClassifier
    rf_tfdidf = Pipeline([('vect', tfidf),
                          ('clf', RandomForestClassifier(n_estimators=50, …
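
The usual cause on Windows (an assumption here, since the excerpt is cut off) is that n_jobs > 1 makes joblib spawn worker processes, and on Windows each worker re-imports the main script; without an import guard this can deadlock. A minimal sketch of the standard fix, using an illustrative pipeline and grid in place of the asker's truncated code:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    if __name__ == '__main__':
        # Everything that triggers parallel work must run under this guard,
        # because Windows workers re-import the module instead of forking.
        pipe = Pipeline([('vect', TfidfVectorizer()),
                         ('clf', RandomForestClassifier(n_estimators=50))])
        params = {'clf__max_depth': [5, 10, None]}  # illustrative grid
        gs = GridSearchCV(pipe, params, n_jobs=-1, cv=3)
        # gs.fit(docs, labels)  # docs/labels: the asker's text data (placeholders)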

Pipeline: Multiple classifiers?

Submitted by 爷,独闯天下 on 2019-11-28 01:34:33
Question: I read the following example on Pipelines and GridSearchCV in Python: http://www.davidsbatista.net/blog/2017/04/01/document_classification/

Logistic Regression:

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(stop_words=stop_words)),
        ('clf', OneVsRestClassifier(LogisticRegression(solver='sag'))),
    ])
    parameters = {
        'tfidf__max_df': (0.25, 0.5, 0.75),
        'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
        "clf__estimator__C": [0.01, 0.1, 1],
        "clf__estimator__class_weight": ['balanced', None],
    }

SVM: …
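
One common way to search over several classifiers in a single run (a general pattern, not taken from the linked post) is to pass GridSearchCV a list of parameter dicts and let the grid swap out the 'clf' step itself. A minimal sketch, with LinearSVC standing in for the truncated SVM block:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', OneVsRestClassifier(LogisticRegression(solver='sag'))),
    ])
    param_grid = [
        {   # candidate 1: keep the logistic regression step
            'clf__estimator__C': [0.01, 0.1, 1],
        },
        {   # candidate 2: replace the whole 'clf' step with an SVM
            'clf': [OneVsRestClassifier(LinearSVC())],
            'clf__estimator__C': [0.01, 0.1, 1],
        },
    ]
    grid = GridSearchCV(pipeline, param_grid, cv=3)
    # grid.fit(docs, labels); grid.best_params_ then shows which step won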

How to perform feature selection with GridSearchCV in sklearn in Python

Submitted by 喜夏-厌秋 on 2019-11-28 01:16:20
I am using recursive feature elimination with cross-validation (RFECV) as a feature selector for a random forest classifier, as follows.

    X = df[[my_features]]  # all my features
    y = df['gold_standard']  # labels
    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    rfecv = RFECV(estimator=clf, step=1, cv=StratifiedKFold(10), scoring='roc_auc')
    rfecv.fit(X, y)
    print("Optimal number of features : %d" % rfecv.n_features_)
    features = list(X.columns[rfecv.support_])

I am also performing GridSearchCV to tune the hyperparameters of RandomForestClassifier, as follows.

    X = df[[my…
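
Running RFECV first and GridSearchCV afterwards on the same data lets information from the feature-selection step leak into the evaluation. One way around this (a sketch, not the asker's code) is to put RFECV inside a Pipeline so the selection is refit within every grid-search split:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.pipeline import Pipeline

    base = RandomForestClassifier(random_state=42, class_weight="balanced")
    pipe = Pipeline([
        ('select', RFECV(estimator=base, step=1, cv=StratifiedKFold(10),
                         scoring='roc_auc')),
        ('model', RandomForestClassifier(random_state=42, class_weight="balanced")),
    ])
    param_grid = {'model__n_estimators': [100, 300],  # illustrative values
                  'model__max_depth': [4, 8, None]}
    search = GridSearchCV(pipe, param_grid, cv=StratifiedKFold(5), scoring='roc_auc')
    # search.fit(X, y)  # X, y as defined in the question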

Model help using Scikit-learn when using GridSearch

Submitted by 纵饮孤独 on 2019-11-27 22:38:06
As part of the Enron project, I built the attached model. Below is a summary of the steps. The model below gives nearly perfect scores:

    cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
    gcv = GridSearchCV(pipe, clf_params, cv=cv)
    gcv.fit(features, labels)  # ---> with the full dataset

    for train_ind, test_ind in cv.split(features, labels):
        x_train, x_test = features[train_ind], features[test_ind]
        y_train, y_test = labels[train_ind], labels[test_ind]
        gcv.best_estimator_.predict(x_test)

The model below gives more reasonable but lower scores:

    cv = StratifiedShuffleSplit(n_splits=100, …
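
The inflated scores in the first variant come from evaluating best_estimator_ on splits of the same data the search was fitted on. A sketch of the leak-free pattern, reusing the names from the question (pipe, clf_params, features, labels):

    from sklearn.model_selection import (GridSearchCV, StratifiedShuffleSplit,
                                         train_test_split)

    # Hold out a test set *before* the search so the final score
    # comes from rows the fitted search has never seen.
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=42)

    cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
    gcv = GridSearchCV(pipe, clf_params, cv=cv)
    gcv.fit(x_train, y_train)                   # tuning sees only the train part
    print(gcv.best_estimator_.score(x_test, y_test))  # honest hold-out estimate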

How to tune parameters of a custom kernel function with a pipeline in scikit-learn

Submitted by 一个人想着一个人 on 2019-11-27 17:45:30
Question: I have successfully defined a custom kernel function (pre-computing the kernel matrix) using a def function, and I am now using GridSearchCV to find the best parameters. In the custom kernel function there are two parameters to be tuned (namely gamma and sea_gamma in the example below), and for the SVR model the cost parameter C has to be tuned as well. But until now, I can only tune the cost parameter C using GridSearchCV -> please refer to the Part…
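
A pattern that makes kernel parameters searchable (a sketch under the assumption that the kernel can be wrapped as a transformer; the single gamma here stands in for the asker's gamma and sea_gamma) is to precompute the Gram matrix in a pipeline step and feed SVR(kernel='precomputed'):

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVR

    class PrecomputedKernel(BaseEstimator, TransformerMixin):
        """Exposes the kernel parameter(s) as tunable pipeline parameters."""
        def __init__(self, gamma=1.0):
            self.gamma = gamma

        def fit(self, X, y=None):
            self.X_train_ = X          # kept to build the Gram matrix later
            return self

        def transform(self, X):
            # Illustrative RBF-style kernel between X and the training rows
            d = ((X[:, None, :] - self.X_train_[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-self.gamma * d)

    pipe = Pipeline([('kernel', PrecomputedKernel()),
                     ('svr', SVR(kernel='precomputed'))])
    grid = GridSearchCV(pipe, {'kernel__gamma': [0.1, 1.0, 10.0],
                               'svr__C': [1, 10, 100]}, cv=3)
    # grid.fit(X, y)  # now C and the kernel parameters are tuned together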

Is there an easy way to grid search without cross validation in Python?

Submitted by ≯℡__Kan透↙ on 2019-11-27 13:53:29
Question: There is the absolutely helpful class GridSearchCV in scikit-learn for doing grid search and cross-validation, but I don't want to do cross-validation. I want to do a grid search without cross-validation and use the whole data set for training. To be more specific, I need to evaluate the model made by RandomForestClassifier with its "oob score" during the grid search. Is there an easy way to do it, or should I write a class myself? The point is, I'd like to do the grid search in an easy way. I don't want to do cross-validation…
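
A simple approach (one common answer to this, sketched with placeholder data X, y) is to skip GridSearchCV entirely and iterate over ParameterGrid, ranking candidates by oob_score_:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import ParameterGrid

    param_grid = {'n_estimators': [50, 100], 'max_depth': [4, 8, None]}
    best_score, best_params = -1.0, None
    for params in ParameterGrid(param_grid):
        # oob_score=True scores each tree on the samples its bootstrap missed,
        # so the whole data set can be used for training.
        clf = RandomForestClassifier(oob_score=True, random_state=42, **params)
        clf.fit(X, y)
        if clf.oob_score_ > best_score:
            best_score, best_params = clf.oob_score_, params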

How can I avoid using estimator_params when using RFECV nested within GridSearchCV?

Submitted by 本秂侑毒 on 2019-11-27 06:26:52
Question: I'm currently working on recursive feature elimination (RFECV) within a grid search (GridSearchCV) for tree-based methods using scikit-learn. To do this, I'm using the current dev version on GitHub (0.17), which allows RFECV to use feature importances from the tree methods to select the features to discard. For clarity, this means:

- loop over hyperparameters for the current tree method
- for each set of parameters, perform recursive feature elimination to obtain the optimal number of features
- report the…
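
In later scikit-learn releases the deprecated estimator_params argument becomes unnecessary, because RFECV exposes its wrapped model's parameters under the estimator__ prefix and GridSearchCV can set them directly. A minimal sketch with illustrative parameter values:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV

    rfecv = RFECV(estimator=RandomForestClassifier(random_state=0), step=1, cv=3)
    param_grid = {'estimator__n_estimators': [100, 200],  # forwarded to the forest
                  'estimator__max_depth': [4, None]}
    search = GridSearchCV(rfecv, param_grid, cv=3)
    # search.fit(X, y)  # each candidate reruns RFE with those forest settings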

Does GridSearchCV perform cross-validation?

Submitted by 本秂侑毒 on 2019-11-26 20:14:22
Question: I'm currently working on a problem that compares the performance of three different machine learning algorithms on the same data set. I divided the data set into 70/30 training/testing sets and then performed a grid search for the best parameters of each algorithm using GridSearchCV with X_train, y_train. First question: am I supposed to perform the grid search on the training set, or should it be on the whole data set? Second question: I know that GridSearchCV uses K-fold in its implementation,…
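
The usual answer (stated here as general practice, not quoted from the thread): yes, GridSearchCV runs K-fold internally on whatever data it is given, and the search should see only the training split so the 30% test set stays untouched for the final comparison. A sketch with placeholder data X, y:

    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5)  # 5-fold CV inside train
    grid.fit(X_train, y_train)           # tuning never touches X_test
    print(grid.score(X_test, y_test))    # refit best model, one final estimate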

scikit-learn GridSearchCV with multiple repetitions

Submitted by 旧城冷巷雨未停 on 2019-11-26 07:34:36
Question: I'm trying to get the best set of parameters for an SVR model. I'd like to use GridSearchCV over different values of C. However, from previous tests I noticed that the split into training/test sets highly influences the overall performance (r2 in this instance). To address this problem, I'd like to implement a repeated 5-fold cross-validation (10 x 5CV). Is there a built-in way of performing it using GridSearchCV? QUICK SOLUTION: Following the idea presented in the scikit-learn official…
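
In modern scikit-learn there is a built-in way: pass a RepeatedKFold splitter as the cv argument, which yields exactly the 10 x 5CV scheme described. A minimal sketch with illustrative C values:

    from sklearn.model_selection import GridSearchCV, RepeatedKFold
    from sklearn.svm import SVR

    rkf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)  # 10 x 5-fold CV
    grid = GridSearchCV(SVR(), {'C': [0.1, 1, 10, 100]}, cv=rkf, scoring='r2')
    # grid.fit(X, y)  # cv_results_ then averages over all 50 folds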