grid-search

How to perform feature selection with gridsearchcv in sklearn in python

这一生的挚爱 submitted on 2019-12-28 06:23:24
Question: I am using recursive feature elimination with cross-validation (RFECV) as a feature selector for a RandomForestClassifier, as follows:

    X = df[[my_features]]  # all my features
    y = df['gold_standard']  # labels
    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    rfecv = RFECV(estimator=clf, step=1, cv=StratifiedKFold(10), scoring='roc_auc')
    rfecv.fit(X, y)
    print("Optimal number of features : %d" % rfecv.n_features_)
    features = list(X.columns[rfecv.support_])

I am also performing …
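
One way to answer the question in the title is to put the feature selector and the classifier in a Pipeline and hand that to GridSearchCV, so hyperparameters are tuned together with the RFECV-selected features. This is only a minimal sketch: the grid values are illustrative assumptions, and X and y are the variables defined in the question.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ('feature_selection', RFECV(estimator=RandomForestClassifier(random_state=42,
                                                                     class_weight="balanced"),
                                    step=1, cv=StratifiedKFold(10), scoring='roc_auc')),
        ('classifier', RandomForestClassifier(random_state=42, class_weight="balanced")),
    ])

    # Pipeline step parameters are addressed as <step_name>__<parameter>.
    param_grid = {
        'classifier__n_estimators': [100, 300],   # illustrative values
        'classifier__max_depth': [None, 10],
    }

    grid = GridSearchCV(pipeline, param_grid, cv=StratifiedKFold(10),
                        scoring='roc_auc', n_jobs=-1)
    # grid.fit(X, y)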

Grid search on parameters inside the parameters of a BaggingClassifier

别说谁变了你拦得住时间么 submitted on 2019-12-24 20:32:13
Question: This is a follow-up to a question answered here, but I believe it deserves its own thread. In the previous question, we were dealing with "an Ensemble of Ensemble classifiers, where each has its own parameters." Let's start with the example provided by MaximeKan in his answer:

    my_est = BaggingClassifier(
        RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
        n_estimators=5,
        bootstrap_features=False,
        bootstrap=False,
        max_features=1.0,
        max_samples=0.6,
    )

Now …
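
For grid-searching the parameters of the inner RandomForestClassifier, the usual route is scikit-learn's double-underscore convention: nested estimators expose their parameters as <outer_param>__<inner_param>. A sketch, assuming scikit-learn < 1.2, where the inner estimator of BaggingClassifier is exposed as base_estimator (renamed to estimator in 1.2); the grid values are illustrative.

    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    my_est = BaggingClassifier(
        RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
        n_estimators=5, bootstrap_features=False, bootstrap=False,
        max_features=1.0, max_samples=0.6,
    )

    param_grid = {
        'n_estimators': [5, 10],                      # BaggingClassifier itself
        'base_estimator__n_estimators': [50, 100],    # inner RandomForestClassifier
        'base_estimator__max_features': [0.5, 0.8],
    }

    grid = GridSearchCV(my_est, param_grid, cv=3)
    # grid.fit(X, y)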

How to specify positive label when use precision as scoring in GridSearchCV

Deadly submitted on 2019-12-24 14:11:44
Question:

    model = sklearn.model_selection.GridSearchCV(
        estimator=est,
        param_grid=param_grid,
        scoring='precision',
        verbose=1,
        n_jobs=1,
        iid=True,
        cv=3)

In sklearn.metrics.precision_score(y, y_pred, pos_label=[0]) I can specify the positive label; how can I specify it in GridSearchCV too? If there is no way to specify it directly, how can I define it with a custom scorer? I have tried this:

    custom_score = make_scorer(precision_score(y, y_pred, pos_label=[0]), greater_is_better=True)

but I got …
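
The usual fix (a sketch, not taken from the original answers) is to pass the metric function itself to make_scorer rather than the result of calling it; extra keyword arguments such as pos_label are forwarded to precision_score at scoring time. The estimator and grid below are stand-ins, and pos_label=0 assumes the positive class is the label 0, as the question's pos_label=[0] suggests.

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import make_scorer, precision_score
    from sklearn.model_selection import GridSearchCV

    # Note: the function object, not precision_score(...), goes into make_scorer.
    custom_score = make_scorer(precision_score, greater_is_better=True, pos_label=0)

    est = LogisticRegression()               # stand-in estimator
    param_grid = {'C': [0.1, 1.0, 10.0]}     # illustrative grid
    model = GridSearchCV(estimator=est, param_grid=param_grid,
                         scoring=custom_score, verbose=1, n_jobs=1, cv=3)
    # model.fit(X, y)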

If I interrupt sklearn grid_search.fit() before completion can I access the current .best_score_, .best_params_?

回眸只為那壹抹淺笑 submitted on 2019-12-24 10:26:51
Question: If I interrupt grid_search.fit() before completion, will I lose everything it has done so far? I got a little carried away with my grid search and provided an obscenely large search space. I can already see scores that I'm happy with, but my stdout doesn't display which params led to those scores. I've searched the docs: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html There is a discussion from a couple of years ago about adding a feature for parallel …
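
A sketch of a common workaround, not necessarily the original accepted answer: when fit() is interrupted, attributes such as best_score_ and best_params_ are never set, so the only record of partial progress is whatever was printed while the search ran. Raising the verbosity makes GridSearchCV print each candidate's parameters and score to stdout as it finishes, so an interrupted run still leaves a usable log. The estimator and grid below are stand-ins.

    from sklearn.model_selection import GridSearchCV  # the old sklearn.grid_search module is deprecated
    from sklearn.svm import SVC

    search = GridSearchCV(
        estimator=SVC(),                  # stand-in estimator
        param_grid={'C': [0.1, 1, 10]},   # illustrative grid
        verbose=3,   # verbosity above 2 also prints each candidate's score as it completes
        n_jobs=1,    # a single process keeps the printed output from interleaving
    )
    # search.fit(X, y)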

Grid search error

时间秒杀一切 submitted on 2019-12-24 07:02:29
Question: I've been trying to perform a grid search, but something seems to be off. My code is:

    grid_search_0 = GridSearchCV(
        estimator=Pipeline([('vectorizer', CountVectorizer()),
                            ('tfidf', TfidfTransformer()),
                            ('clf', LinearSVC())]),
        param_grid={'C': 3**np.arange(-3, 3, dtype='float'),
                    'gamma': 3**np.arange(-6, 0, dtype='float')},
        cv=10,
        scoring=make_scorer(roc_auc_score, needs_threshold=True),
        verbose=1,
        n_jobs=-1)

and I get the error ImportError: [joblib] Attempting to do parallel computing …
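
Two issues stand out in the snippet independently of the truncated joblib ImportError (which usually means the fit call needs an "if __name__ == '__main__':" guard, or n_jobs=1, on platforms without forking): parameters of a Pipeline step must be prefixed with the step name, and LinearSVC has no gamma parameter, so gamma only makes sense with SVC and an RBF kernel. A hedged sketch of a corrected version, keeping the value ranges from the question:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.metrics import make_scorer, roc_auc_score
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    pipeline = Pipeline([
        ('vectorizer', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('clf', SVC(kernel='rbf')),       # SVC accepts gamma; LinearSVC does not
    ])

    grid_search_0 = GridSearchCV(
        estimator=pipeline,
        param_grid={
            'clf__C': 3 ** np.arange(-3, 3, dtype='float'),      # step-name prefix is required
            'clf__gamma': 3 ** np.arange(-6, 0, dtype='float'),
        },
        cv=10,
        scoring=make_scorer(roc_auc_score, needs_threshold=True),
        verbose=1,
        n_jobs=1,   # sidesteps the joblib multiprocessing error on some setups
    )
    # grid_search_0.fit(texts, labels)   # texts, labels: your data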

specify scoring metric in GridSearch function with hypopt package in python

岁酱吖の submitted on 2019-12-22 19:32:02
Question: I'm using the GridSearch function from the hypopt package to do my hyperparameter search with a specified validation set. The default metric for classification seems to be accuracy (I'm not entirely sure). Here I want to use the F1 score as the metric, but I don't know where I should specify it. I looked at the documentation but I'm still confused. Does anyone familiar with the hypopt package know how I can do this? Thanks a lot in advance.

    from hypopt import GridSearch
    log_reg_params = {"penalty": ['l1 …
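
As far as I recall (treat this as an assumption to verify against the installed hypopt version, not a documented guarantee), recent hypopt releases let you pass a scoring argument to GridSearch.fit that accepts scikit-learn style metric names such as 'f1', or a make_scorer object; the placement of param_grid (constructor vs. fit) has also shifted between versions. A sketch, assuming X_train, y_train, X_val and y_val are already defined:

    from hypopt import GridSearch
    from sklearn.linear_model import LogisticRegression

    log_reg_params = {"penalty": ['l1'], "C": [0.1, 1.0, 10.0]}   # illustrative grid

    opt = GridSearch(model=LogisticRegression(solver='liblinear'),
                     param_grid=log_reg_params)
    # 'scoring' is assumed to take scikit-learn metric strings; check your hypopt version.
    opt.fit(X_train, y_train, X_val, y_val, scoring='f1')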

ValueError: Can't handle mix of multilabel-indicator and binary

 ̄綄美尐妖づ submitted on 2019-12-22 09:12:19
Question: I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyperparameter optimisation. This is a multi-class problem: the target variable can have only one label, chosen from a set of n classes. For instance, the target variable can be 'Class1', 'Class2', ..., 'Classn'.

    # self._arch creates my model
    nn = KerasClassifier(build_fn=self._arch, verbose=0)
    clf = GridSearchCV(
        nn,
        param_grid={ ... },
        scoring='f1_macro',   # I use f1 score, macro averaged
        n_jobs=-1 …
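
A common cause of this ValueError (offered as a sketch, not as the original accepted answer) is fitting GridSearchCV on one-hot encoded targets: the scorer then compares a multilabel-indicator y_true against the plain class predictions returned by KerasClassifier.predict. Converting the targets to integer class indices before fitting usually resolves it. The model factory, input shape and class count below are illustrative stand-ins for self._arch.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    def build_model():
        # Stand-in for self._arch: one output unit per class.
        model = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                            Dense(3, activation='softmax')])
        model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
        return model

    # If y_onehot is the one-hot target matrix (n_samples x n_classes),
    # use integer labels so y_true matches what the wrapper predicts:
    # y = np.argmax(y_onehot, axis=1)

    nn = KerasClassifier(build_fn=build_model, verbose=0)
    clf = GridSearchCV(nn, param_grid={'epochs': [5, 10]},   # illustrative grid
                       scoring='f1_macro', n_jobs=1)
    # clf.fit(X, y)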

Python, machine learning - Perform a grid search on custom validation set

 ̄綄美尐妖づ submitted on 2019-12-21 21:27:52
Question: I am dealing with an unbalanced classification problem where my negative class is 1000 times more numerous than my positive class. My strategy is to train a deep neural network on a balanced (50/50 ratio) training set (I have enough simulated samples), and then use an unbalanced (1/1000 ratio) validation set to select the best model and optimise the hyperparameters. Since the number of parameters is significant, I want to use scikit-learn's RandomizedSearchCV, i.e. a random grid search. To my …
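
The standard way to run a scikit-learn search against one fixed validation set is PredefinedSplit: concatenate the training and validation data and mark each sample with -1 (always train) or 0 (the single validation fold). A sketch, assuming X_train, y_train, X_val and y_val already hold the balanced training set and the unbalanced validation set; the estimator and parameter distributions are illustrative stand-ins for the network and its hyperparameters.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import PredefinedSplit, RandomizedSearchCV

    X_all = np.concatenate([X_train, X_val])
    y_all = np.concatenate([y_train, y_val])

    # -1: sample stays in the training fold; 0: sample belongs to the one validation fold.
    test_fold = np.concatenate([np.full(len(X_train), -1),
                                np.zeros(len(X_val), dtype=int)])
    cv = PredefinedSplit(test_fold)

    search = RandomizedSearchCV(
        estimator=RandomForestClassifier(),                   # stand-in for the network wrapper
        param_distributions={'n_estimators': [100, 300],      # illustrative distributions
                             'max_depth': [None, 10]},
        n_iter=4,
        scoring='f1',
        cv=cv,
        refit=False,   # refitting would retrain on train + validation combined
    )
    # search.fit(X_all, y_all)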