grid-search

How to perform feature selection with gridsearchcv in sklearn in python

这一生的挚爱 submitted on 2019-12-28 06:23:24
Question: I am using recursive feature elimination with cross-validation (RFECV) as a feature selector for a RandomForestClassifier, as follows:

    X = df[[my_features]]  # all my features
    y = df['gold_standard']  # labels
    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    rfecv = RFECV(estimator=clf, step=1, cv=StratifiedKFold(10), scoring='roc_auc')
    rfecv.fit(X, y)
    print("Optimal number of features : %d" % rfecv.n_features_)
    features = list(X.columns[rfecv.support_])

I am also performing …
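
One way to answer the question in the title is to put the feature selector and the classifier in a Pipeline and hand that to GridSearchCV, so hyperparameters are tuned together with the RFECV-selected features. This is only a minimal sketch: the grid values are illustrative assumptions, and X and y are the variables defined in the question.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ('feature_selection', RFECV(estimator=RandomForestClassifier(random_state=42,
                                                                     class_weight="balanced"),
                                    step=1, cv=StratifiedKFold(10), scoring='roc_auc')),
        ('classifier', RandomForestClassifier(random_state=42, class_weight="balanced")),
    ])

    # Pipeline step parameters are addressed as <step_name>__<parameter>.
    param_grid = {
        'classifier__n_estimators': [100, 300],   # illustrative values
        'classifier__max_depth': [None, 10],
    }

    grid = GridSearchCV(pipeline, param_grid, cv=StratifiedKFold(10),
                        scoring='roc_auc', n_jobs=-1)
    # grid.fit(X, y)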

Grid search on parameters inside the parameters of a BaggingClassifier

别说谁变了你拦得住时间么 submitted on 2019-12-24 20:32:13
Question: This is a follow-up to a question answered here, but I believe it deserves its own thread. In the previous question, we were dealing with "an Ensemble of Ensemble classifiers, where each has its own parameters." Let's start with the example provided by MaximeKan in his answer:

    my_est = BaggingClassifier(
        RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
        n_estimators=5,
        bootstrap_features=False,
        bootstrap=False,
        max_features=1.0,
        max_samples=0.6,
    )

Now …
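
For grid-searching the parameters of the inner RandomForestClassifier, the usual route is scikit-learn's double-underscore convention: nested estimators expose their parameters as <outer_param>__<inner_param>. A sketch, assuming scikit-learn < 1.2, where the inner estimator of BaggingClassifier is exposed as base_estimator (renamed to estimator in 1.2); the grid values are illustrative.

    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    my_est = BaggingClassifier(
        RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
        n_estimators=5, bootstrap_features=False, bootstrap=False,
        max_features=1.0, max_samples=0.6,
    )

    param_grid = {
        'n_estimators': [5, 10],                      # BaggingClassifier itself
        'base_estimator__n_estimators': [50, 100],    # inner RandomForestClassifier
        'base_estimator__max_features': [0.5, 0.8],
    }

    grid = GridSearchCV(my_est, param_grid, cv=3)
    # grid.fit(X, y)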

How to specify positive label when use precision as scoring in GridSearchCV

Deadly submitted on 2019-12-24 14:11:44
Question:

    model = sklearn.model_selection.GridSearchCV(
        estimator=est,
        param_grid=param_grid,
        scoring='precision',
        verbose=1,
        n_jobs=1,
        iid=True,
        cv=3)

In sklearn.metrics.precision_score(y, y_pred, pos_label=[0]) I can specify the positive label; how can I specify it in GridSearchCV too? If there is no way to specify it directly, how can I define it with a custom scorer? I have tried this:

    custom_score = make_scorer(precision_score(y, y_pred, pos_label=[0]), greater_is_better=True)

but I got …
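
The usual fix (a sketch, not taken from the original answers) is to pass the metric function itself to make_scorer rather than the result of calling it; extra keyword arguments such as pos_label are forwarded to precision_score at scoring time. The estimator and grid below are stand-ins, and pos_label=0 assumes the positive class is the label 0, as the question's pos_label=[0] suggests.

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import make_scorer, precision_score
    from sklearn.model_selection import GridSearchCV

    # Note: the function object, not precision_score(...), goes into make_scorer.
    custom_score = make_scorer(precision_score, greater_is_better=True, pos_label=0)

    est = LogisticRegression()               # stand-in estimator
    param_grid = {'C': [0.1, 1.0, 10.0]}     # illustrative grid
    model = GridSearchCV(estimator=est, param_grid=param_grid,
                         scoring=custom_score, verbose=1, n_jobs=1, cv=3)
    # model.fit(X, y)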

If I interrupt sklearn grid_search.fit() before completion can I access the current .best_score_, .best_params_?

回眸只為那壹抹淺笑 submitted on 2019-12-24 10:26:51
Question: If I interrupt grid_search.fit() before completion, will I lose everything it has done so far? I got a little carried away with my grid search and provided an obscenely large search space. I can already see scores that I'm happy with, but my stdout doesn't display which params led to those scores. I've searched the docs: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html There is a discussion from a couple of years ago about adding a feature for parallel …
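
A sketch of a common workaround, not necessarily the original accepted answer: when fit() is interrupted, attributes such as best_score_ and best_params_ are never set, so the only record of partial progress is whatever was printed while the search ran. Raising the verbosity makes GridSearchCV print each candidate's parameters and score to stdout as it finishes, so an interrupted run still leaves a usable log. The estimator and grid below are stand-ins.

    from sklearn.model_selection import GridSearchCV  # the old sklearn.grid_search module is deprecated
    from sklearn.svm import SVC

    search = GridSearchCV(
        estimator=SVC(),                  # stand-in estimator
        param_grid={'C': [0.1, 1, 10]},   # illustrative grid
        verbose=3,   # verbosity above 2 also prints each candidate's score as it completes
        n_jobs=1,    # a single process keeps the printed output from interleaving
    )
    # search.fit(X, y)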

Grid search error

时间秒杀一切 submitted on 2019-12-24 07:02:29
Question: I've been trying to perform a grid search, but something seems to be off. My code is:

    grid_search_0 = GridSearchCV(
        estimator=Pipeline([('vectorizer', CountVectorizer()),
                            ('tfidf', TfidfTransformer()),
                            ('clf', LinearSVC())]),
        param_grid={'C': 3**np.arange(-3, 3, dtype='float'),
                    'gamma': 3**np.arange(-6, 0, dtype='float')},
        cv=10,
        scoring=make_scorer(roc_auc_score, needs_threshold=True),
        verbose=1,
        n_jobs=-1)

and I get the error ImportError: [joblib] Attempting to do parallel computing …
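
Two issues stand out in the snippet independently of the truncated joblib ImportError (which usually means the fit call needs an "if __name__ == '__main__':" guard, or n_jobs=1, on platforms without forking): parameters of a Pipeline step must be prefixed with the step name, and LinearSVC has no gamma parameter, so gamma only makes sense with SVC and an RBF kernel. A hedged sketch of a corrected version, keeping the value ranges from the question:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.metrics import make_scorer, roc_auc_score
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    pipeline = Pipeline([
        ('vectorizer', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('clf', SVC(kernel='rbf')),       # SVC accepts gamma; LinearSVC does not
    ])

    grid_search_0 = GridSearchCV(
        estimator=pipeline,
        param_grid={
            'clf__C': 3 ** np.arange(-3, 3, dtype='float'),      # step-name prefix is required
            'clf__gamma': 3 ** np.arange(-6, 0, dtype='float'),
        },
        cv=10,
        scoring=make_scorer(roc_auc_score, needs_threshold=True),
        verbose=1,
        n_jobs=1,   # sidesteps the joblib multiprocessing error on some setups
    )
    # grid_search_0.fit(texts, labels)   # texts, labels: your data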

specify scoring metric in GridSearch function with hypopt package in python

岁酱吖の submitted on 2019-12-22 19:32:02
Question: I'm using the GridSearch function from the hypopt package to do my hyperparameter search with a specified validation set. The default metric for classification seems to be accuracy (I'm not entirely sure). Here I want to use the F1 score as the metric, but I don't know where I should specify it. I looked at the documentation but I'm still confused. Does anyone familiar with the hypopt package know how I can do this? Thanks a lot in advance.

    from hypopt import GridSearch
    log_reg_params = {"penalty": ['l1 …
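
As far as I recall (treat this as an assumption to verify against the installed hypopt version, not a documented guarantee), recent hypopt releases let you pass a scoring argument to GridSearch.fit that accepts scikit-learn style metric names such as 'f1', or a make_scorer object; the placement of param_grid (constructor vs. fit) has also shifted between versions. A sketch, assuming X_train, y_train, X_val and y_val are already defined:

    from hypopt import GridSearch
    from sklearn.linear_model import LogisticRegression

    log_reg_params = {"penalty": ['l1'], "C": [0.1, 1.0, 10.0]}   # illustrative grid

    opt = GridSearch(model=LogisticRegression(solver='liblinear'),
                     param_grid=log_reg_params)
    # 'scoring' is assumed to take scikit-learn metric strings; check your hypopt version.
    opt.fit(X_train, y_train, X_val, y_val, scoring='f1')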

ValueError: Can't handle mix of multilabel-indicator and binary

 ̄綄美尐妖づ submitted on 2019-12-22 09:12:19
Question: I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyperparameter optimisation. This is a multi-class problem: the target variable can have only one label, chosen from a set of n classes. For instance, the target variable can be 'Class1', 'Class2', ..., 'Classn'.

    # self._arch creates my model
    nn = KerasClassifier(build_fn=self._arch, verbose=0)
    clf = GridSearchCV(
        nn,
        param_grid={ ... },
        scoring='f1_macro',   # I use f1 score, macro averaged
        n_jobs=-1 …
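
A common cause of this ValueError (offered as a sketch, not as the original accepted answer) is fitting GridSearchCV on one-hot encoded targets: the scorer then compares a multilabel-indicator y_true against the plain class predictions returned by KerasClassifier.predict. Converting the targets to integer class indices before fitting usually resolves it. The model factory, input shape and class count below are illustrative stand-ins for self._arch.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import GridSearchCV

    def build_model():
        # Stand-in for self._arch: one output unit per class.
        model = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                            Dense(3, activation='softmax')])
        model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
        return model

    # If y_onehot is the one-hot target matrix (n_samples x n_classes),
    # use integer labels so y_true matches what the wrapper predicts:
    # y = np.argmax(y_onehot, axis=1)

    nn = KerasClassifier(build_fn=build_model, verbose=0)
    clf = GridSearchCV(nn, param_grid={'epochs': [5, 10]},   # illustrative grid
                       scoring='f1_macro', n_jobs=1)
    # clf.fit(X, y)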

Python, machine learning - Perform a grid search on custom validation set

 ̄綄美尐妖づ submitted on 2019-12-21 21:27:52
Question: I am dealing with an unbalanced classification problem where my negative class is 1000 times more numerous than my positive class. My strategy is to train a deep neural network on a balanced (50/50 ratio) training set (I have enough simulated samples), and then use an unbalanced (1/1000 ratio) validation set to select the best model and optimise the hyperparameters. Since the number of parameters is significant, I want to use scikit-learn's RandomizedSearchCV, i.e. a random grid search. To my …
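
The standard way to run a scikit-learn search against one fixed validation set is PredefinedSplit: concatenate the training and validation data and mark each sample with -1 (always train) or 0 (the single validation fold). A sketch, assuming X_train, y_train, X_val and y_val already hold the balanced training set and the unbalanced validation set; the estimator and parameter distributions are illustrative stand-ins for the network and its hyperparameters.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import PredefinedSplit, RandomizedSearchCV

    X_all = np.concatenate([X_train, X_val])
    y_all = np.concatenate([y_train, y_val])

    # -1: sample stays in the training fold; 0: sample belongs to the one validation fold.
    test_fold = np.concatenate([np.full(len(X_train), -1),
                                np.zeros(len(X_val), dtype=int)])
    cv = PredefinedSplit(test_fold)

    search = RandomizedSearchCV(
        estimator=RandomForestClassifier(),                   # stand-in for the network wrapper
        param_distributions={'n_estimators': [100, 300],      # illustrative distributions
                             'max_depth': [None, 10]},
        n_iter=4,
        scoring='f1',
        cv=cv,
        refit=False,   # refitting would retrain on train + validation combined
    )
    # search.fit(X_all, y_all)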