grid-search

Skip forbidden parameter combinations when using GridSearchCV

为君一笑 submitted on 2020-01-21 04:20:24
Question: I want to exhaustively search the entire parameter space of my support vector classifier using GridSearchCV. However, some combinations of parameters are forbidden by LinearSVC and throw an exception. In particular, there are mutually exclusive combinations of the dual, penalty, and loss parameters. For example, this code:

    from sklearn import svm, datasets
    from sklearn.model_selection import GridSearchCV

    iris = datasets.load_iris()
    parameters = {'dual': [True, False], 'penalty': ['l1', 'l2'], \
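
One common workaround (a sketch, not taken from the truncated excerpt; the grid values are illustrative) is to give GridSearchCV a list of parameter dicts, so that only valid combinations are ever enumerated:

    from sklearn import datasets
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    iris = datasets.load_iris()

    # Each dict describes one legal region of LinearSVC's parameter space;
    # GridSearchCV searches their union, so forbidden combinations never occur.
    param_grid = [
        {'dual': [False], 'penalty': ['l1'], 'loss': ['squared_hinge']},
        {'dual': [True, False], 'penalty': ['l2'], 'loss': ['squared_hinge']},
        {'dual': [True], 'penalty': ['l2'], 'loss': ['hinge']},
    ]
    grid = GridSearchCV(LinearSVC(max_iter=10000), param_grid, cv=5)
    grid.fit(iris.data, iris.target)
    print(grid.best_params_)

An alternative is to leave the full grid in place and pass error_score=np.nan (or 0) to GridSearchCV, so forbidden combinations are recorded as failures instead of aborting the search.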

Python: Gridsearch Without Machine Learning?

可紊 submitted on 2020-01-15 10:36:46
Question: I want to optimize an algorithm that has several variable parameters as input. For machine learning tasks, sklearn offers hyperparameter optimization through its grid-search functionality. Is there a standardized way / library in Python that allows the optimization of hyperparameters but is not limited to machine learning topics? Answer 1: You can create a custom pipeline/estimator (see http://scikit-learn.org/dev/developers/contributing.html#rolling-your-own-estimator) with a score
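
Alternatively, sklearn's grid machinery can be used without any estimator at all. A minimal sketch, with a hypothetical objective function standing in for the algorithm being tuned:

    from sklearn.model_selection import ParameterGrid

    # Hypothetical black-box objective to minimize; any plain callable works.
    def objective(a, b):
        return (a - 3) ** 2 + abs(b)

    # ParameterGrid expands the dict into every combination, just as
    # GridSearchCV does internally, but leaves the evaluation loop to you.
    grid = ParameterGrid({'a': [1, 2, 3, 4], 'b': [-1, 0, 1]})
    best = min(grid, key=lambda params: objective(**params))
    print(best)  # {'a': 3, 'b': 0}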

Hyperparameter tuning in Keras (MLP) via RandomizedSearchCV

那年仲夏 submitted on 2020-01-14 05:10:29
Question: I have been trying to tune a neural net for some time now but unfortunately I cannot get good performance out of it. I have a time-series dataset and I am using RandomizedSearchCV for binary classification. My code is below. Any suggestions or help will be appreciated. One thing I am still trying to figure out is how to incorporate early stopping. EDIT: Forgot to add that I am measuring performance with the F1-macro metric and I cannot get a score higher than 0.68. Another
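
The question's model and data are not shown, so the following is only a sketch of how early stopping can be threaded through RandomizedSearchCV, assuming the older keras.wrappers.scikit_learn wrapper; the architecture and data here are placeholders:

    import numpy as np
    from keras.callbacks import EarlyStopping
    from keras.layers import Dense
    from keras.models import Sequential
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X = np.random.rand(200, 10).astype('float32')  # placeholder data
    y = np.random.randint(0, 2, 200)

    def build_mlp(units=32):
        # placeholder MLP; the question's actual architecture is not shown
        model = Sequential([
            Dense(units, activation='relu', input_shape=(X.shape[1],)),
            Dense(1, activation='sigmoid'),
        ])
        model.compile(optimizer='adam', loss='binary_crossentropy')
        return model

    clf = KerasClassifier(build_fn=build_mlp, epochs=100, batch_size=32,
                          verbose=0)
    search = RandomizedSearchCV(clf, {'units': [16, 32, 64]}, n_iter=3,
                                scoring='f1_macro', cv=3)

    # Extra keyword arguments to fit() are forwarded to every internal
    # model.fit() call, which is where the EarlyStopping callback attaches.
    search.fit(X, y, validation_split=0.1,
               callbacks=[EarlyStopping(monitor='val_loss', patience=5)])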

Use sklearn's GridSearchCV with a pipeline, preprocessing just once

二次信任 submitted on 2020-01-09 12:54:08
Question: I'm using scikit-learn to tune a model's hyper-parameters. I'm using a pipeline to chain the preprocessing with the estimator. A simple version of my problem would look like this:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    grid = GridSearchCV(make_pipeline(StandardScaler(), LogisticRegression()),
                        param_grid={
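
The usual answer (a sketch, not taken from the truncated excerpt) is the pipeline's memory argument, which caches fitted transformers on disk so the preprocessing is not recomputed for every candidate parameter setting:

    from tempfile import mkdtemp
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)

    # With memory set, each transformer's fit result is cached keyed on its
    # parameters and input data, so the scaler is fit once per fold rather
    # than once per (fold, C) combination.
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000),
                         memory=mkdtemp())
    grid = GridSearchCV(pipe,
                        param_grid={'logisticregression__C': [0.1, 1, 10]},
                        cv=5)
    grid.fit(X, y)

StandardScaler is cheap, so this only pays off when the cached steps are expensive; note also that the cache cannot collapse work across folds, since each fold sees different training data.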

Scorer function: difference between make_scorer/score_func and

六月ゝ 毕业季﹏ submitted on 2020-01-05 09:25:34
Question: In scikit-learn's (0.18.1) documentation I find what follows a bit confusing. It seems that writing your own scoring function is doable in multiple ways, but what's the difference? GridSearchCV takes a scoring argument as a: "scorer callable object / function with signature scorer(estimator, X, y)". This option is also supported in the model evaluation docs. Conversely, make_scorer wants a score_func as a: "score function (or loss function) with signature score_func(y, y_pred, **kwargs)". Example Both
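
The two signatures describe the same end product at different levels. A minimal sketch of both forms (the metric choice here is illustrative):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score, make_scorer
    from sklearn.model_selection import GridSearchCV

    X, y = load_iris(return_X_y=True)

    # Form 1: a full scorer, signature scorer(estimator, X, y). It must run
    # the prediction itself and return one number (greater is better).
    def my_scorer(estimator, X, y):
        return f1_score(y, estimator.predict(X), average='macro')

    # Form 2: a bare metric, signature score_func(y, y_pred). make_scorer
    # wraps it into an equivalent scorer(estimator, X, y) callable.
    wrapped = make_scorer(f1_score, average='macro')

    # Either object can be passed as scoring=; they behave identically here.
    grid = GridSearchCV(LogisticRegression(max_iter=1000),
                        {'C': [0.1, 1, 10]}, scoring=wrapped, cv=5)
    grid.fit(X, y)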

What is _passthrough_scorer and How Can I Change Scorers in GridsearchCV (sklearn)?

为君一笑 submitted on 2020-01-04 07:28:43
Question: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html (for reference)

    x = [[2], [1], [3], [1] ... ]  # about 1000 data points
    grid = GridSearchCV(KernelDensity(),
                        {'bandwidth': np.linspace(0.1, 1.0, 10)}, cv=10)
    grid.fit(x)

When I use GridSearchCV without specifying a scoring function, as in the code above, the value of grid.scorer_ is <function _passthrough_scorer at 0x...>. Could you explain what kind of function _passthrough_scorer is? In addition to this, I want to change the scoring function to mean_squared_error
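
For what it's worth, _passthrough_scorer is essentially a one-liner that defers to the estimator's own score method (for KernelDensity, the total log-likelihood of the data, which is why it needs no labels). A sketch of that behavior, plus the keyword for swapping in a different scorer on a supervised estimator, since KernelDensity has no predict() for mean_squared_error to consume:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import Ridge
    from sklearn.metrics import make_scorer, mean_squared_error
    from sklearn.model_selection import GridSearchCV

    # What _passthrough_scorer amounts to: delegate to estimator.score().
    def passthrough_scorer(estimator, *args, **kwargs):
        return estimator.score(*args, **kwargs)

    # Swapping scorers on a supervised estimator is one keyword argument;
    # greater_is_better=False flips the sign so GridSearchCV still maximizes.
    X, y = load_iris(return_X_y=True)
    mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
    grid = GridSearchCV(Ridge(), {'alpha': np.logspace(-3, 1, 5)},
                        scoring=mse_scorer, cv=10)
    grid.fit(X, y)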

Grid Search and Early Stopping Using Cross Validation with XGBoost in SciKit-Learn

牧云@^-^@ submitted on 2019-12-31 21:43:10
Question: I am fairly new to scikit-learn and have been trying to hyper-parameter tune XGBoost. My aim is to use early stopping and grid search to tune the model parameters, with early stopping controlling the number of trees and avoiding overfitting. As I am using cross-validation for the grid search, I was hoping to also use cross-validation in the early-stopping criteria. The code I have so far looks like this:

    import numpy as np
    import pandas as pd
    from sklearn import model_selection
    import xgboost
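
One common pattern (a sketch assuming the older xgboost sklearn API, where fit() accepts early_stopping_rounds and eval_set; the data and grid values are placeholders) is to hold out one fixed validation set and forward the early-stopping arguments through GridSearchCV.fit:

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import GridSearchCV, train_test_split

    X = np.random.rand(500, 8)  # placeholder data
    y = np.random.randint(0, 2, 500)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

    grid = GridSearchCV(xgb.XGBClassifier(n_estimators=1000),
                        {'max_depth': [3, 5], 'learning_rate': [0.05, 0.1]},
                        cv=3)

    # Keyword arguments to fit() are forwarded to XGBClassifier.fit() on every
    # fold, so each candidate stops adding trees once eval_set stops improving.
    grid.fit(X_tr, y_tr, early_stopping_rounds=20,
             eval_set=[(X_val, y_val)], verbose=False)

The caveat, and the tension the question raises, is that the same fixed eval_set is reused across all folds; early stopping on a per-fold validation split requires a custom loop or wrapper.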

Sklearn How to Save a Model Created From a Pipeline and GridSearchCV Using Joblib or Pickle?

ぐ巨炮叔叔 submitted on 2019-12-31 12:56:08
Question: After identifying the best parameters using a pipeline and GridSearchCV, how do I pickle / joblib this process to re-use later? I see how to do this when it's a single classifier:

    from sklearn.externals import joblib
    joblib.dump(clf, 'filename.pkl')

But how do I save this overall pipeline with the best parameters after performing and completing a grid search? I tried:

    joblib.dump(grid, 'output.pkl')

but that dumped every grid-search attempt (many files), and

    joblib.dump(pipeline, 'output.pkl')
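
The standard answer is to dump grid.best_estimator_, which (with the default refit=True) is the whole pipeline refit on the full training data with the winning parameters. A self-contained sketch (in newer scikit-learn, joblib is imported directly rather than from sklearn.externals):

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    grid = GridSearchCV(
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {'logisticregression__C': [0.1, 1, 10]}, cv=5)
    grid.fit(X, y)

    # best_estimator_ is one object holding preprocessing and model together,
    # so a single file round-trips the whole fitted pipeline.
    joblib.dump(grid.best_estimator_, 'best_pipeline.pkl')
    model = joblib.load('best_pipeline.pkl')
    print(model.predict(X[:5]))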

Using Smote with Gridsearchcv in Scikit-learn

百般思念 submitted on 2019-12-29 01:35:07
Question: I'm dealing with an imbalanced dataset and want to do a grid search to tune my model's parameters using scikit-learn's GridSearchCV. To oversample the data, I want to use SMOTE, and I know I can include that as a stage of a pipeline and pass it to GridSearchCV. My concern is that I think SMOTE will be applied to both train and validation folds, which is not what you are supposed to do: the validation set should not be oversampled. Am I right that the whole pipeline will be applied to both dataset
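
The usual resolution (a sketch; the classifier and data here are placeholders) is imbalanced-learn's own Pipeline, which treats SMOTE as a sampler: it runs only during fit, i.e. on each training fold, and is skipped when the pipeline transforms or scores the validation fold:

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1],
                               random_state=0)

    # imblearn's Pipeline resamples only inside fit(), so validation folds
    # are scored on their original, un-oversampled class distribution.
    pipe = Pipeline([('smote', SMOTE(random_state=0)),
                     ('clf', LogisticRegression(max_iter=1000))])
    grid = GridSearchCV(pipe, {'clf__C': [0.1, 1, 10]}, scoring='f1', cv=5)
    grid.fit(X, y)
    print(grid.best_params_)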