grid-search

More than one estimator in GridSearchCV (sklearn)

寵の児 submitted on 2019-12-21 14:58:29
Question: I was checking the sklearn documentation page for GridSearchCV. One of the attributes of a GridSearchCV object is best_estimator_. So here is my question: how do I pass more than one estimator to a GridSearchCV object? Using a dictionary like {'SVC()': {'C': 10, 'gamma': 0.01}, 'DecTreeClass()': {...}}?

Answer 1: GridSearchCV works on parameters. It will train multiple estimators of the same class (one of SVC, DecisionTreeClassifier, or another classifier) with different parameter combinations from those specified in…
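
One common workaround is to make the estimator itself a searchable parameter by wrapping it in a Pipeline and passing a list of parameter grids. This is a minimal sketch of that pattern, not the answerer's exact code; the step name 'clf' is arbitrary:

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    pipe = Pipeline([('clf', SVC())])  # placeholder step; the grid swaps it out

    param_grid = [
        {'clf': [SVC()], 'clf__C': [1, 10], 'clf__gamma': [0.01, 0.1]},
        {'clf': [DecisionTreeClassifier()], 'clf__max_depth': [3, 5, None]},
    ]

    search = GridSearchCV(pipe, param_grid, cv=5)
    # search.fit(X, y)  # search.best_estimator_ then holds the winning pipeline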

Get standard deviation for a GridSearchCV

我是研究僧i submitted on 2019-12-14 03:49:13
Question: Before scikit-learn 0.20 we could use result.grid_scores_[result.best_index_] to get the standard deviation. (It returned, for example: mean: 0.76172, std: 0.05225, params: {'n_neighbors': 21}.) What is the best way in scikit-learn 0.20 to get the standard deviation of the best score?

Answer 1: In newer versions, grid_scores_ has been renamed to cv_results_. Following the documentation, you need this: best_index_ : int — the index (of the cv_results_ arrays) which corresponds to the best candidate…
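
A minimal sketch of looking up the best candidate's statistics in cv_results_, assuming grid is an already-fitted GridSearchCV:

    # mean and standard deviation of the CV test score for the best candidate
    best_mean = grid.cv_results_['mean_test_score'][grid.best_index_]
    best_std = grid.cv_results_['std_test_score'][grid.best_index_]
    print(best_mean, best_std)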

How to use `log_loss` in `GridSearchCV` with multi-class labels in Scikit-Learn (sklearn)?

故事扮演 submitted on 2019-12-13 12:28:07
Question: I'm trying to use log_loss in the scoring parameter of GridSearchCV to tune this multi-class (6-class) classifier, and I don't understand how to give it a labels parameter. Even if I passed it sklearn.metrics.log_loss directly, the label set would change on each iteration of the cross-validation, so how do I supply the labels parameter? I'm using Python v3.6 and Scikit-Learn v0.18.1. How can I use GridSearchCV with log_loss for multi-class model tuning? My class representation: 1 31 2…
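
A common fix is to freeze the label set inside the scorer, so log_loss sees all six classes on every fold even when a fold happens to miss one. A hedged sketch using the 0.18-era make_scorer API; the label values are placeholders for the asker's actual classes:

    from sklearn.metrics import log_loss, make_scorer

    log_loss_scorer = make_scorer(
        log_loss,
        greater_is_better=False,    # GridSearchCV maximizes, so the loss is negated
        needs_proba=True,           # log_loss scores predict_proba output
        labels=[1, 2, 3, 4, 5, 6],  # forwarded to log_loss on every CV fold
    )
    # GridSearchCV(clf, param_grid, scoring=log_loss_scorer)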

Explicitly specifying test/train sets in GridSearchCV

非 Y 不嫁゛ submitted on 2019-12-13 11:50:32
Question: I have a question about the cv parameter of sklearn's GridSearchCV. I'm working with data that has a time component, so random shuffling within KFold cross-validation doesn't seem sensible. Instead, I want to explicitly specify cutoffs for the training, validation, and test data within a GridSearchCV. Can I do this? To better illuminate the question, here's how I would do it manually:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import Ridge

    np.random.seed(444)
…
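
Yes: cv also accepts an iterable of explicit (train_indices, validation_indices) pairs, in which case nothing is shuffled. A minimal sketch, assuming the rows are already sorted by time; the cutoffs are placeholders:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    X = np.random.rand(100, 3)
    y = np.random.rand(100)

    train_idx = np.arange(0, 60)   # oldest rows: training
    val_idx = np.arange(60, 80)    # next rows: validation
    # rows 80-99 stay out as a final test set, untouched by the search

    search = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]},
                          cv=[(train_idx, val_idx)])
    search.fit(X, y)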

How to give GridSearchCV a list of indices for cross-validation?

蹲街弑〆低调 submitted on 2019-12-13 02:49:22
Question: I'm trying to use custom cross-validation sets for a very specific dataset with scikit-optimize's BayesSearchCV. I've been able to reproduce the error in scikit-learn using GridSearchCV. Straight from the documentation:

cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- an integer, to specify the number of folds in a (Stratified)KFold,
- an object to…
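
The last documented option (an iterable yielding (train, test) index arrays) is the relevant one here. A minimal sketch of handing GridSearchCV a hand-built list of such pairs; the indices are placeholders:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    custom_cv = [
        (np.array([0, 1, 2, 3]), np.array([4, 5])),  # fold 1: train on 0-3, test on 4-5
        (np.array([2, 3, 4, 5]), np.array([0, 1])),  # fold 2: train on 2-5, test on 0-1
    ]
    search = GridSearchCV(LogisticRegression(), {'C': [0.1, 1.0]}, cv=custom_cv)
    # search.fit(X, y)  # X, y need at least 6 rows for these indices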

TypeError: 'ShuffleSplit' object is not iterable

帅比萌擦擦* submitted on 2019-12-12 19:05:14
Question: I am using ShuffleSplit to shuffle data, but I found there is an error:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-36-192f7c286a58> in <module>()
          1 # Fit the training data to the model using grid search
    ----> 2 reg = fit_model(X_train, y_train)
          3
          4 # Produce the value for 'max_depth'
          5 print "Parameter 'max_depth' is {} for the optimal model.".format(reg.get_params()['max_depth'])

    <ipython-input-34-18b2799e585c> in fit_model(X, y)
         32
         33 # Fit the grid search object to the data
…
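
The usual cause is that the newer sklearn.model_selection.ShuffleSplit is a splitter object, not an iterable like the old sklearn.cross_validation version. A hedged sketch of the typical fix, assuming a fit_model along the lines of the asker's:

    from sklearn.model_selection import GridSearchCV, ShuffleSplit
    from sklearn.tree import DecisionTreeRegressor

    cv_sets = ShuffleSplit(n_splits=10, test_size=0.20, random_state=0)
    grid = GridSearchCV(DecisionTreeRegressor(),
                        {'max_depth': list(range(1, 11))},
                        cv=cv_sets)  # pass the splitter object; don't iterate over it
    # grid.fit(X_train, y_train)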

NDCG as scoring function with GridSearchCV and stratified data?

笑着哭i submitted on 2019-12-12 16:49:50
Question: I'm working on a learning-to-rank task; the dataset has a column thread_id which is a group label (stratified data). In the evaluation phase I must take these groups into account, since my scoring function works in a per-thread fashion (e.g. nDCG). Now, if I implement nDCG with the signature scorer(estimator, X, y), I can easily pass it to GridSearchCV as the scoring function, as in the example below:

    def my_nDCG(estimator, X, y):
        # group by X['thread_id']
        # compute the result
        return result

    splitter = …
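
A minimal sketch of filling in such a scorer, assuming X is a DataFrame that still carries the thread_id column; compute_ndcg is a hypothetical stand-in for the asker's per-thread nDCG computation:

    import numpy as np

    def my_nDCG(estimator, X, y):
        scores = estimator.predict(X)
        y = np.asarray(y)
        per_thread = [
            compute_ndcg(y[idx], scores[idx])  # hypothetical nDCG helper
            for idx in X.groupby('thread_id').indices.values()
        ]
        return np.mean(per_thread)

    # GridSearchCV(estimator, param_grid, scoring=my_nDCG, cv=splitter)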

Fitting sklearn GridSearchCV model

南楼画角 submitted on 2019-12-12 12:07:18
Question: I am trying to solve a regression problem on the Boston dataset with the help of a random forest regressor. I was using GridSearchCV to select the best hyperparameters.

Problem 1: Should I fit the GridSearchCV on some X_train, y_train and then get the best parameters, or should I fit it on X, y to get the best parameters (X, y = entire dataset)?

Problem 2: Say I fit it on X, y, get the best parameters, and then build a new model on these best parameters. On what data should I now train this new model?…
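
One common answer: run the search on a training split only, keep a held-out test set, and rely on refit=True (the default) to retrain the best model on that training split. A minimal sketch, assuming X, y are already loaded:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV, train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    search = GridSearchCV(RandomForestRegressor(random_state=0),
                          {'n_estimators': [100, 300], 'max_depth': [None, 10]},
                          cv=5, refit=True)
    search.fit(X_train, y_train)   # search and final refit both use only training data
    print(search.best_estimator_.score(X_test, y_test))  # held-out test estimate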

Reshape a pandas DataFrame to use in GridSearch

℡╲_俬逩灬. submitted on 2019-12-12 05:28:58
Question: I am trying to use multiple feature columns in GridSearch with a Pipeline. I pass two columns for which I want to apply a TfidfVectorizer, but I run into trouble when running the GridSearch:

    Xs = training_data.loc[:, ['text', 'path_contents']]
    y = training_data['class_recoded'].astype('int32')

    for col in Xs:
        print Xs[col].shape
    print Xs.shape
    print y.shape
    # (2464L,)
    # (2464L,)
    # (2464, 2)
    # (2464L,)

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
…
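
On scikit-learn >= 0.20, one way around the shape problem is a ColumnTransformer that applies a separate TfidfVectorizer to each text column, so the two-column DataFrame feeds a Pipeline and GridSearchCV directly. A hedged sketch; the classifier choice is a placeholder:

    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    pre = ColumnTransformer([
        ('tfidf_text', TfidfVectorizer(), 'text'),           # a string selector passes
        ('tfidf_path', TfidfVectorizer(), 'path_contents'),  # the 1-D column Tfidf needs
    ])
    pipe = Pipeline([('pre', pre), ('clf', LogisticRegression())])
    grid = GridSearchCV(pipe, {'pre__tfidf_text__ngram_range': [(1, 1), (1, 2)]})
    # grid.fit(Xs, y)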

How to get predictions for each set of parameters using GridSearchCV?

拜拜、爱过 submitted on 2019-12-12 04:39:55
Question: I'm trying to find the best parameters for a NN regression model using GridSearchCV with the following code:

    param_grid = dict(optimizer=optimizer, epochs=epochs, batch_size=batches, init=init)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error')
    grid_result = grid.fit(input_train, target_train)
    pred = grid.predict(input_test)

As I understand it, grid.predict(input_test) uses the best parameters to predict on the given input set. Is there any way to evaluate…
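
grid.predict indeed uses only the refit best estimator. To get predictions for every parameter combination, one option is to loop over ParameterGrid yourself. A minimal sketch, assuming model is a clonable sklearn-style estimator (e.g. a KerasRegressor wrapper):

    from sklearn.base import clone
    from sklearn.model_selection import ParameterGrid

    for params in ParameterGrid(param_grid):
        est = clone(model).set_params(**params)
        est.fit(input_train, target_train)
        pred = est.predict(input_test)
        # ...evaluate pred for this particular parameter combination...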