grid-search

More than one estimator in GridSearchCV (sklearn)

寵の児 submitted on 2019-12-21 14:58:29
Question: I was checking the sklearn documentation page for GridSearchCV. One of the attributes of a GridSearchCV object is best_estimator_. So here is my question: how do I pass more than one estimator to a GridSearchCV object? Using a dictionary like {'SVC()': {'C': 10, 'gamma': 0.01}, 'DecTreeClass()': {...}}?

Answer 1: GridSearchCV works on parameters. It will train multiple estimators of the same class (one of SVC, DecisionTreeClassifier, or another classifier) with different parameter combinations from those specified in…
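
One common workaround is to make the estimator itself a searchable parameter by wrapping it in a Pipeline and passing a list of parameter grids. This is a minimal sketch of that pattern, not the answerer's exact code; the step name 'clf' is arbitrary:

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    pipe = Pipeline([('clf', SVC())])  # placeholder step; the grid swaps it out

    param_grid = [
        {'clf': [SVC()], 'clf__C': [1, 10], 'clf__gamma': [0.01, 0.1]},
        {'clf': [DecisionTreeClassifier()], 'clf__max_depth': [3, 5, None]},
    ]

    search = GridSearchCV(pipe, param_grid, cv=5)
    # search.fit(X, y)  # search.best_estimator_ then holds the winning pipeline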

Get standard deviation for a GridSearchCV

我是研究僧i submitted on 2019-12-14 03:49:13
Question: Before scikit-learn 0.20 we could use result.grid_scores_[result.best_index_] to get the standard deviation. (It returned, for example: mean: 0.76172, std: 0.05225, params: {'n_neighbors': 21}.) What is the best way in scikit-learn 0.20 to get the standard deviation of the best score?

Answer 1: In newer versions, grid_scores_ has been renamed to cv_results_. Following the documentation, you need this: best_index_ : int — the index (of the cv_results_ arrays) which corresponds to the best candidate…
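
A minimal sketch of looking up the best candidate's statistics in cv_results_, assuming grid is an already-fitted GridSearchCV:

    # mean and standard deviation of the CV test score for the best candidate
    best_mean = grid.cv_results_['mean_test_score'][grid.best_index_]
    best_std = grid.cv_results_['std_test_score'][grid.best_index_]
    print(best_mean, best_std)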

How to use `log_loss` in `GridSearchCV` with multi-class labels in Scikit-Learn (sklearn)?

故事扮演 submitted on 2019-12-13 12:28:07
Question: I'm trying to use log_loss in the scoring parameter of GridSearchCV to tune this multi-class (6-class) classifier, and I don't understand how to give it a labels parameter. Even if I passed it sklearn.metrics.log_loss directly, the label set would change on each iteration of the cross-validation, so how do I supply the labels parameter? I'm using Python v3.6 and Scikit-Learn v0.18.1. How can I use GridSearchCV with log_loss for multi-class model tuning? My class representation: 1 31 2…
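
A common fix is to freeze the label set inside the scorer, so log_loss sees all six classes on every fold even when a fold happens to miss one. A hedged sketch using the 0.18-era make_scorer API; the label values are placeholders for the asker's actual classes:

    from sklearn.metrics import log_loss, make_scorer

    log_loss_scorer = make_scorer(
        log_loss,
        greater_is_better=False,    # GridSearchCV maximizes, so the loss is negated
        needs_proba=True,           # log_loss scores predict_proba output
        labels=[1, 2, 3, 4, 5, 6],  # forwarded to log_loss on every CV fold
    )
    # GridSearchCV(clf, param_grid, scoring=log_loss_scorer)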

Explicitly specifying test/train sets in GridSearchCV

非 Y 不嫁゛ submitted on 2019-12-13 11:50:32
Question: I have a question about the cv parameter of sklearn's GridSearchCV. I'm working with data that has a time component, so random shuffling within KFold cross-validation doesn't seem sensible. Instead, I want to explicitly specify cutoffs for the training, validation, and test data within a GridSearchCV. Can I do this? To better illuminate the question, here's how I would do it manually:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import Ridge

    np.random.seed(444)
…
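
Yes: cv also accepts an iterable of explicit (train_indices, validation_indices) pairs, in which case nothing is shuffled. A minimal sketch, assuming the rows are already sorted by time; the cutoffs are placeholders:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    X = np.random.rand(100, 3)
    y = np.random.rand(100)

    train_idx = np.arange(0, 60)   # oldest rows: training
    val_idx = np.arange(60, 80)    # next rows: validation
    # rows 80-99 stay out as a final test set, untouched by the search

    search = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]},
                          cv=[(train_idx, val_idx)])
    search.fit(X, y)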

How to give GridSearchCV a list of indices for cross-validation?

蹲街弑〆低调 submitted on 2019-12-13 02:49:22
Question: I'm trying to use custom cross-validation sets for a very specific dataset with scikit-optimize's BayesSearchCV. I've been able to reproduce the error in scikit-learn using GridSearchCV. Straight from the documentation:

cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- an integer, to specify the number of folds in a (Stratified)KFold,
- an object to…
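
The last documented option (an iterable yielding (train, test) index arrays) is the relevant one here. A minimal sketch of handing GridSearchCV a hand-built list of such pairs; the indices are placeholders:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    custom_cv = [
        (np.array([0, 1, 2, 3]), np.array([4, 5])),  # fold 1: train on 0-3, test on 4-5
        (np.array([2, 3, 4, 5]), np.array([0, 1])),  # fold 2: train on 2-5, test on 0-1
    ]
    search = GridSearchCV(LogisticRegression(), {'C': [0.1, 1.0]}, cv=custom_cv)
    # search.fit(X, y)  # X, y need at least 6 rows for these indices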

TypeError: 'ShuffleSplit' object is not iterable

帅比萌擦擦* submitted on 2019-12-12 19:05:14
Question: I am using ShuffleSplit to shuffle data, but I found there is an error:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-36-192f7c286a58> in <module>()
          1 # Fit the training data to the model using grid search
    ----> 2 reg = fit_model(X_train, y_train)
          3
          4 # Produce the value for 'max_depth'
          5 print "Parameter 'max_depth' is {} for the optimal model.".format(reg.get_params()['max_depth'])

    <ipython-input-34-18b2799e585c> in fit_model(X, y)
         32
         33 # Fit the grid search object to the data
…
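
The usual cause is that the newer sklearn.model_selection.ShuffleSplit is a splitter object, not an iterable like the old sklearn.cross_validation version. A hedged sketch of the typical fix, assuming a fit_model along the lines of the asker's:

    from sklearn.model_selection import GridSearchCV, ShuffleSplit
    from sklearn.tree import DecisionTreeRegressor

    cv_sets = ShuffleSplit(n_splits=10, test_size=0.20, random_state=0)
    grid = GridSearchCV(DecisionTreeRegressor(),
                        {'max_depth': list(range(1, 11))},
                        cv=cv_sets)  # pass the splitter object; don't iterate over it
    # grid.fit(X_train, y_train)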

NDCG as scoring function with GridSearchCV and stratified data?

笑着哭i submitted on 2019-12-12 16:49:50
Question: I'm working on a learning-to-rank task; the dataset has a column thread_id which is a group label (stratified data). In the evaluation phase I must take these groups into account, since my scoring function works in a per-thread fashion (e.g. nDCG). Now, if I implement nDCG with the signature scorer(estimator, X, y), I can easily pass it to GridSearchCV as the scoring function, as in the example below:

    def my_nDCG(estimator, X, y):
        # group by X['thread_id']
        # compute the result
        return result

    splitter = …
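
A minimal sketch of filling in such a scorer, assuming X is a DataFrame that still carries the thread_id column; compute_ndcg is a hypothetical stand-in for the asker's per-thread nDCG computation:

    import numpy as np

    def my_nDCG(estimator, X, y):
        scores = estimator.predict(X)
        y = np.asarray(y)
        per_thread = [
            compute_ndcg(y[idx], scores[idx])  # hypothetical nDCG helper
            for idx in X.groupby('thread_id').indices.values()
        ]
        return np.mean(per_thread)

    # GridSearchCV(estimator, param_grid, scoring=my_nDCG, cv=splitter)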

Fitting sklearn GridSearchCV model

南楼画角 submitted on 2019-12-12 12:07:18
Question: I am trying to solve a regression problem on the Boston dataset with the help of a random forest regressor. I was using GridSearchCV to select the best hyperparameters.

Problem 1: Should I fit the GridSearchCV on some X_train, y_train and then get the best parameters, or should I fit it on X, y to get the best parameters (X, y = entire dataset)?

Problem 2: Say I fit it on X, y, get the best parameters, and then build a new model on these best parameters. On what data should I now train this new model?…
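
One common answer: run the search on a training split only, keep a held-out test set, and rely on refit=True (the default) to retrain the best model on that training split. A minimal sketch, assuming X, y are already loaded:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV, train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    search = GridSearchCV(RandomForestRegressor(random_state=0),
                          {'n_estimators': [100, 300], 'max_depth': [None, 10]},
                          cv=5, refit=True)
    search.fit(X_train, y_train)   # search and final refit both use only training data
    print(search.best_estimator_.score(X_test, y_test))  # held-out test estimate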

Reshape a pandas DataFrame to use in GridSearch

℡╲_俬逩灬. submitted on 2019-12-12 05:28:58
Question: I am trying to use multiple feature columns in GridSearch with a Pipeline. I pass two columns for which I want to apply a TfidfVectorizer, but I run into trouble when running the GridSearch:

    Xs = training_data.loc[:, ['text', 'path_contents']]
    y = training_data['class_recoded'].astype('int32')

    for col in Xs:
        print Xs[col].shape
    print Xs.shape
    print y.shape
    # (2464L,)
    # (2464L,)
    # (2464, 2)
    # (2464L,)

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
…
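
On scikit-learn >= 0.20, one way around the shape problem is a ColumnTransformer that applies a separate TfidfVectorizer to each text column, so the two-column DataFrame feeds a Pipeline and GridSearchCV directly. A hedged sketch; the classifier choice is a placeholder:

    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    pre = ColumnTransformer([
        ('tfidf_text', TfidfVectorizer(), 'text'),           # a string selector passes
        ('tfidf_path', TfidfVectorizer(), 'path_contents'),  # the 1-D column Tfidf needs
    ])
    pipe = Pipeline([('pre', pre), ('clf', LogisticRegression())])
    grid = GridSearchCV(pipe, {'pre__tfidf_text__ngram_range': [(1, 1), (1, 2)]})
    # grid.fit(Xs, y)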

How to get predictions for each set of parameters using GridSearchCV?

拜拜、爱过 submitted on 2019-12-12 04:39:55
Question: I'm trying to find the best parameters for a NN regression model using GridSearchCV with the following code:

    param_grid = dict(optimizer=optimizer, epochs=epochs, batch_size=batches, init=init)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error')
    grid_result = grid.fit(input_train, target_train)
    pred = grid.predict(input_test)

As I understand it, grid.predict(input_test) uses the best parameters to predict on the given input set. Is there any way to evaluate…
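
grid.predict indeed uses only the refit best estimator. To get predictions for every parameter combination, one option is to loop over ParameterGrid yourself. A minimal sketch, assuming model is a clonable sklearn-style estimator (e.g. a KerasRegressor wrapper):

    from sklearn.base import clone
    from sklearn.model_selection import ParameterGrid

    for params in ParameterGrid(param_grid):
        est = clone(model).set_params(**params)
        est.fit(input_train, target_train)
        pred = est.predict(input_test)
        # ...evaluate pred for this particular parameter combination...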