grid-search

How to use `log_loss` in `GridSearchCV` with multi-class labels in Scikit-Learn (sklearn)?

*爱你&永不变心* submitted on 2019-12-05 12:14:04
I'm trying to use log_loss as the scoring parameter of GridSearchCV to tune this multi-class (6-class) classifier, but I don't understand how to give it the labels parameter. Even if I passed sklearn.metrics.log_loss directly, the set of labels present would change for each cross-validation fold, so I still don't see how to supply labels. I'm using Python v3.6 and Scikit-Learn v0.18.1. How can I use GridSearchCV with log_loss for multi-class model tuning? My class counts (a pandas Series named encoding, dtype int64) are: 1: 31, 2: 18, 3: 28, 4: 19, 5: 17, 6: 22. My code: param_test = {"criterion": ["friedman_mse"
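A common way to handle this (a sketch under assumptions, not taken from the asker's code; the grid below is illustrative) is to build the scorer with make_scorer and forward the full label set to log_loss, so every fold is scored against all six classes even when a class is absent from a validation split:

```python
# Hedged sketch: an explicit log-loss scorer with a fixed label set for GridSearchCV.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss, make_scorer
from sklearn.model_selection import GridSearchCV

log_loss_scorer = make_scorer(
    log_loss,
    greater_is_better=False,    # GridSearchCV maximizes, so the loss is negated
    needs_proba=True,           # log_loss needs predict_proba output
    labels=[1, 2, 3, 4, 5, 6],  # extra kwargs are forwarded to log_loss
)

# Illustrative grid; the question's param_test is truncated above.
param_test = {"criterion": ["friedman_mse"], "max_depth": [3, 5]}
grid = GridSearchCV(GradientBoostingClassifier(), param_test,
                    scoring=log_loss_scorer, cv=5)
# grid.fit(X, y)
```

scikit-learn also ships a built-in 'neg_log_loss' scoring string, but it does not expose the labels argument, which is exactly the piece this question needs.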

Pipeline and GridSearch for Doc2Vec

感情迁移 submitted on 2019-12-05 09:30:27
I currently have the following script that helps to find the best doc2vec model. It works like this: first it trains a few models based on given parameters, then it tests them against a classifier. Finally, it outputs the best model and classifier (I hope). Data: example data (data.csv) can be downloaded here: https://pastebin.com/takYp6T8 Note that the data has a structure that should yield an ideal classifier with 1.0 accuracy. Script: import sys import os from time import time from operator import itemgetter import pickle import pandas as pd import numpy as np from argparse import ArgumentParser
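One pattern that fits this workflow (a sketch assuming a reasonably recent gensim; the transformer name, parameters, and classifier are illustrative, not taken from the script) is to wrap Doc2Vec in a scikit-learn transformer so the whole search can run through a Pipeline and GridSearchCV:

```python
# Hedged sketch: a Doc2Vec step that GridSearchCV can tune alongside a classifier.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

class Doc2VecTransformer(BaseEstimator, TransformerMixin):
    """Trains Doc2Vec on the fit data and infers vectors at transform time."""

    def __init__(self, vector_size=50, window=5, epochs=20):
        self.vector_size = vector_size
        self.window = window
        self.epochs = epochs

    def fit(self, X, y=None):
        tagged = [TaggedDocument(doc.split(), [i]) for i, doc in enumerate(X)]
        self.model_ = Doc2Vec(tagged, vector_size=self.vector_size,
                              window=self.window, epochs=self.epochs)
        return self

    def transform(self, X):
        return np.vstack([self.model_.infer_vector(doc.split()) for doc in X])

pipeline = Pipeline([("doc2vec", Doc2VecTransformer()),
                     ("clf", LogisticRegression())])
grid = GridSearchCV(pipeline,
                    {"doc2vec__vector_size": [50, 100], "clf__C": [0.1, 1.0]},
                    cv=3)
# grid.fit(texts, labels) then inspect grid.best_estimator_ and grid.best_params_
```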

More than one estimator in GridSearchCV(sklearn)

十年热恋 submitted on 2019-12-04 08:16:25
I was checking the sklearn documentation page about GridSearchCV. One of the attributes of a GridSearchCV object is best_estimator_. So here is my question: how can I pass more than one estimator to the GridSearchCV object? Using a dictionary like {'SVC()': {'C': 10, 'gamma': 0.01}, 'DecTreeClass()': {....}}? GridSearchCV works on parameters: it trains multiple estimators of the same class (one of SVC, DecisionTreeClassifier, or another classifier) with different parameter combinations taken from param_grid. best_estimator_ is the estimator that performs best on the data. So essentially best_estimator
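GridSearchCV itself takes a single estimator, so a dictionary keyed by estimator names will not work directly. One common workaround (a sketch; the grids below are illustrative) is to put the classifier behind a Pipeline step and make the step itself a grid parameter, with one sub-grid per estimator:

```python
# Hedged sketch: searching over several estimator classes in one GridSearchCV.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([("clf", SVC())])  # placeholder step; the grid swaps it out

param_grid = [
    {"clf": [SVC()], "clf__C": [1, 10], "clf__gamma": [0.01, 0.1]},
    {"clf": [DecisionTreeClassifier()], "clf__max_depth": [3, 5, None]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
# After search.fit(X, y), search.best_estimator_ is the winning Pipeline and
# search.best_estimator_.named_steps["clf"] is the winning classifier.
```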

GridSearchCV/RandomizedSearchCV with LSTM

邮差的信 submitted on 2019-12-04 05:37:44
Question: I am stuck trying to tune hyperparameters for an LSTM via RandomizedSearchCV. My code is below: X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1])) X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1])) print(X_train.shape, y_train.shape, X_test.shape, y_test.shape) from imblearn.pipeline import Pipeline from keras.initializers import RandomNormal def create_model(activation_1='relu', activation_2='relu', neurons_input=1, neurons_hidden_1=1, optimizer='Adam',
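The usual route (a hedged sketch for the older standalone Keras the imports suggest; layer sizes and the parameter distribution are illustrative) is to wrap the model-building function in KerasClassifier so RandomizedSearchCV can treat it like any other estimator:

```python
# Hedged sketch: tuning a small LSTM with RandomizedSearchCV via KerasClassifier.
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV

def create_model(activation_1='relu', activation_2='relu',
                 neurons_input=1, neurons_hidden_1=1, optimizer='Adam'):
    # X_train is assumed to be the (samples, 1, features) array reshaped above.
    model = Sequential()
    model.add(LSTM(neurons_input, input_shape=(1, X_train.shape[2]),
                   activation=activation_1))
    model.add(Dense(neurons_hidden_1, activation=activation_2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    return model

clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=32, verbose=0)
param_dist = {"neurons_input": [16, 32, 64],
              "neurons_hidden_1": [16, 32],
              "optimizer": ["Adam", "RMSprop"]}
search = RandomizedSearchCV(clf, param_dist, n_iter=5, cv=3)
# search.fit(X_train, y_train)
```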

TypeError in grid search

让人想犯罪 __ submitted on 2019-12-04 05:32:47
Question: I used to write loops to find the best parameters for my model, which led to coding errors, so I decided to use GridSearchCV. I am trying to find the best parameters for PCA (the only parameters I want to grid search over). In this model, after normalization I want to combine the original features with the PCA-reduced features and then apply a linear SVM. Then I save the whole model to make predictions on my input. I get an error in the line where I try to fit the data, so
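One way to express that model so GridSearchCV only touches the PCA step (a sketch; step names and component counts are illustrative) is a Pipeline whose middle step is a FeatureUnion of an identity transformer (the original features) and PCA:

```python
# Hedged sketch: scale -> [original features + PCA features] -> linear SVM,
# grid searching only the number of PCA components.
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("features", FeatureUnion([
        ("original", FunctionTransformer()),  # func=None is the identity, and it pickles
        ("pca", PCA()),
    ])),
    ("svm", SVC(kernel="linear")),
])

param_grid = {"features__pca__n_components": [2, 5, 10]}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X, y); the fitted search object can then be pickled for later prediction.
```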

How to access Scikit Learn nested cross-validation scores

你离开我真会死。 submitted on 2019-12-04 04:10:06
Question: I'm using Python and I would like to use nested cross-validation with scikit-learn. I have found a very good example: NUM_TRIALS = 30 non_nested_scores = np.zeros(NUM_TRIALS) nested_scores = np.zeros(NUM_TRIALS) # Choose cross-validation techniques for the inner and outer loops, # independently of the dataset. # E.g. "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc. inner_cv = KFold(n_splits=4, shuffle=True, random_state=i) outer_cv = KFold(n_splits=4, shuffle=True, random_state=i) # Non
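For reference, a compact version of that pattern (a sketch on the iris data, with an illustrative grid): the outer cross_val_score returns one nested score per outer fold, while fitting the inner GridSearchCV on its own exposes best_params_ and cv_results_ for inspection:

```python
# Hedged sketch: nested CV with GridSearchCV inside cross_val_score.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

clf = GridSearchCV(SVC(), p_grid, cv=inner_cv)

# Outer loop: one generalization score per outer fold.
nested_scores = cross_val_score(clf, X, y, cv=outer_cv)
print(nested_scores, nested_scores.mean())

# Non-nested comparison: fit the search on all data and read its internal score.
clf.fit(X, y)
print(clf.best_score_, clf.best_params_)
```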

How to elegantly pass Sklearn's GridSearchCV's best parameters to another model?

只谈情不闲聊 submitted on 2019-12-03 16:49:22
Question: I have found a set of best hyperparameters for my KNN estimator with GridSearchCV: >>> knn_gridsearch_model.best_params_ {'algorithm': 'auto', 'metric': 'manhattan', 'n_neighbors': 3} So far, so good. I want to train my final estimator with these newly found parameters. Is there a way to feed the above hyperparameter dict to it directly? I tried this: >>> new_knn_model = KNeighborsClassifier(knn_gridsearch_model.best_params_) but instead of the hoped-for result, new_knn_model just got the whole dict
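Passing the dict as a single positional argument makes it the value of the first constructor parameter; the usual fix is to unpack it with ** so each entry becomes a keyword argument (or to push it into an existing estimator with set_params). A short sketch:

```python
from sklearn.neighbors import KNeighborsClassifier

best = {'algorithm': 'auto', 'metric': 'manhattan', 'n_neighbors': 3}

new_knn_model = KNeighborsClassifier(**best)            # unpack the dict as kwargs
# or, on an already-constructed estimator:
new_knn_model = KNeighborsClassifier().set_params(**best)
```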

Explicitly specifying test/train sets in GridSearchCV

核能气质少年 submitted on 2019-12-03 16:24:56
I have a question about the cv parameter of sklearn's GridSearchCV. I'm working with data that has a time component to it, so random shuffling within KFold cross-validation doesn't seem sensible. Instead, I want to explicitly specify cutoffs for training, validation, and test data within a GridSearchCV. Can I do this? To better illuminate the question, here's how I would do that manually. import numpy as np import pandas as pd from sklearn.linear_model import Ridge np.random.seed(444) index = pd.date_range('2014', periods=60, freq='M') X, y = make_regression(n_samples=60, n_features=3
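One answer, sketched below with illustrative cutoffs: GridSearchCV's cv parameter accepts an iterable of (train_indices, test_indices) pairs, so an explicit time-based split can be passed instead of shuffled folds (TimeSeriesSplit is another option when rolling splits are wanted):

```python
# Hedged sketch: a single explicit train/validation split passed to GridSearchCV.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=60, n_features=3, random_state=444)

train_idx = np.arange(0, 48)   # e.g. the first 48 months for training
val_idx = np.arange(48, 60)    # the last 12 months for validation
cv = [(train_idx, val_idx)]    # one explicit split instead of shuffled KFold

search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=cv)
search.fit(X, y)
```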

Scikit - Combining scale and grid search

一个人想着一个人 submitted on 2019-12-03 16:21:55
I am new to scikit-learn and have two slight issues combining data scaling with grid search. Efficient scaler: considering cross-validation with K folds, I would like the data scaler (using preprocessing.StandardScaler(), for instance) to be fit only on the K-1 training folds each time we train the model, and then applied to the remaining fold. My impression is that the following code will fit the scaler on the entire dataset, so I would like to modify it to behave as described previously: classifier = svm.SVC(C=1) clf = make_pipeline(preprocessing.StandardScaler(), classifier)
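On the first point, the snippet above is already close: once the scaler is a step inside the pipeline, any cross-validation driven by GridSearchCV or cross_val_score re-fits it on the K-1 training folds only. A short sketch (parameter values illustrative):

```python
# Hedged sketch: scaler inside the pipeline, so each CV split re-fits it on train folds.
from sklearn import preprocessing, svm
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))

# Pipeline parameters are addressed through the step name ("svc" for make_pipeline).
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.001, 0.01]}
search = GridSearchCV(clf, param_grid, cv=5)
# search.fit(X, y); cross_val_score(clf, X, y, cv=KFold(n_splits=5)) behaves the same way.
```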

How to implement SMOTE in cross validation and GridSearchCV

不羁的心 submitted on 2019-12-03 12:45:35
Question: I'm relatively new to Python. Can you help me turn my SMOTE implementation into a proper pipeline? What I want is to apply over- and under-sampling to the training set of every k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced left-out piece. The problem is that when I do that, I cannot use the familiar sklearn interface for evaluation and grid search. Is it possible to make something similar to model_selection.RandomizedSearchCV? My
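The standard answer (sketched here; the classifier and parameter distribution are illustrative) is to use imbalanced-learn's Pipeline, which applies resamplers such as SMOTE only to the training portion of each fold while keeping the familiar sklearn search interface:

```python
# Hedged sketch: SMOTE inside an imblearn Pipeline, tuned with RandomizedSearchCV.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),   # resampling happens inside each training fold
    ("clf", RandomForestClassifier()),
])

param_dist = {"clf__n_estimators": [100, 300], "clf__max_depth": [5, 10, None]}
search = RandomizedSearchCV(pipe, param_dist, n_iter=4, cv=5)
# search.fit(X, y): each held-out fold stays imbalanced, so the scores reflect
# performance on the original class distribution.
```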