grid-search

How to use `log_loss` in `GridSearchCV` with multi-class labels in Scikit-Learn (sklearn)?

*爱你&永不变心* submitted on 2019-12-05 12:14:04
I'm trying to use log_loss as the scoring parameter of GridSearchCV to tune this multi-class (6-class) classifier, but I don't understand how to give it the labels parameter. Even if I passed sklearn.metrics.log_loss directly, the set of labels present would change for each cross-validation fold, so I still don't see how to supply labels. I'm using Python v3.6 and Scikit-Learn v0.18.1. How can I use GridSearchCV with log_loss for multi-class model tuning? My class counts (a pandas Series named encoding, dtype int64) are: 1: 31, 2: 18, 3: 28, 4: 19, 5: 17, 6: 22. My code: param_test = {"criterion": ["friedman_mse"
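A common way to handle this (a sketch under assumptions, not taken from the asker's code; the grid below is illustrative) is to build the scorer with make_scorer and forward the full label set to log_loss, so every fold is scored against all six classes even when a class is absent from a validation split:

```python
# Hedged sketch: an explicit log-loss scorer with a fixed label set for GridSearchCV.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss, make_scorer
from sklearn.model_selection import GridSearchCV

log_loss_scorer = make_scorer(
    log_loss,
    greater_is_better=False,    # GridSearchCV maximizes, so the loss is negated
    needs_proba=True,           # log_loss needs predict_proba output
    labels=[1, 2, 3, 4, 5, 6],  # extra kwargs are forwarded to log_loss
)

# Illustrative grid; the question's param_test is truncated above.
param_test = {"criterion": ["friedman_mse"], "max_depth": [3, 5]}
grid = GridSearchCV(GradientBoostingClassifier(), param_test,
                    scoring=log_loss_scorer, cv=5)
# grid.fit(X, y)
```

scikit-learn also ships a built-in 'neg_log_loss' scoring string, but it does not expose the labels argument, which is exactly the piece this question needs.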

Pipeline and GridSearch for Doc2Vec

感情迁移 submitted on 2019-12-05 09:30:27
I currently have the following script that helps to find the best doc2vec model. It works like this: first it trains a few models based on given parameters, then it tests them against a classifier. Finally, it outputs the best model and classifier (I hope). Data: example data (data.csv) can be downloaded here: https://pastebin.com/takYp6T8 Note that the data has a structure that should yield an ideal classifier with 1.0 accuracy. Script: import sys import os from time import time from operator import itemgetter import pickle import pandas as pd import numpy as np from argparse import ArgumentParser
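One pattern that fits this workflow (a sketch assuming a reasonably recent gensim; the transformer name, parameters, and classifier are illustrative, not taken from the script) is to wrap Doc2Vec in a scikit-learn transformer so the whole search can run through a Pipeline and GridSearchCV:

```python
# Hedged sketch: a Doc2Vec step that GridSearchCV can tune alongside a classifier.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

class Doc2VecTransformer(BaseEstimator, TransformerMixin):
    """Trains Doc2Vec on the fit data and infers vectors at transform time."""

    def __init__(self, vector_size=50, window=5, epochs=20):
        self.vector_size = vector_size
        self.window = window
        self.epochs = epochs

    def fit(self, X, y=None):
        tagged = [TaggedDocument(doc.split(), [i]) for i, doc in enumerate(X)]
        self.model_ = Doc2Vec(tagged, vector_size=self.vector_size,
                              window=self.window, epochs=self.epochs)
        return self

    def transform(self, X):
        return np.vstack([self.model_.infer_vector(doc.split()) for doc in X])

pipeline = Pipeline([("doc2vec", Doc2VecTransformer()),
                     ("clf", LogisticRegression())])
grid = GridSearchCV(pipeline,
                    {"doc2vec__vector_size": [50, 100], "clf__C": [0.1, 1.0]},
                    cv=3)
# grid.fit(texts, labels) then inspect grid.best_estimator_ and grid.best_params_
```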

More than one estimator in GridSearchCV(sklearn)

十年热恋 submitted on 2019-12-04 08:16:25
I was checking the sklearn documentation page about GridSearchCV. One of the attributes of a GridSearchCV object is best_estimator_. So here is my question: how can I pass more than one estimator to the GridSearchCV object? Using a dictionary like {'SVC()': {'C': 10, 'gamma': 0.01}, 'DecTreeClass()': {....}}? GridSearchCV works on parameters: it trains multiple estimators of the same class (one of SVC, DecisionTreeClassifier, or another classifier) with different parameter combinations taken from param_grid. best_estimator_ is the estimator that performs best on the data. So essentially best_estimator
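GridSearchCV itself takes a single estimator, so a dictionary keyed by estimator names will not work directly. One common workaround (a sketch; the grids below are illustrative) is to put the classifier behind a Pipeline step and make the step itself a grid parameter, with one sub-grid per estimator:

```python
# Hedged sketch: searching over several estimator classes in one GridSearchCV.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([("clf", SVC())])  # placeholder step; the grid swaps it out

param_grid = [
    {"clf": [SVC()], "clf__C": [1, 10], "clf__gamma": [0.01, 0.1]},
    {"clf": [DecisionTreeClassifier()], "clf__max_depth": [3, 5, None]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
# After search.fit(X, y), search.best_estimator_ is the winning Pipeline and
# search.best_estimator_.named_steps["clf"] is the winning classifier.
```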

GridSearchCV/RandomizedSearchCV with LSTM

邮差的信 submitted on 2019-12-04 05:37:44
Question: I am stuck trying to tune hyperparameters for an LSTM via RandomizedSearchCV. My code is below: X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1])) X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1])) print(X_train.shape, y_train.shape, X_test.shape, y_test.shape) from imblearn.pipeline import Pipeline from keras.initializers import RandomNormal def create_model(activation_1='relu', activation_2='relu', neurons_input=1, neurons_hidden_1=1, optimizer='Adam',
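The usual route (a hedged sketch for the older standalone Keras the imports suggest; layer sizes and the parameter distribution are illustrative) is to wrap the model-building function in KerasClassifier so RandomizedSearchCV can treat it like any other estimator:

```python
# Hedged sketch: tuning a small LSTM with RandomizedSearchCV via KerasClassifier.
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV

def create_model(activation_1='relu', activation_2='relu',
                 neurons_input=1, neurons_hidden_1=1, optimizer='Adam'):
    # X_train is assumed to be the (samples, 1, features) array reshaped above.
    model = Sequential()
    model.add(LSTM(neurons_input, input_shape=(1, X_train.shape[2]),
                   activation=activation_1))
    model.add(Dense(neurons_hidden_1, activation=activation_2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  metrics=['accuracy'])
    return model

clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=32, verbose=0)
param_dist = {"neurons_input": [16, 32, 64],
              "neurons_hidden_1": [16, 32],
              "optimizer": ["Adam", "RMSprop"]}
search = RandomizedSearchCV(clf, param_dist, n_iter=5, cv=3)
# search.fit(X_train, y_train)
```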

TypeError in grid search

让人想犯罪 __ submitted on 2019-12-04 05:32:47
Question: I used to write loops to find the best parameters for my model, which led to coding errors, so I decided to use GridSearchCV. I am trying to find the best parameters for PCA (the only parameters I want to grid search over). In this model, after normalization I want to combine the original features with the PCA-reduced features and then apply a linear SVM. Then I save the whole model to make predictions on my input. I get an error in the line where I try to fit the data, so
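One way to express that model so GridSearchCV only touches the PCA step (a sketch; step names and component counts are illustrative) is a Pipeline whose middle step is a FeatureUnion of an identity transformer (the original features) and PCA:

```python
# Hedged sketch: scale -> [original features + PCA features] -> linear SVM,
# grid searching only the number of PCA components.
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("features", FeatureUnion([
        ("original", FunctionTransformer()),  # func=None is the identity, and it pickles
        ("pca", PCA()),
    ])),
    ("svm", SVC(kernel="linear")),
])

param_grid = {"features__pca__n_components": [2, 5, 10]}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X, y); the fitted search object can then be pickled for later prediction.
```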

How to access Scikit Learn nested cross-validation scores

你离开我真会死。 submitted on 2019-12-04 04:10:06
Question: I'm using Python and I would like to use nested cross-validation with scikit-learn. I have found a very good example: NUM_TRIALS = 30 non_nested_scores = np.zeros(NUM_TRIALS) nested_scores = np.zeros(NUM_TRIALS) # Choose cross-validation techniques for the inner and outer loops, # independently of the dataset. # E.g. "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc. inner_cv = KFold(n_splits=4, shuffle=True, random_state=i) outer_cv = KFold(n_splits=4, shuffle=True, random_state=i) # Non
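For reference, a compact version of that pattern (a sketch on the iris data, with an illustrative grid): the outer cross_val_score returns one nested score per outer fold, while fitting the inner GridSearchCV on its own exposes best_params_ and cv_results_ for inspection:

```python
# Hedged sketch: nested CV with GridSearchCV inside cross_val_score.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

clf = GridSearchCV(SVC(), p_grid, cv=inner_cv)

# Outer loop: one generalization score per outer fold.
nested_scores = cross_val_score(clf, X, y, cv=outer_cv)
print(nested_scores, nested_scores.mean())

# Non-nested comparison: fit the search on all data and read its internal score.
clf.fit(X, y)
print(clf.best_score_, clf.best_params_)
```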

How to elegantly pass Sklearn's GridSearchCV's best parameters to another model?

只谈情不闲聊 submitted on 2019-12-03 16:49:22
Question: I have found a set of best hyperparameters for my KNN estimator with GridSearchCV: >>> knn_gridsearch_model.best_params_ {'algorithm': 'auto', 'metric': 'manhattan', 'n_neighbors': 3} So far, so good. I want to train my final estimator with these newly found parameters. Is there a way to feed the above hyperparameter dict to it directly? I tried this: >>> new_knn_model = KNeighborsClassifier(knn_gridsearch_model.best_params_) but instead of the hoped-for result, new_knn_model just got the whole dict
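Passing the dict as a single positional argument makes it the value of the first constructor parameter; the usual fix is to unpack it with ** so each entry becomes a keyword argument (or to push it into an existing estimator with set_params). A short sketch:

```python
from sklearn.neighbors import KNeighborsClassifier

best = {'algorithm': 'auto', 'metric': 'manhattan', 'n_neighbors': 3}

new_knn_model = KNeighborsClassifier(**best)            # unpack the dict as kwargs
# or, on an already-constructed estimator:
new_knn_model = KNeighborsClassifier().set_params(**best)
```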

Explicitly specifying test/train sets in GridSearchCV

核能气质少年 submitted on 2019-12-03 16:24:56
I have a question about the cv parameter of sklearn's GridSearchCV. I'm working with data that has a time component to it, so random shuffling within KFold cross-validation doesn't seem sensible. Instead, I want to explicitly specify cutoffs for training, validation, and test data within a GridSearchCV. Can I do this? To better illuminate the question, here's how I would do that manually. import numpy as np import pandas as pd from sklearn.linear_model import Ridge np.random.seed(444) index = pd.date_range('2014', periods=60, freq='M') X, y = make_regression(n_samples=60, n_features=3
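One answer, sketched below with illustrative cutoffs: GridSearchCV's cv parameter accepts an iterable of (train_indices, test_indices) pairs, so an explicit time-based split can be passed instead of shuffled folds (TimeSeriesSplit is another option when rolling splits are wanted):

```python
# Hedged sketch: a single explicit train/validation split passed to GridSearchCV.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=60, n_features=3, random_state=444)

train_idx = np.arange(0, 48)   # e.g. the first 48 months for training
val_idx = np.arange(48, 60)    # the last 12 months for validation
cv = [(train_idx, val_idx)]    # one explicit split instead of shuffled KFold

search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=cv)
search.fit(X, y)
```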

Scikit - Combining scale and grid search

一个人想着一个人 submitted on 2019-12-03 16:21:55
I am new to scikit-learn and have two slight issues combining data scaling with grid search. Efficient scaler: considering cross-validation with K folds, I would like the data scaler (using preprocessing.StandardScaler(), for instance) to be fit only on the K-1 training folds each time we train the model, and then applied to the remaining fold. My impression is that the following code will fit the scaler on the entire dataset, so I would like to modify it to behave as described previously: classifier = svm.SVC(C=1) clf = make_pipeline(preprocessing.StandardScaler(), classifier)
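On the first point, the snippet above is already close: once the scaler is a step inside the pipeline, any cross-validation driven by GridSearchCV or cross_val_score re-fits it on the K-1 training folds only. A short sketch (parameter values illustrative):

```python
# Hedged sketch: scaler inside the pipeline, so each CV split re-fits it on train folds.
from sklearn import preprocessing, svm
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))

# Pipeline parameters are addressed through the step name ("svc" for make_pipeline).
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.001, 0.01]}
search = GridSearchCV(clf, param_grid, cv=5)
# search.fit(X, y); cross_val_score(clf, X, y, cv=KFold(n_splits=5)) behaves the same way.
```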

How to implement SMOTE in cross validation and GridSearchCV

不羁的心 submitted on 2019-12-03 12:45:35
Question: I'm relatively new to Python. Can you help me turn my SMOTE implementation into a proper pipeline? What I want is to apply over- and under-sampling to the training set of every k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced left-out piece. The problem is that when I do that, I cannot use the familiar sklearn interface for evaluation and grid search. Is it possible to make something similar to model_selection.RandomizedSearchCV? My
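The standard answer (sketched here; the classifier and parameter distribution are illustrative) is to use imbalanced-learn's Pipeline, which applies resamplers such as SMOTE only to the training portion of each fold while keeping the familiar sklearn search interface:

```python
# Hedged sketch: SMOTE inside an imblearn Pipeline, tuned with RandomizedSearchCV.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),   # resampling happens inside each training fold
    ("clf", RandomForestClassifier()),
])

param_dist = {"clf__n_estimators": [100, 300], "clf__max_depth": [5, 10, None]}
search = RandomizedSearchCV(pipe, param_dist, n_iter=4, cv=5)
# search.fit(X, y): each held-out fold stays imbalanced, so the scores reflect
# performance on the original class distribution.
```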