cross-validation

How to use grid search with fit_generator in Keras

百般思念 submitted on 2019-12-03 05:09:23
I want to grid search the parameters of a Keras model that is trained with fit_generator. I found the code below on Stack Overflow and changed it. 1) I don't understand how to give the fit_generator or flow_from_directory output to the fit function (the last line in the code). 2) How can I add early stopping? Thanks.

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.wrappers.scikit_learn import KerasClassifier
from keras import backend as K
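One route, sketched below under assumptions not stated in the question: load the images from the directory into NumPy arrays first, since KerasClassifier.fit expects arrays rather than a generator. The build_model function, input shapes, and parameter grid are hypothetical placeholders.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model(filters=32):
    model = Sequential()
    model.add(Conv2D(filters, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

# Stand-in arrays; with flow_from_directory the images would first be read
# into X, y arrays, since KerasClassifier.fit takes arrays, not a generator.
X = np.random.rand(200, 28, 28, 1)
y = np.random.randint(0, 10, size=200)

clf = KerasClassifier(build_fn=build_model, epochs=5, batch_size=32, verbose=0)
grid = GridSearchCV(clf, param_grid={'filters': [16, 32]}, cv=3)
# Extra fit kwargs (e.g. callbacks=[EarlyStopping(...)]) are forwarded to
# Keras' fit(); see the early-stopping entry further down this page.
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)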

How to implement SMOTE in cross validation and GridSearchCV

左心房为你撑大大i submitted on 2019-12-03 03:18:39
I'm relatively new to Python. Can you help me improve my implementation of SMOTE into a proper pipeline? What I want is to apply the over- and under-sampling on the training set of every k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced left-out piece. The problem is that when I do that I cannot use the familiar sklearn interface for evaluation and grid search. Is it possible to make something similar to model_selection.RandomizedSearchCV? My take on this:

df = pd.read_csv("Imbalanced_data.csv")  # Load the data set
X = df.iloc[:,0:64]
X = X
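One common way to get this behaviour while keeping the sklearn search interface is the Pipeline from the imbalanced-learn package, which resamples only the training folds. A minimal sketch on synthetic data; the random forest and parameter ranges are placeholders, not the question's setup:

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic imbalanced data; in the question X and y would come from the CSV.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Inside cross-validation, SMOTE resamples only the training folds; every
# held-out fold keeps its original imbalanced class distribution.
pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("clf", RandomForestClassifier(random_state=0)),
])

param_dist = {"clf__n_estimators": [100, 300], "clf__max_depth": [None, 10]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = RandomizedSearchCV(pipe, param_dist, n_iter=4, cv=cv, scoring="roc_auc", random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)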

Caret Package: Stratified Cross Validation in Train Function

百般思念 submitted on 2019-12-03 02:57:46
Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion about this topic but no real definitive answer. Thanks in advance. There is a parameter called 'index' which lets the user specify the indices used for cross-validation:

folds <- 4
cvIndex <- createFolds(factor(training$Y), folds, returnTrain = T)
tc <- trainControl(index = cvIndex, method = 'cv', number = folds)
rfFit <- train(Y ~ ., data =
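For readers working in scikit-learn rather than caret, the analogous tool (not the caret answer above, just a Python sketch on synthetic data) is StratifiedKFold, which preserves the class ratio of an imbalanced target in every fold:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print(scores)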

difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

[亡魂溺海] submitted on 2019-12-03 02:46:00
Question: As the title says, I am wondering what the difference is between StratifiedKFold with the parameter shuffle=True, i.e. StratifiedKFold(n_splits=10, shuffle=True, random_state=0), and StratifiedShuffleSplit, i.e. StratifiedShuffleSplit(n_splits=10, test_size='default', train_size=None, random_state=0), and what the advantage of using StratifiedShuffleSplit is. Answer 1: In KFold, the test sets should not overlap, even with shuffle. With KFold and shuffle, the data is shuffled once at the start, and then
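A small sketch on toy labels (not part of the original question) that makes the distinction visible: the KFold variant partitions the data into disjoint test folds that together cover every sample once, while the ShuffleSplit variant draws independent random test sets that can reuse samples:

import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)        # toy two-class labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

print("StratifiedKFold test folds (disjoint, cover every sample once):")
for _, test_idx in skf.split(X, y):
    print(sorted(test_idx))

print("StratifiedShuffleSplit test sets (independent draws, may overlap):")
for _, test_idx in sss.split(X, y):
    print(sorted(test_idx))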

What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?

只愿长相守 submitted on 2019-12-03 02:37:15
I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly. The documentation ( http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter ) indicates that specifying scoring='roc_auc' will use sklearn.metrics.roc_auc_score. However, when I use GridSearchCV or cross_val_score with scoring='roc_auc' I get very different numbers than when I call roc_auc_score directly. Here is my code to help demonstrate what I see:

# score the model using cross_val_score
rf =
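A frequent source of such gaps is comparing per-fold AUC (what scoring='roc_auc' reports on held-out folds) against an AUC computed from hard predict() labels or on data the model was trained on. A sketch of an apples-to-apples check, on a synthetic dataset rather than the question's data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
rf = RandomForestClassifier(random_state=0)

# Per-fold AUC, computed on each held-out fold from predicted probabilities.
print(cross_val_score(rf, X, y, cv=5, scoring='roc_auc').mean())

# Manual equivalent for one split: use predict_proba, not predict().
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf.fit(X_tr, y_tr)
print(roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))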

Difference between using train_test_split and cross_val_score in sklearn.cross_validation

前提是你 submitted on 2019-12-02 23:49:55
I have a matrix with 20 columns. The last column contains 0/1 labels. The link to the data is here . I am trying to run a random forest on the dataset using cross-validation. I use two methods of doing this: sklearn.cross_validation.cross_val_score and sklearn.cross_validation.train_test_split. I am getting different results when I do what I think is pretty much the exact same thing. To exemplify, I run a two-fold cross-validation using the two methods above, as in the code below.

import csv
import numpy as np
import pandas as pd
from sklearn import ensemble
from sklearn.metrics import roc
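A sketch of why the two numbers need not agree, using the modern sklearn.model_selection module and synthetic data instead of the linked file: a single train/test split scores only one partition, while cross_val_score averages over all folds, and each call shuffles on its own unless the same random_state and the same splits are pinned.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=19, random_state=0)
rf = RandomForestClassifier(random_state=0)

# Two-fold CV: the reported score is the mean over both held-out halves.
cv_scores = cross_val_score(rf, X, y, cv=KFold(n_splits=2, shuffle=True, random_state=0))
print(cv_scores, cv_scores.mean())

# A single 50/50 split: only one half is ever evaluated, with its own shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
print(rf.fit(X_tr, y_tr).score(X_te, y_te))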

Scikit-learn TypeError: If no scoring is specified, the estimator passed should have a 'score' method

*爱你&永不变心* submitted on 2019-12-02 20:08:34
Question: I have created a custom model in Python using scikit-learn, and I want to use cross-validation. The class for the model is defined as follows:

class MultiLabelEnsemble:
    '''
    MultiLabelEnsemble(predictorInstance, balance=False)
    Like OneVsRestClassifier: Wrapping class to train multiple models when
    several objectives are given as target values. Its predictor may be an
    ensemble. This class can be used to create a one-vs-rest classifier from
    multiple 0/1 labels to treat a multi-label problem or to
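The error says that cross_val_score needs either an explicit scoring argument or a score method on the estimator. A minimal sketch of both fixes, using a hypothetical toy estimator rather than the MultiLabelEnsemble class from the question:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import cross_val_score

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical toy estimator; ClassifierMixin supplies a score() method."""
    def fit(self, X, y):
        vals, counts = np.unique(y, return_counts=True)
        self.majority_ = vals[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# Fix 1: inherit from ClassifierMixin so a default accuracy score() exists.
print(cross_val_score(MajorityClassifier(), X, y, cv=5))

# Fix 2 (works even for a class without score()): name the metric explicitly.
print(cross_val_score(MajorityClassifier(), X, y, cv=5, scoring='accuracy'))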

k-fold cross validation - how to get the prediction automatically?

你离开我真会死。 submitted on 2019-12-02 19:51:44
This may be a silly question, but I just can't find a package to do that... I know I can write some code to get what I want, but it would be nice to have a function that does it automatically! So basically I want to do a k-fold cross-validation for a glm model. I want to automatically get the predictions of each validation set and the actual values too. So if I am doing a 10-fold CV, I want a function to return the 10 validation sets with the actual responses and predictions all together. Thank you in advance! As stated in the comments, caret makes cross-validation very easy. Just use the "glm"
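On the Python side (this is a scikit-learn sketch on synthetic data, not the caret answer referenced above), cross_val_predict returns exactly this kind of out-of-fold prediction for every observation:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)

# Each sample's prediction comes from the fold in which it was held out.
preds = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=10)
out = pd.DataFrame({"actual": y, "predicted": preds})
print(out.head())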

Early stopping with Keras and sklearn GridSearchCV cross-validation

房东的猫 submitted on 2019-12-02 19:27:54
I wish to implement early stopping with Keras and sklearn's GridSearchCV. The working code example below is modified from How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras. The data set may be downloaded from here . The modification adds the Keras EarlyStopping callback class to prevent over-fitting. For this to be effective it requires the monitor='val_acc' argument for monitoring validation accuracy. For val_acc to be available, KerasClassifier requires validation_split=0.1 to generate validation accuracy, else EarlyStopping raises RuntimeWarning: Early
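A condensed, self-contained sketch of the wiring described above; the model and data are synthetic stand-ins rather than the tutorial's code, and on newer Keras releases the monitored metric is named 'val_accuracy' instead of 'val_acc':

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model():
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Synthetic stand-in for the downloadable data set referenced above.
X = np.random.rand(300, 8)
y = np.random.randint(0, 2, size=300)

clf = KerasClassifier(build_fn=create_model, epochs=100, verbose=0)
early_stop = EarlyStopping(monitor='val_acc', patience=5, mode='max')

grid = GridSearchCV(clf, param_grid={'batch_size': [10, 20]}, cv=3)
# validation_split produces val_acc for the callback; both kwargs are
# forwarded by KerasClassifier to the underlying Keras fit() call.
grid.fit(X, y, callbacks=[early_stop], validation_split=0.1)
print(grid.best_params_)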

How to perform k-fold cross validation with tensorflow?

蹲街弑〆低调 submitted on 2019-12-02 17:37:26
I am following the IRIS example of tensorflow. My case now is that I have all the data in a single CSV file, not separated, and I want to apply k-fold cross-validation on that data. I have

data_set = tf.contrib.learn.datasets.base.load_csv(filename="mydata.csv", target_dtype=np.int)

How can I perform k-fold cross-validation on this dataset with a multi-layer neural network, the same as in the IRIS example? I know this question is old, but in case someone is looking to do something similar, expanding on ahmedhosny's answer: the new tensorflow datasets API has the ability to create dataset objects using python
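One way to combine the two, sketched below with tf.keras and synthetic arrays standing in for the parsed CSV: let sklearn's KFold generate the fold indices and build a tf.data pipeline per fold.

import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

# Synthetic stand-in for the features/labels parsed from mydata.csv.
X = np.random.rand(150, 4).astype(np.float32)
y = np.random.randint(0, 3, size=150)

def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
        tf.keras.layers.Dense(3, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Fresh model per fold; tf.data pipelines built from the fold's indices.
    train_ds = tf.data.Dataset.from_tensor_slices((X[train_idx], y[train_idx])).batch(16)
    test_ds = tf.data.Dataset.from_tensor_slices((X[test_idx], y[test_idx])).batch(16)
    model = build_model()
    model.fit(train_ds, epochs=20, verbose=0)
    scores.append(model.evaluate(test_ds, verbose=0)[1])

print(np.mean(scores))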