cross-validation

PCA within cross validation; however, only with a subset of variables

扶醉桌前 submitted on 2021-01-29 20:47:36
Question: This question is very similar to "preprocess within cross-validation in caret"; however, in the project I'm working on I would like to do PCA on only three of my 19 predictors. Here is the example from "preprocess within cross-validation in caret", and I'll use the PimaIndiansDiabetes data for ease (this is not my project data, but the concept should be the same). I would then like to do the preProcess only on a subset of variables, i.e. PimaIndiansDiabetes[, c(4,5,6)]. Is there a …
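
The caret-specific answer is not reproduced here, but the underlying pattern — re-fitting PCA on only a few columns inside every resampling fold — can be sketched with scikit-learn as a rough analogue. Everything below (the synthetic data, the column indices, the classifier) is an illustrative assumption, not the poster's setup:

# Sketch: PCA restricted to three columns, re-fit inside every CV fold (sklearn analogue).
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(768, 19))      # stand-in for 19 predictors
y = rng.integers(0, 2, size=768)    # stand-in binary outcome

pre = ColumnTransformer(
    [("pca", PCA(n_components=2), [3, 4, 5])],  # PCA only on three chosen columns
    remainder="passthrough",                    # the other 16 predictors pass through untouched
)
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

# Because the PCA lives inside the pipeline, it is re-estimated on each training fold only.
print(cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean())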

TypeError: only integer scalar arrays can be converted to a scalar index, while trying K-fold CV

时光毁灭记忆、已成空白 submitted on 2021-01-29 12:30:30
Question: I am trying to perform K-fold CV on a dataset containing 279 files; after running k-means the files have shape (279, 5, 90). I reshaped the data to (279, 5*90) in order to fit it to an SVM. The K-fold CV approach now gives me the error "TypeError: only integer scalar arrays can be converted to a scalar index". #input with open("dataset.pkl", "rb") as file: dataset = pkl.load(file) print(len(dataset)) x = [i[0] for i in dataset] # k-means cc y = [i[1] for i in dataset] # label …
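
That TypeError typically appears when KFold's integer index arrays are used to index a plain Python list such as the x and y built above; a minimal sketch of the usual fix, with stand-in data in place of the pickled dataset, converts the lists to NumPy arrays (and flattens them) before splitting:

# Sketch: convert the lists to arrays so KFold's integer index arrays can be used.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# Stand-ins for the 279 items of shape (5, 90) described above.
x = [np.random.rand(5, 90) for _ in range(279)]
y = [i % 2 for i in range(279)]

x = np.asarray(x).reshape(len(x), -1)   # (279, 5, 90) -> (279, 450)
y = np.asarray(y)                       # indexing a plain list with an index array raises the TypeError

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in kf.split(x):
    clf = SVC().fit(x[train_idx], y[train_idx])
    print(clf.score(x[test_idx], y[test_idx]))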

R: can the caret::train function for glmnet cross-validate AUC at fixed alpha and lambda?

霸气de小男生 submitted on 2021-01-29 10:12:26
Question: I would like to calculate the 10-fold cross-validated AUC of an elastic net regression model at the optimal alpha and lambda using caret::train. https://stats.stackexchange.com/questions/69638/does-caret-train-function-for-glmnet-cross-validate-for-both-alpha-and-lambda/69651 explains how to cross-validate alpha and lambda with caret::train. My question on Cross Validated was closed because it was classified as a programming question: https://stats.stackexchange.com/questions/505865/r …
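
The question itself is about caret and glmnet, but the underlying computation — a 10-fold cross-validated AUC at one fixed mixing parameter and penalty strength — can be sketched in Python for comparison. The data and the l1_ratio and C values below are illustrative assumptions, with l1_ratio playing the role of glmnet's alpha and C roughly the role of 1/lambda:

# Sketch: 10-fold cross-validated AUC of an elastic-net logistic regression at FIXED hyperparameters.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),  # fixed mixing and penalty strength
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(aucs.mean(), aucs.std())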

sklearn: use RandomizedSearchCV with custom metrics and catch exceptions

為{幸葍}努か submitted on 2021-01-28 12:35:42
Question: I am using the RandomizedSearchCV function in sklearn with a Random Forest classifier. To see different metrics I am using custom scoring: from sklearn.metrics import make_scorer, roc_auc_score, recall_score, matthews_corrcoef, balanced_accuracy_score, accuracy_score acc = make_scorer(accuracy_score) auc_score = make_scorer(roc_auc_score) recall = make_scorer(recall_score) mcc = make_scorer(matthews_corrcoef) bal_acc = make_scorer(balanced_accuracy_score) scoring = {"roc_auc_score": auc …
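
For reference, RandomizedSearchCV accepts a dictionary of scorers as long as refit names one of the keys, and error_score controls what happens when an individual fit raises instead of aborting the whole search. A minimal sketch under those assumptions (synthetic data, arbitrary parameter ranges):

# Sketch: RandomizedSearchCV with several scorers and non-fatal handling of failing fits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, balanced_accuracy_score, matthews_corrcoef
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

scoring = {
    "roc_auc": "roc_auc",                             # built-in scorer, uses predicted probabilities
    "bal_acc": make_scorer(balanced_accuracy_score),
    "mcc": make_scorer(matthews_corrcoef),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
    n_iter=5,
    scoring=scoring,
    refit="roc_auc",       # with multiple metrics, refit must name the one used to pick best_params_
    error_score=np.nan,    # a failing fit scores NaN instead of stopping the whole search
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)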

Caret and GBM: task 1 failed - “arguments imply differing number of rows”

北城以北 submitted on 2021-01-27 14:36:34
Question: I'm trying to run a GBM with caret using the code below: library(caret) library(doParallel) detectCores() registerDoParallel(detectCores() - 1) set.seed(668) in.train <- createDataPartition(y = dat$target, p = 0.80, list = T) ctrl <- trainControl(method = 'cv', number = 2, classProbs = T, verboseIter = T, summaryFunction = LogLossSummary2) gbm.grid <- expand.grid(interaction.depth = 10, n.trees = (2:7) * 50, shrinkage = 0.1) Sys.time() set.seed(1234) gbm.fit <- train(target ~., data = otto.new …

Using sklearn's RandomizedSearchCV with SMOTE oversampling only on training folds

我是研究僧i submitted on 2021-01-21 05:34:11
Question: I have a highly unbalanced dataset (99.5:0.5). I would like to perform hyperparameter tuning on a Random Forest model using sklearn's RandomizedSearchCV. I would like each of the training folds to be oversampled using SMOTE, and each test to be evaluated on the held-out fold keeping the original distribution, without any oversampling. Since these test folds are highly unbalanced, I would like the tests to be evaluated using the F1 score. I have tried the following: from sklearn …
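
The standard pattern for this is to put SMOTE and the classifier in an imbalanced-learn Pipeline, so resampling happens only when each training fold is fitted while the test fold keeps its original distribution. A minimal sketch with synthetic data and arbitrary parameter ranges:

# Sketch: SMOTE inside an imbalanced-learn Pipeline, so only training folds are oversampled.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

X, y = make_classification(n_samples=4000, weights=[0.995, 0.005],
                           n_informative=5, random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(k_neighbors=3, random_state=0)),   # applied only when a training fold is fitted
    ("rf", RandomForestClassifier(random_state=0)),
])

search = RandomizedSearchCV(
    pipe,
    param_distributions={"rf__n_estimators": [100, 300], "rf__max_depth": [5, 10, None]},
    n_iter=5,
    scoring="f1",                                       # computed on the untouched, imbalanced test fold
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)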

In Keras “ImageDataGenerator”, is the “validation_split” parameter a kind of K-fold cross-validation?

隐身守侯 submitted on 2020-12-30 03:59:06
Question: I am trying to do K-fold cross-validation on a Keras model (with ImageDataGenerator and flow_from_directory for the training and validation data), and I want to know whether the argument "validation_split" in "ImageDataGenerator" test_datagen = ImageDataGenerator( rescale=1. / 255, rotation_range = 180, width_shift_range = 0.2, height_shift_range = 0.2, brightness_range = (0.8, 1.2), shear_range = 0.2, zoom_range = 0.2, horizontal_flip = True, vertical_flip = True, validation_split = 0.1 ) train_datagen = …
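
For context, validation_split carves off a single fixed fraction of the data as the validation subset; it is not K-fold cross-validation. A rough sketch of doing K-fold manually follows, assuming the images have already been loaded into arrays rather than read via flow_from_directory, with a toy model standing in for the real one:

# Sketch: manual K-fold around ImageDataGenerator; validation_split itself is one fixed split.
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

X = np.random.rand(100, 64, 64, 3).astype("float32")   # stand-in images
y = np.random.randint(0, 2, size=100)                   # stand-in binary labels

datagen = ImageDataGenerator(rotation_range=180, horizontal_flip=True)  # augment training data only

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr, va) in enumerate(kf.split(X)):
    model = models.Sequential([
        layers.Conv2D(8, 3, activation="relu", input_shape=(64, 64, 3)),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(datagen.flow(X[tr], y[tr], batch_size=16),
              validation_data=(X[va], y[va]), epochs=1, verbose=0)
    print("fold", fold, model.evaluate(X[va], y[va], verbose=0))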

How to compute precision, recall and F1 score of an imbalanced dataset for K-fold cross-validation with 10 folds in Python

拟墨画扇 submitted on 2020-12-27 10:09:34
Question: I have an imbalanced dataset with a binary classification problem. I have built a Random Forest classifier and used k-fold cross-validation with 10 folds. kfold = model_selection.KFold(n_splits=10, random_state=42) model = RandomForestClassifier(n_estimators=50) I got the results of the 10 folds: results = model_selection.cross_val_score(model, features, labels, cv=kfold) print results [ 0.60666667 0.60333333 0.52333333 0.73 0.75333333 0.72 0.7 0.73 0.83666667 0.88666667 ] I have calculated …
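
One way to get per-fold precision, recall and F1 in a single pass is cross_validate with a list of scorers. The sketch below uses synthetic imbalanced data in place of the poster's features and labels, and a stratified splitter so every fold keeps the class ratio:

# Sketch: per-fold precision, recall and F1 via cross_validate, on synthetic imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

features, labels = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)  # keeps the class ratio per fold
model = RandomForestClassifier(n_estimators=50, random_state=42)

results = cross_validate(model, features, labels, cv=kfold,
                         scoring=["precision", "recall", "f1"])
for metric in ("precision", "recall", "f1"):
    print(metric, results["test_" + metric].mean())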
