cross-validation

Cross validation in deep neural networks

主宰稳场 submitted on 2020-05-13 04:11:32
Question: How do you perform cross-validation in a deep neural network? I know that to perform cross-validation you train on all folds except one and test on the excluded fold, then repeat this for all k folds and average the accuracies. But how do you do this for each iteration? Do you update the parameters at each fold? Or do you perform k-fold cross-validation at each iteration? Or is each training run on all folds but one considered one iteration? Answer 1: Cross-validation is a general
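
A minimal sketch of what the usual answer amounts to in practice, assuming a Keras-style binary classifier (the build_model architecture, epoch count, and data shapes below are illustrative, not taken from the post): a fresh model is built for every fold, so no weights carry over between folds, and the per-fold accuracies are averaged at the end.

import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras

def build_model(input_dim):
    # Illustrative architecture; any model-construction function works here.
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

def cross_validate_nn(X, y, k=5, epochs=10, batch_size=32):
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    fold_accuracies = []
    for train_idx, test_idx in kf.split(X):
        model = build_model(X.shape[1])  # re-initialise the weights for every fold
        model.fit(X[train_idx], y[train_idx], epochs=epochs,
                  batch_size=batch_size, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        fold_accuracies.append(acc)
    return np.mean(fold_accuracies)

Each fold is a complete training run with its own epochs and parameter updates; the folds themselves are not iterations of a single model.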

TypeError: 'KFold' object is not iterable

人走茶凉 submitted on 2020-05-10 10:28:37
Question: I'm following one of the kernels on Kaggle, namely A kernel for Credit Card Fraud Detection. I reached the step where I need to perform KFold in order to find the best parameters for Logistic Regression. The following code is shown in the kernel itself, but for some reason (probably because it targets an older version of scikit-learn) it gives me some errors.
def printing_Kfold_scores(x_train_data, y_train_data):
    fold = KFold(len(y_train_data), 5, shuffle=False)
    # Different C parameters
    c_param_range = [0
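
A hedged sketch of the same function against the current scikit-learn API (0.18 and later), where KFold takes n_splits and the fold indices come from .split() rather than from iterating the object itself, which is what triggers the "'KFold' object is not iterable" error; the loop body below only prints fold sizes as a placeholder for the kernel's scoring logic.

from sklearn.model_selection import KFold

def printing_Kfold_scores(x_train_data, y_train_data):
    fold = KFold(n_splits=5, shuffle=False)  # the old KFold(n, n_folds, ...) signature is gone
    for iteration, (train_index, test_index) in enumerate(fold.split(x_train_data), start=1):
        # train_index / test_index are positional indices into the training data
        print(f"Fold {iteration}: {len(train_index)} train rows, {len(test_index)} test rows")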

How can I extract x_train and y_train from train_generator?

梦想与她 submitted on 2020-04-30 11:42:28
Question: In my CNN model I want to extract X_train and y_train from train_generator. I want to use ensemble learning (bagging and boosting) to evaluate the model. The main challenge is how to extract X_train and y_train from train_generator in Python.
history = model.fit_generator(train_generator,
                              steps_per_epoch=num_of_train_samples // batch_size,
                              epochs=10,
                              validation_data=validation_generator,
                              validation_steps=num_of_val_samples // batch_size,
                              callbacks=callbacks)
Answer 1: Well, first of
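
A minimal sketch of one common way to do this, assuming train_generator is a finite Keras directory iterator (ideally created with shuffle=False and without random augmentation, so the collected arrays match the files on disk); generator_to_arrays is a hypothetical helper name, not something from the post.

import numpy as np

def generator_to_arrays(generator):
    xs, ys = [], []
    for _ in range(len(generator)):   # one pass over every batch in the generator
        x_batch, y_batch = next(generator)
        xs.append(x_batch)
        ys.append(y_batch)
    return np.concatenate(xs), np.concatenate(ys)

# X_train, y_train = generator_to_arrays(train_generator)

Note that this materialises the whole training set in memory, which defeats the purpose of a generator for large datasets.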

Cross-validation metrics in scikit-learn for each data split

北城以北 submitted on 2020-04-18 03:49:12
Question: Please, I just need to get the cross-validation statistics explicitly for each split of the (X_test, y_test) data. To try to do so, I did:
kf = KFold(n_splits=n_splits)
X_train_tmp = []
y_train_tmp = []
X_test_tmp = []
y_test_tmp = []
mae_train_cv_list = []
mae_test_cv_list = []
for train_index, test_index in kf.split(X_train):
    for i in range(len(train_index)):
        X_train_tmp.append(X_train[train_index[i]])
        y_train_tmp.append(y_train[train_index[i]])
    for i in range(len(test_index)):
        X_test
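
A self-contained sketch of a tighter way to collect per-fold metrics, replacing the element-by-element inner loops with NumPy fancy indexing; the Ridge model, the make_regression data and the MAE metric are stand-ins for whatever the post actually uses.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# Stand-in data and model so the sketch runs on its own.
X_train, y_train = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
model = Ridge()

kf = KFold(n_splits=5)
mae_train_cv_list, mae_test_cv_list = [], []
for train_index, test_index in kf.split(X_train):
    X_tr, X_te = X_train[train_index], X_train[test_index]  # fancy indexing replaces the inner loops
    y_tr, y_te = y_train[train_index], y_train[test_index]
    model.fit(X_tr, y_tr)
    mae_train_cv_list.append(mean_absolute_error(y_tr, model.predict(X_tr)))
    mae_test_cv_list.append(mean_absolute_error(y_te, model.predict(X_te)))

print(mae_train_cv_list, mae_test_cv_list)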

How to calculate cross validation error for ridge regression model?

送分小仙女□ submitted on 2020-04-16 05:49:28
Question: I am trying to fit a ridge regression model on the white wine dataset. I want to use the entire dataset for training and use 10-fold CV to calculate the test error rate. That's the main question: how to calculate the CV test error for a ridge-penalised logistic model. I calculated the best value of lambda (also using CV), and now I want to find the CV test error rate. Currently, my code for calculating the said CV test error is:
cost1 <- function(good, pi=0) mean(abs(good-pi) > 0.5)
ridge
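
The post itself is in R (glmnet/cv.glm style); as a hedged illustration of the same recipe in Python with scikit-learn, the sketch below fits an L2-penalised (ridge) logistic regression at a fixed lambda and estimates its misclassification rate with 10-fold CV. The synthetic data and the lambda value are placeholders, not the white-wine data or the CV-selected lambda from the post.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=11, random_state=0)

best_lambda = 1.0                                  # stand-in for the lambda chosen by CV
clf = LogisticRegression(penalty="l2", C=1.0 / best_lambda, max_iter=1000)

accuracy = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
cv_test_error = 1.0 - accuracy.mean()              # analogue of the 0/1 cost at a 0.5 threshold
print(cv_test_error)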

Why can't I use cv.glm on the output of bestglm?

与世无争的帅哥 submitted on 2020-04-16 05:47:13
Question: I am trying to do best subset selection on the wine dataset, and then I want to get the test error rate using 10-fold CV. The code I used is:
cost1 <- function(good, pi=0) mean(abs(good-pi) > 0.5)
res.best.logistic <- bestglm(Xy = winedata,
                             family = binomial,   # binomial family for logistic
                             IC = "AIC",          # Information criteria
                             method = "exhaustive")
res.best.logistic$BestModels
best.cv.err <- cv.glm(winedata, res.best.logistic$BestModel, cost1, K=10)
However, this gives the error: Error in
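
For comparison only, a hedged Python sketch of the same two-stage idea (exhaustive subset search by AIC, then 10-fold CV error for the chosen model), using statsmodels for the AIC search and scikit-learn for the CV step; the synthetic data stands in for winedata, and this does not address the R-specific cv.glm error itself.

import itertools
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Exhaustive best-subset search by AIC, as bestglm does with IC = "AIC".
best_aic, best_subset = np.inf, None
for k in range(1, X.shape[1] + 1):
    for subset in itertools.combinations(range(X.shape[1]), k):
        cols = list(subset)
        aic = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=0).aic
        if aic < best_aic:
            best_aic, best_subset = aic, cols

# 10-fold CV misclassification rate of the selected model (the cv.glm step).
acc = cross_val_score(LogisticRegression(max_iter=1000),
                      X[:, best_subset], y, cv=10).mean()
print("chosen subset:", best_subset, "CV test error:", 1.0 - acc)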

Put customized functions in Sklearn pipeline

给你一囗甜甜゛ submitted on 2020-04-10 03:36:07
Question: In my classification scheme there are several steps, including:
1. SMOTE (Synthetic Minority Over-sampling Technique)
2. Fisher criterion for feature selection
3. Standardization (z-score normalisation)
4. SVC (Support Vector Classifier)
The main parameters to be tuned in the scheme above are the percentile (step 2) and the hyperparameters for the SVC (step 4), and I want to tune them with a grid search. The current solution builds a "partial" pipeline including only steps 3 and 4 of the scheme:
clf = Pipeline([('normal'
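
A hedged sketch of what the full four-step pipeline could look like. It assumes the imbalanced-learn package, whose Pipeline applies a sampler such as SMOTE only to the training folds during grid search, and it substitutes SelectPercentile(f_classif) (an ANOVA F-score) for the Fisher criterion, so it is an approximation of the scheme rather than the poster's exact functions.

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in imbalanced data so the sketch runs on its own.
X, y = make_classification(n_samples=400, n_features=30, weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),         # 1. oversampling (training folds only)
    ("select", SelectPercentile(f_classif)),  # 2. feature selection (stand-in for Fisher)
    ("normal", StandardScaler()),             # 3. z-score normalisation
    ("svc", SVC()),                           # 4. classifier
])

param_grid = {
    "select__percentile": [10, 30, 50],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01],
}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc").fit(X, y)
print(grid.best_params_, grid.best_score_)

A home-grown selection function can replace SelectPercentile by wrapping it in a class with fit and transform methods (subclassing BaseEstimator and TransformerMixin), which is the usual way to put customized functions into a scikit-learn pipeline.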

How to get roc auc for binary classification in sklearn

妖精的绣舞 submitted on 2020-04-07 07:05:51
Question: I have a binary classification problem where I want to calculate the roc_auc of the results. For this purpose I did it in two different ways using sklearn. My code is as follows.
Code 1:
from sklearn.metrics import make_scorer
from sklearn.metrics import roc_auc_score
myscore = make_scorer(roc_auc_score, needs_proba=True)
from sklearn.model_selection import cross_validate
my_value = cross_validate(clf, X, y, cv=10, scoring=myscore)
print(np.mean(my_value['test_score'].tolist()))
I get the
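
The post's Code 2 is cut off, so as a self-contained sketch the snippet below pairs the custom scorer from Code 1 with the built-in 'roc_auc' scoring string for comparison; with a probabilistic classifier the two should produce the same per-fold AUCs. The logistic-regression classifier and the synthetic data are stand-ins for the post's clf, X and y.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression(max_iter=1000)

# On recent scikit-learn (>= 1.4), needs_proba is replaced by response_method="predict_proba".
myscore = make_scorer(roc_auc_score, needs_proba=True)    # AUC from predicted probabilities
v1 = cross_validate(clf, X, y, cv=10, scoring=myscore)
v2 = cross_validate(clf, X, y, cv=10, scoring="roc_auc")  # built-in AUC scorer
print(np.mean(v1["test_score"]), np.mean(v2["test_score"]))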