cross-validation

How can we plot ROC Curve for leave one out (LOO) cross validation using scikit-learn?

Submitted by 落爺英雄遲暮 on 2019-12-13 06:33:30
Question: On the scikit-learn website there is example code for an ROC curve with stratified k-fold cross-validation, but none for leave-one-out (LOO) cross-validation. I tried adapting the k-fold code to LOO, but the result is NaN. What could be the problem? I checked the scikit-learn website, but there is no LOO example there. The stratified k-fold code is at the link below. https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html Source: https://stackoverflow.com
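A likely explanation (hedged, since the asker's exact code is not shown): with LOO each test fold contains a single sample, so a per-fold ROC/AUC is undefined and comes out as NaN. A minimal sketch that instead pools the held-out probabilities into one ROC curve, using a stand-in synthetic dataset and classifier:

```python
# A minimal sketch (not the scikit-learn example itself): with LOO each test fold
# has one sample, so a per-fold ROC is undefined; pool the out-of-fold
# probabilities and compute a single ROC instead.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=5000)

# One predicted probability per sample, each produced while that sample was held out.
probas = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")

fpr, tpr, _ = roc_curve(y, probas[:, 1])
plt.plot(fpr, tpr, label="LOO ROC (AUC = %.3f)" % auc(fpr, tpr))
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```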

How to give GridSearchCV a list of indices for cross-validation?

Submitted by 蹲街弑〆低调 on 2019-12-13 02:49:22
Question: I'm trying to use custom cross-validation sets for a very specific dataset with scikit-optimize's BayesSearchCV. I've been able to replicate the error with scikit-learn's GridSearchCV. Straight from the documentation: cv : int, cross-validation generator or an iterable, optional. Determines the cross-validation splitting strategy. Possible inputs for cv are: None, to use the default 3-fold cross-validation; an integer, to specify the number of folds in a (Stratified)KFold; an object to …
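The "iterable" option in that docstring is the relevant one: cv also accepts an explicit list of (train_indices, test_indices) pairs. A minimal sketch, with hypothetical folds built from a random permutation of a stand-in dataset:

```python
# A minimal sketch: pass explicit (train, test) index pairs as the cv argument.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical custom folds: each entry is (train_indices, test_indices).
rng = np.random.RandomState(0)
idx = rng.permutation(len(y))
custom_cv = [(idx[:100], idx[100:]), (idx[50:], idx[:50])]

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=custom_cv)
search.fit(X, y)
print(search.best_params_)
```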

sklearn random forest: .oob_score_ too low?

Submitted by 百般思念 on 2019-12-12 11:29:35
Question: I was searching for applications of random forests and found the following knowledge competition on Kaggle: https://www.kaggle.com/c/forest-cover-type-prediction. Following the advice at https://www.kaggle.com/c/forest-cover-type-prediction/forums/t/8182/first-try-with-random-forests-scikit-learn, I used sklearn to build a random forest with 500 trees. The .oob_score_ was ~2%, but the score on the holdout set was ~75%. There are only seven classes to classify, so 2% is really low. I also …
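For reference (not the asker's code), a minimal sketch of how .oob_score_ is obtained and compared against a holdout score on a stand-in dataset; on well-shuffled data the two estimates are usually in the same ballpark, so a gap this large tends to point at the data or the setup rather than at the OOB metric itself:

```python
# A minimal sketch: out-of-bag accuracy vs. holdout accuracy on a stand-in dataset.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

print("OOB accuracy:    ", rf.oob_score_)   # estimated from the bootstrap leftovers
print("Holdout accuracy:", rf.score(X_test, y_test))
```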

understanding python xgboost cv

Submitted by 旧巷老猫 on 2019-12-12 07:12:37
Question: I would like to use the xgboost cv function to find the best parameters for my training data set. I am confused by the API. How do I find the best parameter? Is this similar to the sklearn grid_search cross-validation function? How can I find which of the options for the max_depth parameter ([2, 4, 6]) was determined to be optimal? from sklearn.datasets import load_iris import xgboost as xgb iris = load_iris() DTrain = xgb.DMatrix(iris.data, iris.target) x_parameters = {"max_depth":[2,4,6]} xgb.cv(x …
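A point worth noting (hedged, since the excerpt is cut off): xgb.cv does not search over lists of parameter values the way GridSearchCV does; it evaluates one fixed parameter dict with cross-validation. One way to pick max_depth is therefore to loop over the candidates and compare the CV results, as in this sketch (the metric column name depends on the chosen objective):

```python
# A minimal sketch: run xgb.cv once per candidate max_depth and compare results.
import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)

results = {}
for depth in [2, 4, 6]:
    params = {"max_depth": depth, "objective": "multi:softprob", "num_class": 3}
    cv = xgb.cv(params, dtrain, num_boost_round=50, nfold=5, seed=0)
    # cv is a DataFrame; the last row holds the metric after the final boosting round.
    results[depth] = cv["test-mlogloss-mean"].iloc[-1]

best_depth = min(results, key=results.get)
print(results, "-> best max_depth:", best_depth)
```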

Code with 10-fold cross-validation in machine learning

Submitted by 元气小坏坏 on 2019-12-12 03:55:23
Question: I am just starting to work with machine learning. I tried to run 10-fold cross-validation with a C5.0 model and asked the code to return the kappa value. folds = createFolds(mdd.cohort1$edmsemmancomprej, k=10) str(folds) mdd.cohort1_train = mdd.cohort1[-folds$Fold01,] mdd.cohort1_test = mdd.cohort1[folds$Fold01,] library(caret) library(C5.0) library(irr) set.seed(123) folds = createFolds(mdd.cohort1$edmsemmancomprej, k=10) cv_results = lapply(folds, function(x) {mdd.cohort1_train = mdd …
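The original question uses R with caret, but the underlying pattern is the same in any library: split into 10 folds, score each fold with Cohen's kappa, and average. A minimal sketch in scikit-learn (kept in Python like the other examples on this page; a DecisionTreeClassifier stands in for C5.0):

```python
# A minimal sketch of 10-fold CV reporting Cohen's kappa per fold and on average.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
kappa_scorer = make_scorer(cohen_kappa_score)

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=10, scoring=kappa_scorer)
print("kappa per fold:", scores.round(3))
print("mean kappa:    ", scores.mean().round(3))
```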

Leave-one-out cross validation for IDW in R

Submitted by 核能气质少年 on 2019-12-12 03:55:14
Question: I am trying to check the results of IDW interpolation by leave-one-out cross-validation and then compute the RMSE to assess the quality of the prediction. From a GitHub page on interpolation in R, I found some hints and applied them to my case as follows: I have 63 locations saved as a SpatialPointsDataFrame named x_full_utm_2001. For each location there is attached precipitation data, named sumdata_2001. idw.out<- vector(length = length(sumdata_2001$Jan)) for (i in 1:length(sumdata_2001$Jan)) { idw …
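The excerpt is cut off, but the underlying pattern is: for each location, fit on the remaining points, predict the held-out one, and take the RMSE of the pooled errors. A rough sketch of that pattern in scikit-learn, with synthetic coordinates and values; KNeighborsRegressor with weights="distance" is only a stand-in for IDW here, not gstat's idw():

```python
# A rough sketch of LOO RMSE for a spatial interpolator; inverse-distance
# weighting over nearest neighbours stands in for gstat's idw().
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
coords = rng.uniform(0, 100, size=(63, 2))          # hypothetical x/y locations
precip = coords[:, 0] * 0.5 + rng.normal(0, 5, 63)  # hypothetical precipitation values

idw_like = KNeighborsRegressor(n_neighbors=10, weights="distance")
pred = cross_val_predict(idw_like, coords, precip, cv=LeaveOneOut())

rmse = np.sqrt(mean_squared_error(precip, pred))
print("LOO RMSE:", round(rmse, 3))
```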

Issue with Cross Validation

Submitted by 纵然是瞬间 on 2019-12-12 03:44:02
Question: I want to use leave-one-out cross-validation, but I am getting the error below: AttributeError Traceback (most recent call last) <ipython-input-19-f15f1e522706> in <module>() 3 loo = LeaveOneOut(num_of_examples) 4 #loo.get_n_splits(X_train_std) ----> 5 for train, test in loo.split(X_train_std): 6 print("%s %s" % (train, test)) AttributeError: 'LeaveOneOut' object has no attribute 'split' The detailed code is as follows: from sklearn.cross_validation import train_test_split X_train, X_test, y …
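The likely cause (hedged, since the full code is truncated): the LeaveOneOut being used comes from the old sklearn.cross_validation module, whose splitters are iterated directly and have no split() method. The split() API belongs to sklearn.model_selection, where LeaveOneOut also no longer takes the number of samples. A minimal sketch with the newer API and a hypothetical array in place of the asker's data:

```python
# A minimal sketch using the sklearn.model_selection API, where split() exists.
import numpy as np
from sklearn.model_selection import LeaveOneOut

X_train_std = np.arange(8).reshape(4, 2)  # hypothetical standardized training data

loo = LeaveOneOut()                       # no sample-count argument in this API
for train, test in loo.split(X_train_std):
    print("%s %s" % (train, test))
```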

Sklearn confusion matrix estimation by cross validation

Submitted by 旧时模样 on 2019-12-12 03:32:16
Question: I am trying to estimate the confusion matrix of a classifier using 10-fold cross-validation with sklearn. To compute the confusion matrix I am using sklearn.metrics.confusion_matrix. I know that I can evaluate a model with CV using sklearn.model_selection.cross_val_score and sklearn.metrics.make_scorer like: from sklearn.metrics import confusion_matrix, make_scorer from sklearn.model_selection import cross_val_score cm = cross_val_score(clf, X, y, make_scorer(confusion_matrix)) where clf is …
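One thing to note (hedged, as the excerpt stops mid-sentence): cross_val_score expects each scorer call to return a single number, so a scorer built from confusion_matrix will not work. A common alternative is to collect the out-of-fold predictions with cross_val_predict and build one confusion matrix from them, as in this sketch with stand-in data and classifier:

```python
# A minimal sketch: pool out-of-fold predictions, then compute one confusion matrix.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

y_pred = cross_val_predict(clf, X, y, cv=10)  # each prediction made on a held-out fold
print(confusion_matrix(y, y_pred))
```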

how to do cross-validation for block kriging?

Submitted by 半城伤御伤魂 on 2019-12-12 02:02:52
Question: I have written code with the automap package to cross-validate different kriging techniques. I have cross-validated all of them, but I cannot get the code for block kriging to work. It shows this error: unused argument (block=c(400,400)) library(automap) mydata<-read.table(".../mydata.txt",header=T,sep=",") colnames(mydata)=c("x","y","data1") library(gstat) coordinates(mydata)=~x+y mygrids<-read.table(".../grids.txt",header=T,sep=",") gridded(mygrids)=~x+y block_kriging_cv<-autoKrige.cv(log(data1)~x+y, …

Get predictions on test sets in MLR

Submitted by 天大地大妈咪最大 on 2019-12-12 01:24:40
Question: I'm fitting classification models for binary problems using the MLR package in R. For each model, I perform cross-validation with embedded feature selection using the "selectFeatures" function and retrieve the mean AUC over the test sets. Next, I would like to retrieve the predictions on the test sets for each fold, but this function does not seem to support that. I have already tried plugging the selected predictors into the "resample" function to get them. It works, but the performance metrics are different, which is not …
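The MLR-specific part of the question is truncated, but the general pattern of keeping per-fold test-set predictions is straightforward to illustrate. A minimal scikit-learn sketch (kept in Python like the other examples on this page) that stores the held-out predictions and the per-fold AUC for a stand-in dataset and model:

```python
# A minimal sketch of keeping per-fold test-set predictions and per-fold AUC.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
fold_predictions = []  # list of (test_indices, predicted_probabilities) per fold

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=5000).fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]
    fold_predictions.append((test_idx, proba))
    print("fold AUC:", round(roc_auc_score(y[test_idx], proba), 3))
```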