cross-validation

How can we plot ROC Curve for leave one out (LOO) cross validation using scikit-learn?

Submitted by 落爺英雄遲暮 on 2019-12-13 06:33:30
Question: On the scikit-learn website there is example code for an ROC curve with stratified k-fold cross-validation, but none for leave-one-out (LOO) cross-validation. I tried adapting the k-fold code to LOO, but the result is NaN. What could be the problem? I checked the scikit-learn website, but there is no LOO example there. The stratified k-fold code is at the link below. https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html Source: https://stackoverflow.com
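A likely explanation (hedged, since the asker's exact code is not shown): with LOO each test fold contains a single sample, so a per-fold ROC/AUC is undefined and comes out as NaN. A minimal sketch that instead pools the held-out probabilities into one ROC curve, using a stand-in synthetic dataset and classifier:

```python
# A minimal sketch (not the scikit-learn example itself): with LOO each test fold
# has one sample, so a per-fold ROC is undefined; pool the out-of-fold
# probabilities and compute a single ROC instead.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=5000)

# One predicted probability per sample, each produced while that sample was held out.
probas = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")

fpr, tpr, _ = roc_curve(y, probas[:, 1])
plt.plot(fpr, tpr, label="LOO ROC (AUC = %.3f)" % auc(fpr, tpr))
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```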

How to give GridSearchCV a list of indices for cross-validation?

Submitted by 蹲街弑〆低调 on 2019-12-13 02:49:22
Question: I'm trying to use custom cross-validation sets for a very specific dataset with scikit-optimize's BayesSearchCV. I've been able to replicate the error with scikit-learn's GridSearchCV. Straight from the documentation: cv : int, cross-validation generator or an iterable, optional. Determines the cross-validation splitting strategy. Possible inputs for cv are: None, to use the default 3-fold cross-validation; an integer, to specify the number of folds in a (Stratified)KFold; an object to …
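The "iterable" option in that docstring is the relevant one: cv also accepts an explicit list of (train_indices, test_indices) pairs. A minimal sketch, with hypothetical folds built from a random permutation of a stand-in dataset:

```python
# A minimal sketch: pass explicit (train, test) index pairs as the cv argument.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical custom folds: each entry is (train_indices, test_indices).
rng = np.random.RandomState(0)
idx = rng.permutation(len(y))
custom_cv = [(idx[:100], idx[100:]), (idx[50:], idx[:50])]

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=custom_cv)
search.fit(X, y)
print(search.best_params_)
```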

sklearn random forest: .oob_score_ too low?

Submitted by 百般思念 on 2019-12-12 11:29:35
Question: I was searching for applications of random forests and found the following knowledge competition on Kaggle: https://www.kaggle.com/c/forest-cover-type-prediction. Following the advice at https://www.kaggle.com/c/forest-cover-type-prediction/forums/t/8182/first-try-with-random-forests-scikit-learn, I used sklearn to build a random forest with 500 trees. The .oob_score_ was ~2%, but the score on the holdout set was ~75%. There are only seven classes to classify, so 2% is really low. I also …
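For reference (not the asker's code), a minimal sketch of how .oob_score_ is obtained and compared against a holdout score on a stand-in dataset; on well-shuffled data the two estimates are usually in the same ballpark, so a gap this large tends to point at the data or the setup rather than at the OOB metric itself:

```python
# A minimal sketch: out-of-bag accuracy vs. holdout accuracy on a stand-in dataset.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

print("OOB accuracy:    ", rf.oob_score_)   # estimated from the bootstrap leftovers
print("Holdout accuracy:", rf.score(X_test, y_test))
```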

understanding python xgboost cv

Submitted by 旧巷老猫 on 2019-12-12 07:12:37
Question: I would like to use the xgboost cv function to find the best parameters for my training data set. I am confused by the API. How do I find the best parameter? Is this similar to the sklearn grid_search cross-validation function? How can I find which of the options for the max_depth parameter ([2, 4, 6]) was determined to be optimal? from sklearn.datasets import load_iris import xgboost as xgb iris = load_iris() DTrain = xgb.DMatrix(iris.data, iris.target) x_parameters = {"max_depth":[2,4,6]} xgb.cv(x …
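A point worth noting (hedged, since the excerpt is cut off): xgb.cv does not search over lists of parameter values the way GridSearchCV does; it evaluates one fixed parameter dict with cross-validation. One way to pick max_depth is therefore to loop over the candidates and compare the CV results, as in this sketch (the metric column name depends on the chosen objective):

```python
# A minimal sketch: run xgb.cv once per candidate max_depth and compare results.
import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)

results = {}
for depth in [2, 4, 6]:
    params = {"max_depth": depth, "objective": "multi:softprob", "num_class": 3}
    cv = xgb.cv(params, dtrain, num_boost_round=50, nfold=5, seed=0)
    # cv is a DataFrame; the last row holds the metric after the final boosting round.
    results[depth] = cv["test-mlogloss-mean"].iloc[-1]

best_depth = min(results, key=results.get)
print(results, "-> best max_depth:", best_depth)
```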

Code with 10-fold cross-validation in machine learning

Submitted by 元气小坏坏 on 2019-12-12 03:55:23
Question: I am just starting to work with machine learning. I tried to run 10-fold cross-validation with a C5.0 model and asked the code to return the kappa value. folds = createFolds(mdd.cohort1$edmsemmancomprej, k=10) str(folds) mdd.cohort1_train = mdd.cohort1[-folds$Fold01,] mdd.cohort1_test = mdd.cohort1[folds$Fold01,] library(caret) library(C5.0) library(irr) set.seed(123) folds = createFolds(mdd.cohort1$edmsemmancomprej, k=10) cv_results = lapply(folds, function(x) {mdd.cohort1_train = mdd …
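The original question uses R with caret, but the underlying pattern is the same in any library: split into 10 folds, score each fold with Cohen's kappa, and average. A minimal sketch in scikit-learn (kept in Python like the other examples on this page; a DecisionTreeClassifier stands in for C5.0):

```python
# A minimal sketch of 10-fold CV reporting Cohen's kappa per fold and on average.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
kappa_scorer = make_scorer(cohen_kappa_score)

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=10, scoring=kappa_scorer)
print("kappa per fold:", scores.round(3))
print("mean kappa:    ", scores.mean().round(3))
```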

Leave-one-out cross validation for IDW in R

Submitted by 核能气质少年 on 2019-12-12 03:55:14
Question: I am trying to check the results of IDW interpolation by leave-one-out cross-validation and then compute the RMSE to assess the quality of the prediction. From a GitHub page on interpolation in R, I found some hints and applied them to my case as follows: I have 63 locations saved as a SpatialPointsDataFrame named x_full_utm_2001. For each location there is attached precipitation data, named sumdata_2001. idw.out<- vector(length = length(sumdata_2001$Jan)) for (i in 1:length(sumdata_2001$Jan)) { idw …
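The excerpt is cut off, but the underlying pattern is: for each location, fit on the remaining points, predict the held-out one, and take the RMSE of the pooled errors. A rough sketch of that pattern in scikit-learn, with synthetic coordinates and values; KNeighborsRegressor with weights="distance" is only a stand-in for IDW here, not gstat's idw():

```python
# A rough sketch of LOO RMSE for a spatial interpolator; inverse-distance
# weighting over nearest neighbours stands in for gstat's idw().
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
coords = rng.uniform(0, 100, size=(63, 2))          # hypothetical x/y locations
precip = coords[:, 0] * 0.5 + rng.normal(0, 5, 63)  # hypothetical precipitation values

idw_like = KNeighborsRegressor(n_neighbors=10, weights="distance")
pred = cross_val_predict(idw_like, coords, precip, cv=LeaveOneOut())

rmse = np.sqrt(mean_squared_error(precip, pred))
print("LOO RMSE:", round(rmse, 3))
```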

Issue with Cross Validation

Submitted by 纵然是瞬间 on 2019-12-12 03:44:02
Question: I want to use leave-one-out cross-validation, but I am getting the error below: AttributeError Traceback (most recent call last) <ipython-input-19-f15f1e522706> in <module>() 3 loo = LeaveOneOut(num_of_examples) 4 #loo.get_n_splits(X_train_std) ----> 5 for train, test in loo.split(X_train_std): 6 print("%s %s" % (train, test)) AttributeError: 'LeaveOneOut' object has no attribute 'split' The detailed code is as follows: from sklearn.cross_validation import train_test_split X_train, X_test, y …
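The likely cause (hedged, since the full code is truncated): the LeaveOneOut being used comes from the old sklearn.cross_validation module, whose splitters are iterated directly and have no split() method. The split() API belongs to sklearn.model_selection, where LeaveOneOut also no longer takes the number of samples. A minimal sketch with the newer API and a hypothetical array in place of the asker's data:

```python
# A minimal sketch using the sklearn.model_selection API, where split() exists.
import numpy as np
from sklearn.model_selection import LeaveOneOut

X_train_std = np.arange(8).reshape(4, 2)  # hypothetical standardized training data

loo = LeaveOneOut()                       # no sample-count argument in this API
for train, test in loo.split(X_train_std):
    print("%s %s" % (train, test))
```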

Sklearn confusion matrix estimation by cross validation

Submitted by 旧时模样 on 2019-12-12 03:32:16
Question: I am trying to estimate the confusion matrix of a classifier using 10-fold cross-validation with sklearn. To compute the confusion matrix I am using sklearn.metrics.confusion_matrix. I know that I can evaluate a model with CV using sklearn.model_selection.cross_val_score and sklearn.metrics.make_scorer like: from sklearn.metrics import confusion_matrix, make_scorer from sklearn.model_selection import cross_val_score cm = cross_val_score(clf, X, y, make_scorer(confusion_matrix)) where clf is …
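One thing to note (hedged, as the excerpt stops mid-sentence): cross_val_score expects each scorer call to return a single number, so a scorer built from confusion_matrix will not work. A common alternative is to collect the out-of-fold predictions with cross_val_predict and build one confusion matrix from them, as in this sketch with stand-in data and classifier:

```python
# A minimal sketch: pool out-of-fold predictions, then compute one confusion matrix.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

y_pred = cross_val_predict(clf, X, y, cv=10)  # each prediction made on a held-out fold
print(confusion_matrix(y, y_pred))
```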

how to do cross-validation for block kriging?

Submitted by 半城伤御伤魂 on 2019-12-12 02:02:52
Question: I have written code with the automap package to cross-validate different kriging techniques. I have cross-validated all of them, but I cannot get the code for block kriging to work. It shows this error: unused argument (block=c(400,400)) library(automap) mydata<-read.table(".../mydata.txt",header=T,sep=",") colnames(mydata)=c("x","y","data1") library(gstat) coordinates(mydata)=~x+y mygrids<-read.table(".../grids.txt",header=T,sep=",") gridded(mygrids)=~x+y block_kriging_cv<-autoKrige.cv(log(data1)~x+y, …

Get predictions on test sets in MLR

Submitted by 天大地大妈咪最大 on 2019-12-12 01:24:40
Question: I'm fitting classification models for binary problems using the MLR package in R. For each model, I perform cross-validation with embedded feature selection using the "selectFeatures" function and retrieve the mean AUC over the test sets. Next, I would like to retrieve the predictions on the test sets for each fold, but this function does not seem to support that. I have already tried plugging the selected predictors into the "resample" function to get them. It works, but the performance metrics are different, which is not …
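The MLR-specific part of the question is truncated, but the general pattern of keeping per-fold test-set predictions is straightforward to illustrate. A minimal scikit-learn sketch (kept in Python like the other examples on this page) that stores the held-out predictions and the per-fold AUC for a stand-in dataset and model:

```python
# A minimal sketch of keeping per-fold test-set predictions and per-fold AUC.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
fold_predictions = []  # list of (test_indices, predicted_probabilities) per fold

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=5000).fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]
    fold_predictions.append((test_idx, proba))
    print("fold AUC:", round(roc_auc_score(y[test_idx], proba), 3))
```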