cross-validation

How to split an image datastore for cross-validation in MATLAB?

佐手、 submitted on 2019-12-22 13:52:47
Question: In MATLAB, the splitEachLabel method of an imageDatastore object splits an image datastore into proportions per category label. How can one split an image datastore for cross-validated training with the trainImageCategoryClassifier function? I.e., it is easy to split it into N partitions, but some sort of _mergeEachLabel_ functionality is then needed to train a classifier using cross-validation. Or is there another way of achieving that? Regards, Elena Answer 1: I stumbled on the
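
The need behind the missing mergeEachLabel is just to pool the N-1 remaining partitions per fold, which can be done by building fold indices over the file list yourself. The question is about MATLAB; purely to illustrate the idea, here is a sketch in Python with scikit-learn's StratifiedKFold (file names invented), where each fold's training indices already represent the merged partitions:

from sklearn.model_selection import StratifiedKFold

# Illustrative data: image file paths and their category labels.
paths = ["cat/1.jpg", "cat/2.jpg", "cat/3.jpg",
         "dog/1.jpg", "dog/2.jpg", "dog/3.jpg"]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(paths, labels)):
    # No explicit merge step is needed: train_idx already pools the
    # remaining N-1 partitions, stratified per label.
    train_files = [paths[i] for i in train_idx]
    test_files = [paths[i] for i in test_idx]
    print(f"fold {fold}: train={train_files} test={test_files}")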

Rolling window REVISITED - Adding window rolling quantity as a parameter - Walk Forward Analysis

六眼飞鱼酱① submitted on 2019-12-22 05:36:07
Question: I have been searching the web for methods to create rolling windows so that I can perform a cross-validation technique known as Walk Forward Analysis on time series in a generalized manner. However, I have not come across any solution that is flexible in terms of 1) the window size (almost all methods have this; for example, pandas rolling or the somewhat different np.roll) and 2) the window rolling quantity, understood as how many indexes we want to roll the window by (i.e.
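
A minimal sketch of the kind of generalized window generator the question asks for, parameterizing both the window size and the rolling step (function and parameter names are illustrative, not from the thread):

import numpy as np

def walk_forward_windows(n_samples, train_size, test_size, step):
    """Yield (train_idx, test_idx) pairs, sliding both windows
    forward by `step` indexes each iteration."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size,
                             start + train_size + test_size)
        yield train_idx, test_idx
        start += step  # the "window rolling quantity"

# Example: 20 samples, train on 8, test on 4, roll by 4 each time.
for train_idx, test_idx in walk_forward_windows(20, 8, 4, 4):
    print(train_idx[[0, -1]], test_idx[[0, -1]])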

Randomized stratified k-fold cross-validation in scikit-learn?

ぐ巨炮叔叔 submitted on 2019-12-22 05:35:12
Question: Is there any built-in way to get scikit-learn to perform shuffled stratified k-fold cross-validation? This is one of the most common CV methods, and I am surprised I couldn't find a built-in method to do this. I saw that cross_validation.KFold() has a shuffling flag, but it is not stratified. Unfortunately cross_validation.StratifiedKFold() does not have such an option, and cross_validation.StratifiedShuffleSplit() does not produce disjoint folds. Am I missing something? Is this planned?
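
For reference (not part of the original excerpt): in current scikit-learn the old cross_validation module has become model_selection, and StratifiedKFold now accepts a shuffle flag, which gives exactly shuffled, stratified, disjoint folds:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# shuffle=True randomizes sample order within each class before the
# stratified split; the folds remain disjoint.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    print("test fold:", test_idx, "labels:", y[test_idx])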

Classification table for logistic regression in R

最后都变了- submitted on 2019-12-22 05:27:13
Question: I have a data set consisting of a dichotomous dependent variable ( Y ) and 12 independent variables ( X1 to X12 ) stored in a csv file. Here are the first 5 rows of the data:

Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12
0,9,3.86,111,126,14,13,1,7,7,0,M,46-50
1,7074,3.88,232,4654,143,349,2,27,18,6,M,25-30
1,5120,27.45,97,2924,298,324,3,56,21,0,M,31-35
1,18656,79.32,408,1648,303,8730,286,294,62,28,M,25-30
0,3869,21.23,260,2164,550,320,3,42,203,3,F,18-24

I constructed a logistic regression model
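
The question is about R, but the "classification table" it asks for is simply a cross-tabulation of observed versus predicted classes at a 0.5 cutoff. A sketch of the same idea in Python, reusing the five rows shown above (fitting and tabulating on the same handful of rows purely to show the mechanics):

import io
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

csv = """Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12
0,9,3.86,111,126,14,13,1,7,7,0,M,46-50
1,7074,3.88,232,4654,143,349,2,27,18,6,M,25-30
1,5120,27.45,97,2924,298,324,3,56,21,0,M,31-35
1,18656,79.32,408,1648,303,8730,286,294,62,28,M,25-30
0,3869,21.23,260,2164,550,320,3,42,203,3,F,18-24"""
df = pd.read_csv(io.StringIO(csv))

X = pd.get_dummies(df.drop(columns="Y"))  # one-hot encode X11 (sex) and X12 (age band)
y = df["Y"]

model = LogisticRegression(max_iter=1000).fit(X, y)
pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)

# Classification table: rows = observed class, columns = predicted class.
print(confusion_matrix(y, pred))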

Python: LightGBM cross validation. How to use lightgbm.cv for regression?

↘锁芯ラ submitted on 2019-12-22 03:22:45
Question: I want to do cross-validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The following approach works without a problem with XGBoost's xgboost.cv. I prefer not to use scikit-learn's approach with GridSearchCV, because it doesn't support early stopping or lgb.Dataset.

import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

dftrainLGB = lgb.Dataset(data = dftrain, label = ytrain, feature_name = list(dftrain))

params = {'objective': 'regression'}

cv
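
A sketch of how the call might be completed, under two assumptions: recent LightGBM, where early stopping is passed as a callback rather than an early_stopping_rounds argument, and synthetic data standing in for dftrain/ytrain. Note that for regression the stratified fold assignment must be switched off, since stratification only makes sense for class labels:

import lightgbm as lgb
import numpy as np

# Toy regression data standing in for dftrain / ytrain.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

train_set = lgb.Dataset(X, label=y)
params = {"objective": "regression", "metric": "l1", "verbosity": -1}

cv_results = lgb.cv(
    params,
    train_set,
    num_boost_round=500,
    nfold=5,
    stratified=False,  # required for regression targets
    callbacks=[lgb.early_stopping(stopping_rounds=20)],
    seed=42,
)

# cv_results maps metric names to per-round mean/stdv lists; its length
# is the boosting round count chosen by early stopping.
best_rounds = len(next(iter(cv_results.values())))
print("best number of rounds:", best_rounds)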

How to obtain the trained best model from a CrossValidator

孤街浪徒 submitted on 2019-12-21 16:55:16
Question: I built a pipeline including a DecisionTreeClassifier (dt) like this:

val pipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, dt, labelConverter))

Then I used this pipeline as the estimator in a CrossValidator, in order to get a model with the best set of hyperparameters, like this:

val c_v = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new MulticlassClassificationEvaluator()
    .setLabelCol("indexedLabel")
    .setPredictionCol("prediction"))
  .setEstimatorParamMaps(paramGrid
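
After fitting, the returned CrossValidatorModel exposes the winning, refit pipeline through its bestModel field. The thread is in Scala; a PySpark sketch of the same Spark ML pattern on toy data (stage layout simplified relative to the question):

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0)] * 20, ["f1", "f2", "label"])

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[assembler, dt])

cv = CrossValidator(
    estimator=pipeline,
    evaluator=MulticlassClassificationEvaluator(labelCol="label"),
    estimatorParamMaps=ParamGridBuilder()
        .addGrid(dt.maxDepth, [2, 4]).build(),
    numFolds=3)
cv_model = cv.fit(df)

# bestModel is the whole pipeline refit on the full training data with
# the winning parameter combination; dig into .stages for the tree itself.
best_tree = cv_model.bestModel.stages[-1]
print(best_tree)
spark.stop()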

How to customize a model in CARET to perform a PLS-[Classifier] two-step classification model?

你离开我真会死。 submitted on 2019-12-21 05:10:21
Question: This question is a continuation of the same thread here. Below is a minimal working example taken from this book: Wehrens R. Chemometrics with R: Multivariate Data Analysis in the Natural Sciences and Life Sciences. 1st edition. Heidelberg; New York: Springer; 2011 (page 250), and its companion package ChemometricsWithR. The example highlights some pitfalls of modeling with cross-validation techniques. The aim: a cross-validated methodology using the same set of
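
The caret-specific customization is in R; the transferable point is that the PLS projection must be re-estimated inside every cross-validation fold. A sketch of that two-step PLS-[Classifier] idea in Python, chaining PLS scores into an LDA classifier inside a scikit-learn Pipeline so no test-fold information leaks into the projection (data synthetic, component count arbitrary):

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# PLSRegression acts as a supervised dimension reducer here; placing it
# inside the Pipeline means its loadings are re-fit on each training
# fold only, avoiding the pitfall the book warns about.
model = Pipeline([
    ("pls", PLSRegression(n_components=5)),
    ("lda", LinearDiscriminantAnalysis()),
])
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(scores.mean())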

How to Plot a PR Curve Over 10 Folds of Cross-Validation in Scikit-Learn

╄→гoц情女王★ submitted on 2019-12-21 04:49:10
Question: I'm running some supervised experiments for a binary prediction problem. I'm using 10-fold cross-validation to evaluate performance in terms of mean average precision (the per-fold average precision values summed and divided by the number of folds - 10 in my case). I would like to plot PR curves of the result of mean average precision over these 10 folds; however, I'm not sure of the best way to do this. A previous question on the Cross Validated Stack Exchange site raised this same problem.
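
One common approach (a choice, not the only correct one) is to draw each fold's curve individually and overlay a single pooled curve computed from the concatenated out-of-fold scores; interpolating the per-fold curves onto a common recall grid and averaging is an alternative. A sketch on synthetic data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

all_y, all_scores, ap_per_fold = [], [], []
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    p, r, _ = precision_recall_curve(y[test_idx], scores)
    plt.plot(r, p, color="grey", alpha=0.4)  # one faint curve per fold
    ap_per_fold.append(average_precision_score(y[test_idx], scores))
    all_y.append(y[test_idx])
    all_scores.append(scores)

# Pooled curve over all out-of-fold predictions, as a single summary.
p, r, _ = precision_recall_curve(np.concatenate(all_y),
                                 np.concatenate(all_scores))
plt.plot(r, p, color="black",
         label=f"pooled (mean AP = {np.mean(ap_per_fold):.3f})")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.show()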

CARET: Relationship between data splitting and trainControl

旧街凉风 submitted on 2019-12-21 03:31:10
Question: I have carefully read the caret documentation at http://caret.r-forge.r-project.org/training.html and the vignettes, and everything is quite clear (the examples on the website help a lot!), but I am still confused about the relationship between two arguments to trainControl, method and index, and about the interplay between trainControl and the data splitting functions in caret (e.g. createDataPartition, createResample, createFolds and createMultiFolds). To better frame my questions, let me use the
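
In caret, index lets you pass trainControl pre-built resampling index lists (e.g. from createFolds or createMultiFolds) instead of having method generate the resamples internally. scikit-learn splits the same responsibility in its cv argument, which accepts either a fold count or explicit pre-built splits; an illustrative Python sketch of that division of labor:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Analogue of trainControl(method = "cv", number = 5):
# the splitter generates the resamples internally.
print(cross_val_score(clf, X, y, cv=5).mean())

# Analogue of trainControl(index = ...): build the train/test index
# lists yourself and pass them in, so every model sees identical folds.
folds = list(KFold(n_splits=5, shuffle=True, random_state=1).split(X, y))
print(cross_val_score(clf, X, y, cv=folds).mean())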