cross-validation

How to split an image datastore for cross-validation in MATLAB?

佐手、 submitted on 2019-12-22 13:52:47
Question: In MATLAB, the splitEachLabel method of an imageDatastore object splits an image datastore into proportions per category label. How can one split an image datastore for cross-validated training with the trainImageCategoryClassifier function? I.e., it is easy to split it into N partitions, but some sort of _mergeEachLabel_ functionality is then needed to train a classifier using cross-validation. Or is there another way of achieving that? Regards, Elena Answer 1: I stumbled on the
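
The need behind the missing mergeEachLabel is just to pool the N-1 remaining partitions per fold, which can be done by building fold indices over the file list yourself. The question is about MATLAB; purely to illustrate the idea, here is a sketch in Python with scikit-learn's StratifiedKFold (file names invented), where each fold's training indices already represent the merged partitions:

from sklearn.model_selection import StratifiedKFold

# Illustrative data: image file paths and their category labels.
paths = ["cat/1.jpg", "cat/2.jpg", "cat/3.jpg",
         "dog/1.jpg", "dog/2.jpg", "dog/3.jpg"]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(paths, labels)):
    # No explicit merge step is needed: train_idx already pools the
    # remaining N-1 partitions, stratified per label.
    train_files = [paths[i] for i in train_idx]
    test_files = [paths[i] for i in test_idx]
    print(f"fold {fold}: train={train_files} test={test_files}")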

Rolling window REVISITED - Adding window rolling quantity as a parameter - Walk Forward Analysis

六眼飞鱼酱① submitted on 2019-12-22 05:36:07
Question: I have been searching the web for methods to create rolling windows so that I can perform a cross-validation technique known as Walk Forward Analysis on time series in a generalized manner. However, I have not come across any solution that is flexible in terms of 1) the window size (almost all methods have this; for example, pandas rolling or the somewhat different np.roll) and 2) the window rolling quantity, understood as how many indexes we want to roll the window by (i.e.
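
A minimal sketch of the kind of generalized window generator the question asks for, parameterizing both the window size and the rolling step (function and parameter names are illustrative, not from the thread):

import numpy as np

def walk_forward_windows(n_samples, train_size, test_size, step):
    """Yield (train_idx, test_idx) pairs, sliding both windows
    forward by `step` indexes each iteration."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size,
                             start + train_size + test_size)
        yield train_idx, test_idx
        start += step  # the "window rolling quantity"

# Example: 20 samples, train on 8, test on 4, roll by 4 each time.
for train_idx, test_idx in walk_forward_windows(20, 8, 4, 4):
    print(train_idx[[0, -1]], test_idx[[0, -1]])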

Randomized stratified k-fold cross-validation in scikit-learn?

ぐ巨炮叔叔 submitted on 2019-12-22 05:35:12
Question: Is there any built-in way to get scikit-learn to perform shuffled stratified k-fold cross-validation? This is one of the most common CV methods, and I am surprised I couldn't find a built-in method to do this. I saw that cross_validation.KFold() has a shuffling flag, but it is not stratified. Unfortunately cross_validation.StratifiedKFold() does not have such an option, and cross_validation.StratifiedShuffleSplit() does not produce disjoint folds. Am I missing something? Is this planned?
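
For reference (not part of the original excerpt): in current scikit-learn the old cross_validation module has become model_selection, and StratifiedKFold now accepts a shuffle flag, which gives exactly shuffled, stratified, disjoint folds:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# shuffle=True randomizes sample order within each class before the
# stratified split; the folds remain disjoint.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    print("test fold:", test_idx, "labels:", y[test_idx])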

Classification table for logistic regression in R

最后都变了- submitted on 2019-12-22 05:27:13
Question: I have a data set consisting of a dichotomous dependent variable ( Y ) and 12 independent variables ( X1 to X12 ) stored in a csv file. Here are the first 5 rows of the data:

Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12
0,9,3.86,111,126,14,13,1,7,7,0,M,46-50
1,7074,3.88,232,4654,143,349,2,27,18,6,M,25-30
1,5120,27.45,97,2924,298,324,3,56,21,0,M,31-35
1,18656,79.32,408,1648,303,8730,286,294,62,28,M,25-30
0,3869,21.23,260,2164,550,320,3,42,203,3,F,18-24

I constructed a logistic regression model
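
The question is about R, but the "classification table" it asks for is simply a cross-tabulation of observed versus predicted classes at a 0.5 cutoff. A sketch of the same idea in Python, reusing the five rows shown above (fitting and tabulating on the same handful of rows purely to show the mechanics):

import io
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

csv = """Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12
0,9,3.86,111,126,14,13,1,7,7,0,M,46-50
1,7074,3.88,232,4654,143,349,2,27,18,6,M,25-30
1,5120,27.45,97,2924,298,324,3,56,21,0,M,31-35
1,18656,79.32,408,1648,303,8730,286,294,62,28,M,25-30
0,3869,21.23,260,2164,550,320,3,42,203,3,F,18-24"""
df = pd.read_csv(io.StringIO(csv))

X = pd.get_dummies(df.drop(columns="Y"))  # one-hot encode X11 (sex) and X12 (age band)
y = df["Y"]

model = LogisticRegression(max_iter=1000).fit(X, y)
pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)

# Classification table: rows = observed class, columns = predicted class.
print(confusion_matrix(y, pred))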

Python: LightGBM cross validation. How to use lightgbm.cv for regression?

↘锁芯ラ submitted on 2019-12-22 03:22:45
Question: I want to do cross-validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The following approach works without a problem with XGBoost's xgboost.cv. I prefer not to use scikit-learn's approach with GridSearchCV, because it doesn't support early stopping or lgb.Dataset.

import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

dftrainLGB = lgb.Dataset(data = dftrain, label = ytrain, feature_name = list(dftrain))

params = {'objective': 'regression'}

cv
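
A sketch of how the call might be completed, under two assumptions: recent LightGBM, where early stopping is passed as a callback rather than an early_stopping_rounds argument, and synthetic data standing in for dftrain/ytrain. Note that for regression the stratified fold assignment must be switched off, since stratification only makes sense for class labels:

import lightgbm as lgb
import numpy as np

# Toy regression data standing in for dftrain / ytrain.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

train_set = lgb.Dataset(X, label=y)
params = {"objective": "regression", "metric": "l1", "verbosity": -1}

cv_results = lgb.cv(
    params,
    train_set,
    num_boost_round=500,
    nfold=5,
    stratified=False,  # required for regression targets
    callbacks=[lgb.early_stopping(stopping_rounds=20)],
    seed=42,
)

# cv_results maps metric names to per-round mean/stdv lists; its length
# is the boosting round count chosen by early stopping.
best_rounds = len(next(iter(cv_results.values())))
print("best number of rounds:", best_rounds)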

How to obtain the trained best model from a CrossValidator

孤街浪徒 submitted on 2019-12-21 16:55:16
Question: I built a pipeline including a DecisionTreeClassifier (dt) like this:

val pipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, dt, labelConverter))

Then I used this pipeline as the estimator in a CrossValidator, in order to get a model with the best set of hyperparameters, like this:

val c_v = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new MulticlassClassificationEvaluator()
    .setLabelCol("indexedLabel")
    .setPredictionCol("prediction"))
  .setEstimatorParamMaps(paramGrid
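
After fitting, the returned CrossValidatorModel exposes the winning, refit pipeline through its bestModel field. The thread is in Scala; a PySpark sketch of the same Spark ML pattern on toy data (stage layout simplified relative to the question):

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0)] * 20, ["f1", "f2", "label"])

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[assembler, dt])

cv = CrossValidator(
    estimator=pipeline,
    evaluator=MulticlassClassificationEvaluator(labelCol="label"),
    estimatorParamMaps=ParamGridBuilder()
        .addGrid(dt.maxDepth, [2, 4]).build(),
    numFolds=3)
cv_model = cv.fit(df)

# bestModel is the whole pipeline refit on the full training data with
# the winning parameter combination; dig into .stages for the tree itself.
best_tree = cv_model.bestModel.stages[-1]
print(best_tree)
spark.stop()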

How to customize a model in CARET to perform a PLS-[Classifier] two-step classification model?

你离开我真会死。 submitted on 2019-12-21 05:10:21
Question: This question is a continuation of the same thread here. Below is a minimal working example taken from this book: Wehrens R. Chemometrics with R: Multivariate Data Analysis in the Natural Sciences and Life Sciences. 1st edition. Heidelberg; New York: Springer; 2011 (page 250), and its companion package ChemometricsWithR. The example highlights some pitfalls of modeling with cross-validation techniques. The aim: a cross-validated methodology using the same set of
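
The caret-specific customization is in R; the transferable point is that the PLS projection must be re-estimated inside every cross-validation fold. A sketch of that two-step PLS-[Classifier] idea in Python, chaining PLS scores into an LDA classifier inside a scikit-learn Pipeline so no test-fold information leaks into the projection (data synthetic, component count arbitrary):

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# PLSRegression acts as a supervised dimension reducer here; placing it
# inside the Pipeline means its loadings are re-fit on each training
# fold only, avoiding the pitfall the book warns about.
model = Pipeline([
    ("pls", PLSRegression(n_components=5)),
    ("lda", LinearDiscriminantAnalysis()),
])
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(scores.mean())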

How to Plot a PR Curve Over 10 Folds of Cross-Validation in Scikit-Learn

╄→гoц情女王★ submitted on 2019-12-21 04:49:10
Question: I'm running some supervised experiments for a binary prediction problem. I'm using 10-fold cross-validation to evaluate performance in terms of mean average precision (the per-fold average precision values summed and divided by the number of folds - 10 in my case). I would like to plot PR curves of the result of mean average precision over these 10 folds; however, I'm not sure of the best way to do this. A previous question on the Cross Validated Stack Exchange site raised this same problem.
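
One common approach (a choice, not the only correct one) is to draw each fold's curve individually and overlay a single pooled curve computed from the concatenated out-of-fold scores; interpolating the per-fold curves onto a common recall grid and averaging is an alternative. A sketch on synthetic data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

all_y, all_scores, ap_per_fold = [], [], []
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    p, r, _ = precision_recall_curve(y[test_idx], scores)
    plt.plot(r, p, color="grey", alpha=0.4)  # one faint curve per fold
    ap_per_fold.append(average_precision_score(y[test_idx], scores))
    all_y.append(y[test_idx])
    all_scores.append(scores)

# Pooled curve over all out-of-fold predictions, as a single summary.
p, r, _ = precision_recall_curve(np.concatenate(all_y),
                                 np.concatenate(all_scores))
plt.plot(r, p, color="black",
         label=f"pooled (mean AP = {np.mean(ap_per_fold):.3f})")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.show()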

CARET: Relationship between data splitting and trainControl

旧街凉风 submitted on 2019-12-21 03:31:10
Question: I have carefully read the caret documentation at http://caret.r-forge.r-project.org/training.html and the vignettes, and everything is quite clear (the examples on the website help a lot!), but I am still confused about the relationship between two arguments to trainControl, method and index, and about the interplay between trainControl and the data splitting functions in caret (e.g. createDataPartition, createResample, createFolds and createMultiFolds). To better frame my questions, let me use the
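
In caret, index lets you pass trainControl pre-built resampling index lists (e.g. from createFolds or createMultiFolds) instead of having method generate the resamples internally. scikit-learn splits the same responsibility in its cv argument, which accepts either a fold count or explicit pre-built splits; an illustrative Python sketch of that division of labor:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Analogue of trainControl(method = "cv", number = 5):
# the splitter generates the resamples internally.
print(cross_val_score(clf, X, y, cv=5).mean())

# Analogue of trainControl(index = ...): build the train/test index
# lists yourself and pass them in, so every model sees identical folds.
folds = list(KFold(n_splits=5, shuffle=True, random_state=1).split(X, y))
print(cross_val_score(clf, X, y, cv=folds).mean())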