cross-validation

Getting p-values from leave-one-out in R

Submitted by 本秂侑毒 on 2019-12-24 04:49:28
Question: I have a data frame of 96 observations (patients) and 1098 variables (genes). The response is binary (Y and N) and the predictors are numeric. I am trying to perform leave-one-out cross-validation, but my interest is not the standard error; rather, I want the p-values for each variable from each of the 96 logistic regression models created by LOOCV. These are my attempts thus far:

#Data frame 96 observations 1098 variables DF2
fit <- list()
for (i in 1:96){
  df <- DF2[-i,]
  fit[[i]] <- glm(response ~.,
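A rough analogue in Python (the question itself uses R's glm): with statsmodels one can fit a logistic regression on each leave-one-out training set and collect result.pvalues per fold. The data below are synthetic placeholders, not the poster's 96 x 1098 gene matrix; with far more variables than observations, as in the actual question, an unpenalized fit would not be estimable, so this only sketches the bookkeeping.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import LeaveOneOut

rng = np.random.RandomState(0)
X = pd.DataFrame(rng.rand(40, 3), columns=["gene1", "gene2", "gene3"])
y = pd.Series(rng.randint(0, 2, 40))

pvalues = []                                    # one row of p-values per left-out observation
for train_idx, _ in LeaveOneOut().split(X):
    model = sm.Logit(y.iloc[train_idx], sm.add_constant(X.iloc[train_idx]))
    pvalues.append(model.fit(disp=0).pvalues)   # p-value per coefficient for this fold

pvalue_table = pd.DataFrame(pvalues)            # rows = folds, columns = const + genes
print(pvalue_table.head())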

R caret: leave subject out cross validation with data subset for training?

Submitted by ☆樱花仙子☆ on 2019-12-24 01:44:10
Question: I want to perform leave-subject-out cross-validation with R caret (cf. this example) but only use a subset of the data in training for creating the CV models. Still, the left-out CV partition should be used as a whole, as I need to test on all data of a left-out subject (no matter if it's millions of samples that cannot be used in training due to computational restrictions). I've created a minimal 2-class classification example using the subset and index parameters of caret::train and caret:
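The question is about caret's index/indexOut mechanism; as a rough scikit-learn analogue of the same idea, one can subsample each training partition while still scoring on the entire left-out subject. The data sizes, budget, and model below are placeholders.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.RandomState(0)
X = rng.rand(600, 4)
y = rng.randint(0, 2, 600)
subjects = np.repeat(np.arange(6), 100)      # 6 subjects, 100 samples each

max_train = 150                               # computational budget per fold
scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    # subsample the training partition only; the left-out subject stays whole
    train_sub = rng.choice(train_idx, size=min(max_train, len(train_idx)),
                           replace=False)
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_sub], y[train_sub])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(np.mean(scores))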

Cross validation with KNN classifier in Matlab

Submitted by 大憨熊 on 2019-12-24 00:50:53
Question: I am trying to extend this answer to a knn classifier:

load fisheriris;
% // convert species to double
isnum = cellfun(@isnumeric,species);
result = NaN(size(species));
result(isnum) = [species{isnum}];
% // Crossvalidation
vals = crossval(@(XTRAIN, YTRAIN, XTEST, YTEST)fun_knn(XTRAIN, YTRAIN, XTEST, YTEST), meas, result);

The fun_knn function is:

function testval = fun_knn(XTRAIN, YTRAIN, XTEST, YTEST)
yknn = knnclassify(XTEST, XTRAIN, YTRAIN);
[~,classNet] = max(yknn,[],2);
[~,classTest] = max
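For comparison only (the question is about MATLAB's crossval/knnclassify): the same kind of cross-validated error for a k-nearest-neighbours classifier on the iris data is a short sketch in Python/scikit-learn.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)             # meas / species in the MATLAB code
knn = KNeighborsClassifier(n_neighbors=5)
accuracy = cross_val_score(knn, X, y, cv=10)  # 10-fold cross-validation
print(1 - accuracy.mean())                    # mean misclassification rate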

SKLearn cross-validation: How to pass info on fold examples to my scorer function?

Submitted by 点点圈 on 2019-12-24 00:47:46
Question: I am trying to craft a custom scorer function for cross-validating my (binary classification) model in scikit-learn (Python). Some examples of my raw test data:

Source  Feature1  Feature2  Feature3
123     0.1       0.2       0.3
123     0.4       0.5       0.6
456     0.7       0.8       0.9

Assuming that any fold might contain multiple test examples that come from the same source... Then, for the set of examples with the same source, I want my custom scorer to "decide" the "winner" to be the example for which the model spit out the higher
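One way this is often handled, sketched below under the assumption that the Source id can be carried on the DataFrame index (so it never becomes a model feature): a callable scorer receives the estimator plus the fold's X and y, groups the fold's rows by source, takes the highest-probability row per source as the "winner", and scores how often that winner is truly positive. GroupKFold keeps each source inside a single fold. The data, names, and scoring rule are all illustrative.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.RandomState(0)
# the DataFrame index carries the Source id for each row
X = pd.DataFrame(rng.rand(12, 3), columns=["f1", "f2", "f3"],
                 index=[101, 101, 102, 102, 103, 103, 104, 104, 105, 105, 106, 106])
y = pd.Series([0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1], index=X.index)

def winner_per_source_scorer(estimator, X_test, y_test):
    """Per source in this fold, the highest-scoring row is the 'winner';
    count a hit when that row is truly positive."""
    proba = estimator.predict_proba(X_test)[:, 1]
    sources = X_test.index.to_numpy()
    y_arr = np.asarray(y_test)
    hits = []
    for source in np.unique(sources):
        mask = sources == source
        winner = np.argmax(proba[mask])           # position of the winner within this source
        hits.append(y_arr[mask][winner] == 1)
    return float(np.mean(hits))

scores = cross_val_score(LogisticRegression(), X, y,
                         scoring=winner_per_source_scorer,
                         cv=GroupKFold(n_splits=3), groups=X.index)
print(scores)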

Perform data transformation on training data inside cross validation

Submitted by 寵の児 on 2019-12-23 23:42:11
Question: I would like to do cross-validation with 5 folds. In each fold, I have a training and a validation set. However, due to data issues, I need to transform my data: first I transform the training data and train the model, then I apply the same transformation rule to the validation data and test the model. I need to redo the transformation for every fold. How would I do that in H2O? I can't find a way to separate the transformation part out. Does anyone have any suggestions? Source: https://stackoverflow.com/questions
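The question asks about H2O specifically; as a generic illustration of the pattern (fit the transformation on each fold's training rows only, then apply it to that fold's validation rows), here is how the same thing looks with a scikit-learn pipeline. The data and estimator are placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = rng.randint(0, 2, 100)

# Putting the transformer in a pipeline guarantees it is re-fitted inside
# every fold instead of leaking statistics from the validation rows.
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores)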

How can I pass the Hyperopt params to KerasClassifier if I set conditional search space

Submitted by こ雲淡風輕ζ on 2019-12-23 04:38:08
Question: Thanks to the good answer to my last post (How to put KerasClassifier, Hyperopt and Sklearn cross-validation together), it was a great help. I have a further question: if I set a conditional search space like:

second_layer_search_space = \
    hp.choice('second_layer', [
        {
            'include': False,
        },
        {
            'include': True,
            'layer_size': hp.choice('layer_size', np.arange(5, 26, 5)),
        }
    ])

space = {
    'second_layer': second_layer_search_space,
    'units1': hp.choice('units1', [12, 64]),
    'dropout': hp.choice('dropout1',
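One common way to deal with this, shown as a sketch: let hyperopt sample the nested dictionary, then flatten it inside the objective before handing plain keyword arguments to whatever builds the KerasClassifier. The model-building and cross-validation step is replaced by a dummy loss here so the sketch runs without TensorFlow; build_and_score and the flattened key names are made up.

import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

second_layer_search_space = hp.choice('second_layer', [
    {'include': False},
    {'include': True,
     'layer_size': hp.choice('layer_size', list(np.arange(5, 26, 5)))},
])

space = {
    'second_layer': second_layer_search_space,
    'units1': hp.choice('units1', [12, 64]),
}

def objective(params):
    # hyperopt hands the objective the sampled nested dict; flatten it here
    # before passing plain keyword arguments to the model builder.
    second = params['second_layer']
    flat = {
        'units1': params['units1'],
        'include_second_layer': second['include'],
        'second_layer_size': int(second.get('layer_size', 0)),
    }
    # build_and_score(**flat) would construct the KerasClassifier and run the
    # sklearn cross-validation; a dummy loss stands in for it in this sketch.
    loss = 1.0 / (flat['units1'] + flat['second_layer_size'] + 1)
    return {'loss': loss, 'status': STATUS_OK}

best = fmin(objective, space, algo=tpe.suggest, max_evals=10, trials=Trials())
print(best)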

caret: using random forest and include cross-validation

Submitted by 为君一笑 on 2019-12-22 15:14:04
Question: I used the caret package to train a random forest, including repeated cross-validation. I'd like to know whether the OOB error, as in the original RF by Breiman, is used or whether it is replaced by the cross-validation. If it is replaced, do I have the same advantages as described in Breiman (2001), such as increased accuracy by reducing the correlation between input data? As OOB samples are drawn with replacement and CV folds are drawn without replacement, are the two procedures comparable? What is the OOB estimate of
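For what it's worth, the two estimates being contrasted can be put side by side; this scikit-learn sketch (not caret) just fits one forest with oob_score=True and compares its OOB accuracy against a repeated-cross-validation accuracy for the same settings.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# OOB estimate: each tree is scored on the bootstrap samples it did not see.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)

# Repeated cross-validation estimate for a forest with the same settings.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=cv)
print("Repeated-CV accuracy:", cv_scores.mean())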

Running CrossValidationCV in parallel

Submitted by 被刻印的时光 ゝ on 2019-12-22 14:51:22
Question: When I run the GridSearchCV() and RandomizedSearchCV() methods in parallel (with n_jobs>1 or n_jobs=-1 set), I get this message: ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information. I put the code in a class in a .py file and call it
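The fix the message asks for, sketched below: on platforms without fork, joblib's worker processes re-import the calling script, so the parallel search has to be launched from inside an if __name__ == '__main__': guard (or from a function called under that guard) rather than at module import time. The estimator and grid are placeholders.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def run_search():
    X, y = load_iris(return_X_y=True)
    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}
    search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
    search.fit(X, y)
    return search.best_params_

if __name__ == "__main__":
    # Without this guard the spawned worker processes re-run the module-level
    # code on import, which is what triggers the joblib ImportError.
    print(run_search())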