cross-validation

Cross Validation - Weka API

Submitted by 柔情痞子 on 2019-12-12 00:33:48
问题 (Question): How can I build a classification model with 10-fold cross-validation using the Weka API? I ask because each cross-validation run creates a new classification model. Which classification model should I use on my test data? Thank you! 回答1 (Answer 1): 10-fold cross-validation is used to get an estimate of the accuracy a classifier would have if it were constructed from all of the training data. It is used when it is felt that there is not enough data for an independent test set. This means that you…
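
The idea generalizes beyond Weka. Below is a minimal illustration in Python/scikit-learn (not the Weka Java API the question asks about), assuming a synthetic dataset and a decision tree: cross-validation only estimates how well a model built from all the training data will perform; the model you actually apply to new test data is the one trained on the full training set.

    # Python/scikit-learn sketch (not the Weka Java API): estimate accuracy with
    # 10-fold CV, then build the single deployable model from all training data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=1)

    clf = DecisionTreeClassifier(random_state=1)
    scores = cross_val_score(clf, X, y, cv=10)      # 10 throw-away models
    print("estimated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

    clf.fit(X, y)   # this single model is the one you apply to new test data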

Specifying a selected range of data to be used in leave-one-out (jack-knife) cross-validation for use in the caret::train function

Submitted by 时光毁灭记忆、已成空白 on 2019-12-11 15:33:10
问题 (Question): This question builds on one I asked here: Creating data partitions over a selected range of data to be fed into the caret::train function for cross-validation. The data I am working with looks like this: df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5), Time = rep(c(1:20,1:20), each = 5), Replicate = c(1:5)). Essentially, what I would like to do is create custom partitions like those generated by the caret::groupKFold function, but for these folds to be…
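
The question is about R's caret, but the underlying idea, folds defined by a grouping column rather than by random rows, can be sketched in Python with scikit-learn's LeaveOneGroupOut. The frame below is a simplified stand-in for the df above; the column names are illustrative.

    # Python analogue of caret::groupKFold-style folds: LeaveOneGroupOut holds
    # out one whole Time group per fold.
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "Time": np.repeat(np.arange(1, 21), 10),   # 20 time points, 10 rows each
        "Effect": rng.uniform(size=200),
    })

    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(df[["Effect"]], groups=df["Time"]):
        assert df["Time"].iloc[test_idx].nunique() == 1   # one group held out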

Preserve Order for Cross Validation in Weka

Submitted by 岁酱吖の on 2019-12-11 14:08:58
问题 (Question): I am using the Weka GUI to classify sensor data. I have measurements from 10 people, and the data is sorted, so the first 10% corresponds to participant 1, the second 10% to participant 2, etc. I would like to use 10-fold cross-validation to build a model on 9 participants and test it on the remaining participant. In my case I believe I could accomplish this simply by not randomizing the data splits. How would I best go about doing this? 回答1 (Answer 1): I don't know how to do this in the Explorer. In the…
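
For comparison, here is a small Python/scikit-learn sketch of the same idea (the question itself concerns the Weka GUI): when the rows are sorted by participant, un-shuffled 10-fold cross-validation makes each test fold one contiguous participant block.

    # Python/scikit-learn sketch: with rows sorted by participant, un-shuffled
    # 10-fold CV gives one contiguous participant block per test fold.
    import numpy as np
    from sklearn.model_selection import KFold

    participant = np.repeat(np.arange(10), 50)   # 10 people, 50 rows each, sorted

    kf = KFold(n_splits=10, shuffle=False)       # folds follow the stored order
    for train_idx, test_idx in kf.split(participant):
        assert np.unique(participant[test_idx]).size == 1   # exactly one person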

How to put KerasClassifier, Hyperopt and Sklearn cross-validation together

Submitted by 泪湿孤枕 on 2019-12-11 12:50:30
问题 (Question): I am performing hyperparameter tuning (hyperopt) tasks with sklearn on Keras models. I am trying to optimize KerasClassifiers using Sklearn cross-validation. Some code follows: def create_model(): model = Sequential() model.add(Dense(output_dim=params['units1'], input_dim=features_.shape[1], kernel_initializer="glorot_uniform")) model.add(Activation(params['activation'])) model.add(Dropout(params['dropout1'])) model.add(BatchNormalization()) ... model.compile(loss=…
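
A hedged sketch of one way to wire these pieces together, assuming a binary-classification dataset and the 2019-era keras.wrappers.scikit_learn wrapper implied by the excerpt; the names units1 and dropout1 mirror the question's params dict, everything else is illustrative.

    # hyperopt proposes parameters, KerasClassifier builds the model, and
    # cross_val_score provides the objective value to minimise.
    import numpy as np
    from hyperopt import Trials, fmin, hp, tpe
    from keras.layers import Dense, Dropout
    from keras.models import Sequential
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    def create_model(units1=32, dropout1=0.2):
        model = Sequential()
        model.add(Dense(units1, input_dim=X.shape[1], activation="relu"))
        model.add(Dropout(dropout1))
        model.add(Dense(1, activation="sigmoid"))
        model.compile(loss="binary_crossentropy", optimizer="adam",
                      metrics=["accuracy"])
        return model

    def objective(params):
        clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=32,
                              verbose=0, **params)
        return -cross_val_score(clf, X, y, cv=3).mean()   # hyperopt minimises

    space = {"units1": hp.choice("units1", [16, 32, 64]),
             "dropout1": hp.uniform("dropout1", 0.0, 0.5)}
    best = fmin(objective, space, algo=tpe.suggest, max_evals=10, trials=Trials())
    print(best)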

Scikit-learn PLS SVD and cross validation

Submitted by 与世无争的帅哥 on 2019-12-11 12:33:34
问题 (Question): The sklearn.cross_decomposition.PLSSVD class in scikit-learn appears to fail when the response variable has a shape of (N,) instead of (N,1), where N is the number of samples in the dataset. However, sklearn.cross_validation.cross_val_score fails when the response variable has a shape of (N,1) instead of (N,). How can I use them together? A snippet of code: from sklearn.pipeline import Pipeline from sklearn.cross_decomposition import PLSSVD from sklearn.preprocessing import…
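
One possible workaround (a sketch, not necessarily the canonical fix): wrap PLSSVD in a thin transformer that reshapes y to (N, 1) internally, so the surrounding pipeline and cross_val_score can keep passing y with shape (N,). The example uses the modern sklearn.model_selection import; the class and parameter names introduced here are illustrative.

    # A wrapper transformer: PLSSVD sees y as (N, 1), cross_val_score sees (N,).
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.cross_decomposition import PLSSVD
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    class PLSSVDReshaper(BaseEstimator, TransformerMixin):
        def __init__(self, n_components=1):
            self.n_components = n_components
        def fit(self, X, y=None):
            self.pls_ = PLSSVD(n_components=self.n_components)
            self.pls_.fit(X, np.asarray(y).reshape(-1, 1))   # force (N, 1)
            return self
        def transform(self, X):
            return self.pls_.transform(X)                    # X scores only

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    pipe = Pipeline([("pls", PLSSVDReshaper()), ("clf", LogisticRegression())])
    print(cross_val_score(pipe, X, y, cv=5))                 # y stays (N,)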

ROC curve plot: 0.50 significant and cross-validation

Submitted by 我与影子孤独终老i on 2019-12-11 11:24:30
问题 (Question): I have two problems using the pROC package to plot the ROC curve. A. The significance level or P-value is the probability of obtaining the observed sample area under the ROC curve when, in fact, the true (population) area under the ROC curve is 0.5 (null hypothesis: area = 0.5). If P is small (P < 0.05), then it can be concluded that the area under the ROC curve is significantly different from 0.5, and therefore there is evidence that the laboratory test does have an ability to…
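
The question concerns R's pROC, but the meaning of that p-value can be illustrated with a simple label-permutation test in Python: under the null hypothesis (area = 0.5) the labels carry no information about the scores, so shuffling them shows how extreme the observed AUC is by chance. The synthetic scores below are purely illustrative.

    # Permutation test for H0: AUC = 0.5 (a conceptual analogue, not pROC itself).
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    labels = np.r_[np.zeros(100), np.ones(100)]
    scores = rng.normal(size=200) + labels          # positives shifted upward

    observed = roc_auc_score(labels, scores)
    perm = np.array([roc_auc_score(rng.permutation(labels), scores)
                     for _ in range(2000)])
    # two-sided p-value: how often a random labelling is at least as extreme
    p = np.mean(np.abs(perm - 0.5) >= abs(observed - 0.5))
    print("AUC = %.3f, p = %.4f" % (observed, p))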

Grouping rows from an R dataframe together when randomly assigning to training/testing datasets

Submitted by 只谈情不闲聊 on 2019-12-11 10:34:07
问题 (Question): I have a dataframe that consists of blocks of X rows, each corresponding to a single individual (where X can differ between individuals). I'd like to randomly distribute these individuals into train, test, and validation samples, but so far I haven't been able to get the syntax right to ensure that all of an individual's X rows always end up in the same subsample. For…
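
The question is about R, but the idea ports directly and may be easier to see in a short pandas sketch: randomize at the level of individuals rather than rows, then map each individual's assignment back onto its rows. The column name id and the 60/20/20 split are illustrative.

    # Assign whole individuals (not rows) to train/test/validation.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    df = pd.DataFrame({"id": np.repeat(np.arange(30), rng.integers(2, 6, 30)),
                       "value": 0.0})              # X rows per individual varies

    ids = df["id"].unique()
    rng.shuffle(ids)
    n = len(ids)
    assignment = {}
    for chunk, split in zip(np.split(ids, [int(0.6 * n), int(0.8 * n)]),
                            ["train", "test", "validation"]):
        assignment.update(dict.fromkeys(chunk, split))
    df["split"] = df["id"].map(assignment)

    # every row of a given individual now shares one split
    assert (df.groupby("id")["split"].nunique() == 1).all()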

thread.lock during custom parameter search class using Dask distributed

Submitted by 落花浮王杯 on 2019-12-11 08:06:03
问题 (Question): I wrote my own parameter search implementation, mostly because I don't need the cross-validation of scikit-learn's GridSearch and RandomizedSearch. I use dask to deliver optimal distributed performance. Here is what I have: from scipy.stats import uniform class Params(object): def __init__(self,fixed,loc=0.0,scale=1.0): self.fixed=fixed self.sched=uniform(loc=loc,scale=scale) def _getsched(self,i,size): return self.sched.rvs(size=size,random_state=i) def param(self,i,size=None): tmp…
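
Since the excerpt is truncated, the exact cause cannot be confirmed, but "can't pickle thread.lock" errors with dask.distributed usually come from shipping unpicklable state (a live Client, an open model handle) to the workers. Below is a hedged skeleton of a custom parameter search that avoids this by scattering the data once and submitting a plain function; the estimator and parameter grid are illustrative.

    # Custom parameter search on dask.distributed: scatter data once, submit a
    # pure function per candidate, gather scores.
    import numpy as np
    from dask.distributed import Client
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    def fit_and_score(alpha, X_tr, y_tr, X_te, y_te):
        model = SGDClassifier(alpha=alpha, random_state=0).fit(X_tr, y_tr)
        return alpha, model.score(X_te, y_te)

    if __name__ == "__main__":
        client = Client()                       # local cluster
        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        data = client.scatter([X_tr, y_tr, X_te, y_te], broadcast=True)
        futures = [client.submit(fit_and_score, a, *data)
                   for a in np.logspace(-5, -1, 10)]
        print(max(client.gather(futures), key=lambda r: r[1]))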

Possible compatibility issue with Keras, TensorFlow and scikit (tf.global_variables())

Submitted by 旧巷老猫 on 2019-12-11 08:02:01
问题 (Question): I'm trying to do a small test with my dataset on Keras Regressor (using TensorFlow), but I'm having a small issue. The error seems to come from scikit's cross_val_score function. It starts there, and the last error message is: File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.2-py2.7.egg/keras/backend/tensorflow_backend.py", line 298, in _initialize_variables variables = tf.global_variables() AttributeError: 'module' object has no attribute 'global_variables' My full code is basically…
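
The traceback most likely indicates a TensorFlow build that predates tf.global_variables(), which replaced tf.all_variables() around TF 0.12, while Keras 2.0.2 expects the newer name. A quick compatibility check (the usual fix is simply upgrading TensorFlow, or pinning an older Keras):

    # Print the installed versions and flag the missing API explicitly.
    import keras
    import tensorflow as tf

    print("Keras:", keras.__version__)
    print("TensorFlow:", tf.__version__)
    if not hasattr(tf, "global_variables"):
        raise RuntimeError("TensorFlow is too old for this Keras release; "
                           "upgrade TensorFlow (e.g. pip install -U tensorflow) "
                           "or install an older Keras.")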

Avoiding overfitting with H2OGradientBoostingEstimator

Submitted by 旧巷老猫 on 2019-12-11 07:52:57
问题 (Question): It appears that the gap between cross-validation and training ROC AUC with H2OGradientBoostingEstimator remains large despite my best attempts using min_split_improvement. Using the same data with GradientBoostingClassifier(min_samples_split=10) results in no overfitting, but I can find no analogue of min_samples_split. Prepare Data: from sklearn.datasets import make_classification X, y = make_classification(n_samples=10000, n_features=40, n_clusters_per_class=10, n_informative=25,…
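
A hedged sketch of the H2O knobs that usually help here: min_rows (minimum observations per leaf) is the closest analogue to scikit-learn's min_samples_split / min_samples_leaf, and early stopping on the cross-validation metric limits the number of trees. Parameter values below are illustrative, not tuned for this data.

    # H2O GBM with leaf-size, subsampling and early-stopping controls.
    import h2o
    import pandas as pd
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from sklearn.datasets import make_classification

    h2o.init()
    X, y = make_classification(n_samples=10000, n_features=40,
                               n_clusters_per_class=10, n_informative=25,
                               random_state=12)
    df = pd.DataFrame(X, columns=["x%d" % i for i in range(X.shape[1])])
    df["y"] = y
    hf = h2o.H2OFrame(df)
    hf["y"] = hf["y"].asfactor()

    gbm = H2OGradientBoostingEstimator(ntrees=500, max_depth=5, learn_rate=0.05,
                                       min_rows=10, sample_rate=0.8,
                                       col_sample_rate=0.8, nfolds=5,
                                       stopping_rounds=5, stopping_metric="AUC",
                                       seed=12)
    gbm.train(x=list(df.columns[:-1]), y="y", training_frame=hf)
    print(gbm.auc(train=True), gbm.auc(xval=True))   # compare train vs CV AUC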