cross-validation

Cross Validation - Weka API

Submitted by 柔情痞子 on 2019-12-12 00:33:48
问题 (Question): How can I build a classification model with 10-fold cross-validation using the Weka API? I ask because each cross-validation run creates a new classification model. Which classification model should I use on my test data? Thank you! 回答1 (Answer 1): 10-fold cross-validation is used to get an estimate of the accuracy a classifier would have if it were constructed from all of the training data. It is used when it is felt that there is not enough data for an independent test set. This means that you…
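
The idea generalizes beyond Weka. Below is a minimal illustration in Python/scikit-learn (not the Weka Java API the question asks about), assuming a synthetic dataset and a decision tree: cross-validation only estimates how well a model built from all the training data will perform; the model you actually apply to new test data is the one trained on the full training set.

    # Python/scikit-learn sketch (not the Weka Java API): estimate accuracy with
    # 10-fold CV, then build the single deployable model from all training data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=1)

    clf = DecisionTreeClassifier(random_state=1)
    scores = cross_val_score(clf, X, y, cv=10)      # 10 throw-away models
    print("estimated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

    clf.fit(X, y)   # this single model is the one you apply to new test data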

Specifying a selected range of data to be used in leave-one-out (jack-knife) cross-validation for use in the caret::train function

Submitted by 时光毁灭记忆、已成空白 on 2019-12-11 15:33:10
问题 (Question): This question builds on one I asked here: Creating data partitions over a selected range of data to be fed into the caret::train function for cross-validation. The data I am working with looks like this: df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5), Time = rep(c(1:20,1:20), each = 5), Replicate = c(1:5)). Essentially, what I would like to do is create custom partitions like those generated by the caret::groupKFold function, but for these folds to be…
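
The question is about R's caret, but the underlying idea, folds defined by a grouping column rather than by random rows, can be sketched in Python with scikit-learn's LeaveOneGroupOut. The frame below is a simplified stand-in for the df above; the column names are illustrative.

    # Python analogue of caret::groupKFold-style folds: LeaveOneGroupOut holds
    # out one whole Time group per fold.
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "Time": np.repeat(np.arange(1, 21), 10),   # 20 time points, 10 rows each
        "Effect": rng.uniform(size=200),
    })

    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(df[["Effect"]], groups=df["Time"]):
        assert df["Time"].iloc[test_idx].nunique() == 1   # one group held out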

Preserve Order for Cross Validation in Weka

Submitted by 岁酱吖の on 2019-12-11 14:08:58
问题 (Question): I am using the Weka GUI to classify sensor data. I have measurements from 10 people, and the data is sorted, so the first 10% corresponds to participant 1, the second 10% to participant 2, etc. I would like to use 10-fold cross-validation to build a model on 9 participants and test it on the remaining participant. In my case I believe I could accomplish this simply by not randomizing the data splits. How would I best go about doing this? 回答1 (Answer 1): I don't know how to do this in the Explorer. In the…
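
For comparison, here is a small Python/scikit-learn sketch of the same idea (the question itself concerns the Weka GUI): when the rows are sorted by participant, un-shuffled 10-fold cross-validation makes each test fold one contiguous participant block.

    # Python/scikit-learn sketch: with rows sorted by participant, un-shuffled
    # 10-fold CV gives one contiguous participant block per test fold.
    import numpy as np
    from sklearn.model_selection import KFold

    participant = np.repeat(np.arange(10), 50)   # 10 people, 50 rows each, sorted

    kf = KFold(n_splits=10, shuffle=False)       # folds follow the stored order
    for train_idx, test_idx in kf.split(participant):
        assert np.unique(participant[test_idx]).size == 1   # exactly one person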

How to put KerasClassifier, Hyperopt and Sklearn cross-validation together

Submitted by 泪湿孤枕 on 2019-12-11 12:50:30
问题 (Question): I am performing hyperparameter tuning (hyperopt) tasks with sklearn on Keras models. I am trying to optimize KerasClassifiers using Sklearn cross-validation. Some code follows: def create_model(): model = Sequential() model.add(Dense(output_dim=params['units1'], input_dim=features_.shape[1], kernel_initializer="glorot_uniform")) model.add(Activation(params['activation'])) model.add(Dropout(params['dropout1'])) model.add(BatchNormalization()) ... model.compile(loss=…
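
A hedged sketch of one way to wire these pieces together, assuming a binary-classification dataset and the 2019-era keras.wrappers.scikit_learn wrapper implied by the excerpt; the names units1 and dropout1 mirror the question's params dict, everything else is illustrative.

    # hyperopt proposes parameters, KerasClassifier builds the model, and
    # cross_val_score provides the objective value to minimise.
    import numpy as np
    from hyperopt import Trials, fmin, hp, tpe
    from keras.layers import Dense, Dropout
    from keras.models import Sequential
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    def create_model(units1=32, dropout1=0.2):
        model = Sequential()
        model.add(Dense(units1, input_dim=X.shape[1], activation="relu"))
        model.add(Dropout(dropout1))
        model.add(Dense(1, activation="sigmoid"))
        model.compile(loss="binary_crossentropy", optimizer="adam",
                      metrics=["accuracy"])
        return model

    def objective(params):
        clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=32,
                              verbose=0, **params)
        return -cross_val_score(clf, X, y, cv=3).mean()   # hyperopt minimises

    space = {"units1": hp.choice("units1", [16, 32, 64]),
             "dropout1": hp.uniform("dropout1", 0.0, 0.5)}
    best = fmin(objective, space, algo=tpe.suggest, max_evals=10, trials=Trials())
    print(best)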

Scikit-learn PLS SVD and cross validation

Submitted by 与世无争的帅哥 on 2019-12-11 12:33:34
问题 (Question): The sklearn.cross_decomposition.PLSSVD class in scikit-learn appears to fail when the response variable has a shape of (N,) instead of (N,1), where N is the number of samples in the dataset. However, sklearn.cross_validation.cross_val_score fails when the response variable has a shape of (N,1) instead of (N,). How can I use them together? A snippet of code: from sklearn.pipeline import Pipeline from sklearn.cross_decomposition import PLSSVD from sklearn.preprocessing import…
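
One possible workaround (a sketch, not necessarily the canonical fix): wrap PLSSVD in a thin transformer that reshapes y to (N, 1) internally, so the surrounding pipeline and cross_val_score can keep passing y with shape (N,). The example uses the modern sklearn.model_selection import; the class and parameter names introduced here are illustrative.

    # A wrapper transformer: PLSSVD sees y as (N, 1), cross_val_score sees (N,).
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.cross_decomposition import PLSSVD
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    class PLSSVDReshaper(BaseEstimator, TransformerMixin):
        def __init__(self, n_components=1):
            self.n_components = n_components
        def fit(self, X, y=None):
            self.pls_ = PLSSVD(n_components=self.n_components)
            self.pls_.fit(X, np.asarray(y).reshape(-1, 1))   # force (N, 1)
            return self
        def transform(self, X):
            return self.pls_.transform(X)                    # X scores only

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    pipe = Pipeline([("pls", PLSSVDReshaper()), ("clf", LogisticRegression())])
    print(cross_val_score(pipe, X, y, cv=5))                 # y stays (N,)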

ROC curve plot: 0.50 significant and cross-validation

Submitted by 我与影子孤独终老i on 2019-12-11 11:24:30
问题 (Question): I have two problems using the pROC package to plot the ROC curve. A. The significance level or P-value is the probability of obtaining the observed sample area under the ROC curve when, in fact, the true (population) area under the ROC curve is 0.5 (null hypothesis: area = 0.5). If P is small (P < 0.05), then it can be concluded that the area under the ROC curve is significantly different from 0.5, and therefore there is evidence that the laboratory test does have an ability to…
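
The question concerns R's pROC, but the meaning of that p-value can be illustrated with a simple label-permutation test in Python: under the null hypothesis (area = 0.5) the labels carry no information about the scores, so shuffling them shows how extreme the observed AUC is by chance. The synthetic scores below are purely illustrative.

    # Permutation test for H0: AUC = 0.5 (a conceptual analogue, not pROC itself).
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    labels = np.r_[np.zeros(100), np.ones(100)]
    scores = rng.normal(size=200) + labels          # positives shifted upward

    observed = roc_auc_score(labels, scores)
    perm = np.array([roc_auc_score(rng.permutation(labels), scores)
                     for _ in range(2000)])
    # two-sided p-value: how often a random labelling is at least as extreme
    p = np.mean(np.abs(perm - 0.5) >= abs(observed - 0.5))
    print("AUC = %.3f, p = %.4f" % (observed, p))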

Grouping rows from an R dataframe together when randomly assigning to training/testing datasets

Submitted by 只谈情不闲聊 on 2019-12-11 10:34:07
问题 (Question): I have a dataframe that consists of blocks of X rows, each corresponding to a single individual (where X can differ between individuals). I'd like to randomly distribute these individuals into train, test, and validation samples, but so far I haven't been able to get the syntax right to ensure that all of an individual's X rows always end up in the same subsample. For…
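
The question is about R, but the idea ports directly and may be easier to see in a short pandas sketch: randomize at the level of individuals rather than rows, then map each individual's assignment back onto its rows. The column name id and the 60/20/20 split are illustrative.

    # Assign whole individuals (not rows) to train/test/validation.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    df = pd.DataFrame({"id": np.repeat(np.arange(30), rng.integers(2, 6, 30)),
                       "value": 0.0})              # X rows per individual varies

    ids = df["id"].unique()
    rng.shuffle(ids)
    n = len(ids)
    assignment = {}
    for chunk, split in zip(np.split(ids, [int(0.6 * n), int(0.8 * n)]),
                            ["train", "test", "validation"]):
        assignment.update(dict.fromkeys(chunk, split))
    df["split"] = df["id"].map(assignment)

    # every row of a given individual now shares one split
    assert (df.groupby("id")["split"].nunique() == 1).all()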

thread.lock during custom parameter search class using Dask distributed

Submitted by 落花浮王杯 on 2019-12-11 08:06:03
问题 (Question): I wrote my own parameter search implementation, mostly because I don't need the cross-validation of scikit-learn's GridSearch and RandomizedSearch. I use dask to deliver optimal distributed performance. Here is what I have: from scipy.stats import uniform class Params(object): def __init__(self,fixed,loc=0.0,scale=1.0): self.fixed=fixed self.sched=uniform(loc=loc,scale=scale) def _getsched(self,i,size): return self.sched.rvs(size=size,random_state=i) def param(self,i,size=None): tmp…
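
Since the excerpt is truncated, the exact cause cannot be confirmed, but "can't pickle thread.lock" errors with dask.distributed usually come from shipping unpicklable state (a live Client, an open model handle) to the workers. Below is a hedged skeleton of a custom parameter search that avoids this by scattering the data once and submitting a plain function; the estimator and parameter grid are illustrative.

    # Custom parameter search on dask.distributed: scatter data once, submit a
    # pure function per candidate, gather scores.
    import numpy as np
    from dask.distributed import Client
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    def fit_and_score(alpha, X_tr, y_tr, X_te, y_te):
        model = SGDClassifier(alpha=alpha, random_state=0).fit(X_tr, y_tr)
        return alpha, model.score(X_te, y_te)

    if __name__ == "__main__":
        client = Client()                       # local cluster
        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        data = client.scatter([X_tr, y_tr, X_te, y_te], broadcast=True)
        futures = [client.submit(fit_and_score, a, *data)
                   for a in np.logspace(-5, -1, 10)]
        print(max(client.gather(futures), key=lambda r: r[1]))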

Possible compatibility issue with Keras, TensorFlow and scikit (tf.global_variables())

Submitted by 旧巷老猫 on 2019-12-11 08:02:01
问题 (Question): I'm trying to do a small test with my dataset on Keras Regressor (using TensorFlow), but I'm having a small issue. The error seems to come from scikit's cross_val_score function. It starts there, and the last error message is: File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.2-py2.7.egg/keras/backend/tensorflow_backend.py", line 298, in _initialize_variables variables = tf.global_variables() AttributeError: 'module' object has no attribute 'global_variables' My full code is basically…
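
The traceback most likely indicates a TensorFlow build that predates tf.global_variables(), which replaced tf.all_variables() around TF 0.12, while Keras 2.0.2 expects the newer name. A quick compatibility check (the usual fix is simply upgrading TensorFlow, or pinning an older Keras):

    # Print the installed versions and flag the missing API explicitly.
    import keras
    import tensorflow as tf

    print("Keras:", keras.__version__)
    print("TensorFlow:", tf.__version__)
    if not hasattr(tf, "global_variables"):
        raise RuntimeError("TensorFlow is too old for this Keras release; "
                           "upgrade TensorFlow (e.g. pip install -U tensorflow) "
                           "or install an older Keras.")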

Avoiding overfitting with H2OGradientBoostingEstimator

Submitted by 旧巷老猫 on 2019-12-11 07:52:57
问题 (Question): It appears that the gap between cross-validation and training ROC AUC with H2OGradientBoostingEstimator remains large despite my best attempts using min_split_improvement. Using the same data with GradientBoostingClassifier(min_samples_split=10) results in no overfitting, but I can find no analogue of min_samples_split. Prepare Data: from sklearn.datasets import make_classification X, y = make_classification(n_samples=10000, n_features=40, n_clusters_per_class=10, n_informative=25,…
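
A hedged sketch of the H2O knobs that usually help here: min_rows (minimum observations per leaf) is the closest analogue to scikit-learn's min_samples_split / min_samples_leaf, and early stopping on the cross-validation metric limits the number of trees. Parameter values below are illustrative, not tuned for this data.

    # H2O GBM with leaf-size, subsampling and early-stopping controls.
    import h2o
    import pandas as pd
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from sklearn.datasets import make_classification

    h2o.init()
    X, y = make_classification(n_samples=10000, n_features=40,
                               n_clusters_per_class=10, n_informative=25,
                               random_state=12)
    df = pd.DataFrame(X, columns=["x%d" % i for i in range(X.shape[1])])
    df["y"] = y
    hf = h2o.H2OFrame(df)
    hf["y"] = hf["y"].asfactor()

    gbm = H2OGradientBoostingEstimator(ntrees=500, max_depth=5, learn_rate=0.05,
                                       min_rows=10, sample_rate=0.8,
                                       col_sample_rate=0.8, nfolds=5,
                                       stopping_rounds=5, stopping_metric="AUC",
                                       seed=12)
    gbm.train(x=list(df.columns[:-1]), y="y", training_frame=hf)
    print(gbm.auc(train=True), gbm.auc(xval=True))   # compare train vs CV AUC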