cross-validation | 易学教程

Creating folds for k-fold CV in R using Caret

Question: I'm trying to set up k-fold CV for several classification methods/hyperparameters using the data available at http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data. This set is made of 208 rows, each with 60 attributes. I'm reading it into a data.frame using the read.table function. The next step is to split my data into
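As a language-neutral illustration of the splitting step described above, here is a minimal Python sketch that deals 208 row indices into 10 folds. It is not caret's createFolds; the function name `make_folds`, the seed, and the choice of k=10 are assumptions made for the example, and the split is not stratified by class.

```python
import random

def make_folds(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [sorted(indices[i::k]) for i in range(k)]

folds = make_folds(n_rows=208, k=10)

for held_out, test_idx in enumerate(folds):
    # Every row that is not in the held-out fold is used for training.
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    print(f"fold {held_out}: train={len(train_idx)} test={len(test_idx)}")
```

Each row appears in exactly one test fold, so every observation is used for validation once and for training k-1 times.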

return coefficients from Pipeline object in sklearn

I've fit a Pipeline object with RandomizedSearchCV:

    pipe_sgd = Pipeline([('scl', StandardScaler()),
                         ('clf', SGDClassifier(n_jobs=-1))])
    param_dist_sgd = {'clf__loss': ['log'],
                      'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                      'clf__alpha': np.linspace(0.15, 0.35),
                      'clf__n_iter': [3, 5, 7]}
    sgd_randomized_pipe = RandomizedSearchCV(estimator = pipe_sgd,
                                             param_distributions=param_dist_sgd,
                                             cv=3, n_iter=30, n_jobs=-1)
    sgd_randomized_pipe.fit(X_train, y_train)

I want to access the coef_ attribute of the best_estimator_ but I'm unable to do that. I've tried accessing coef_ with the code below. sgd
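One way to reach the coefficients, sketched under assumptions: after the search, `best_estimator_` is the refitted Pipeline itself, so `coef_` has to be read from the classifier step via `named_steps`. The synthetic data, the reduced parameter grid, and `n_iter=5` below are stand-ins chosen so the example runs quickly; they are not the asker's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's X_train / y_train (an assumption).
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)

pipe_sgd = Pipeline([("scl", StandardScaler()),
                     ("clf", SGDClassifier(random_state=0))])
param_dist_sgd = {"clf__penalty": ["l1", "l2", "elasticnet"],
                  "clf__alpha": np.linspace(0.15, 0.35)}

search = RandomizedSearchCV(pipe_sgd, param_distributions=param_dist_sgd,
                            cv=3, n_iter=5, random_state=0)
search.fit(X_train, y_train)

# best_estimator_ is the refitted Pipeline; the coefficients live on its "clf" step.
best_clf = search.best_estimator_.named_steps["clf"]
coefs = best_clf.coef_
print(coefs.shape)
```

Because the scaler runs before the classifier, these coefficients refer to the standardized features, not the raw ones.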

how use grid search with fit generator in keras

Question: I want to grid search the parameters of a model using fit_generator as input in Keras. I found the code below on Stack Overflow and modified it, but: 1. I don't understand how to pass fit_generator or flow_from_directory to the fit function (last line of the code). 2. How can I add early stopping? Thanks.

    from __future__ import print_function
    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation, Flatten
    from keras.layers import Conv2D

CARET. Relationship between data splitting and trainControl

I have carefully read the CARET documentation at http://caret.r-forge.r-project.org/training.html and the vignettes, and everything is quite clear (the examples on the website help a lot!), but I am still a bit confused about the relationship between two arguments to trainControl, method and index, and about the interplay between trainControl and the data splitting functions in caret (e.g. createDataPartition, createResample, createFolds and createMultiFolds). To better frame my questions, let me use the following example from the documentation:

    data(BloodBrain)
    set.seed(1)
    tmp <- createDataPartition

Deprecation warnings from sklearn

I am using cross_validation from sklearn:

    from sklearn.cross_validation import train_test_split

I get the warning below:

    cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved.

Problem: The warning means that the module is deprecated: it still imports in your version but is scheduled for removal (sklearn.cross_validation was removed in version 0.20). Solution:

    from sklearn.model_selection import train_test_split

Credit: this post. To avoid this, you just need

Custom cross validation split sklearn

I am trying to split a dataset for cross-validation and GridSearch in sklearn. I want to define my own split, but GridSearch only takes the built-in cross-validation methods. However, I can't use the built-in cross-validation methods because I need certain groups of examples to be in the same fold. So, if I have examples [A1, A2, A3, A4, A5, B1, B2, B3, C1, C2, C3, C4, ..., Z1, Z2, Z3], I want to perform cross-validation such that the examples from each group [A, B, C, ...] exist in only one fold, i.e. K1 contains [D, E, G, J, K, ...], K2 contains [A, C, L, M, ...], K3 contains [B, F, I, ...], etc. This type of thing
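The grouping constraint described above can be sketched without any library. scikit-learn ships `GroupKFold` for this purpose, and the `cv` argument of `GridSearchCV` also accepts an iterable of (train, test) index pairs, so a generator like the one below could be passed in directly. The helper name `group_k_fold` and its greedy size balancing are assumptions for illustration, not scikit-learn's implementation.

```python
from collections import defaultdict

def group_k_fold(groups, k):
    """Yield (train_idx, test_idx) pairs where each group lands in exactly one test fold."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    # Greedy balancing: place the largest groups first, always into the smallest fold.
    folds = [[] for _ in range(k)]
    for g in sorted(members, key=lambda g: (-len(members[g]), g)):
        min(folds, key=len).extend(members[g])
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield train_idx, sorted(test_idx)

# Group labels taken from the first letter of each example name, as in the question.
examples = ["A1", "A2", "A3", "A4", "A5", "B1", "B2", "B3", "C1", "C2", "C3", "C4"]
groups = [name[0] for name in examples]
splits = list(group_k_fold(groups, k=3))
for train_idx, test_idx in splits:
    print([examples[i] for i in test_idx])
```

No group is ever split between the training and test side of a fold, which is the property the question asks for.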

k-fold cross validation - how to get the prediction automatically?

Question: This may be a silly question, but I just can't find a package that does this. I know I can write some code to get what I want, but it would be nice to have a function that does it automatically! Basically, I want to do k-fold cross-validation for a glm model and automatically get the predictions for each validation set along with the actual values. So if I am doing 10-fold CV, I want a function that returns the 10 validation sets with the actual responses and predictions all together. Thank you
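The idea of collecting actual and predicted values per validation fold can be sketched as follows. This is a Python illustration, not an R solution: a one-variable least-squares line stands in for the glm, and the toy data, function names, and seed are all assumptions. In scikit-learn, `cross_val_predict` provides the same out-of-fold predictions for any estimator.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the question's glm)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def cv_predictions(xs, ys, k=10, seed=0):
    """Return one (fold, actual, predicted) row per observation."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    rows = []
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Predict only on rows the model has not seen.
        rows += [(fold, ys[i], a + b * xs[i]) for i in test]
    return rows

# Toy data (hypothetical): y is roughly 2 + 3x with noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 0.1) for x in xs]
rows = cv_predictions(xs, ys, k=10)
print(rows[:3])
```

Each row carries its fold number, so the result can be grouped back into the 10 validation sets.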

Grid Search and Early Stopping Using Cross Validation with XGBoost in SciKit-Learn

I am fairly new to scikit-learn and have been trying to tune the hyperparameters of XGBoost. My aim is to use grid search to tune the model parameters and early stopping to control the number of trees and avoid overfitting. As I am using cross-validation for the grid search, I was hoping to also use cross-validation in the early stopping criteria. The code I have so far looks like this:

    import numpy as np
    import pandas as pd
    from sklearn import model_selection
    import xgboost as xgb

    # Import training and test data
    train = pd.read_csv("train.csv").fillna(value=-999.0)
    test = pd

Does TensorFlow have cross validation implemented for its users?

Question: I was thinking of trying to choose hyperparameters (such as regularization strength) using cross-validation, or maybe training multiple initializations of a model and then choosing the one with the highest cross-validation accuracy. Implementing k-fold or CV is simple but tedious/annoying (especially if I am trying to train different models on different CPUs, GPUs, or even different computers, etc.). I would expect a library like TensorFlow to have something like this implemented for its users so that

Difference between glmnet() and cv.glmnet() in R?

I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet package, specifically the Poisson family. Here's my code:

    # de <- data imported from sql connection
    x <- model.matrix(~.,data = de[,2:7])
    y <- (de[,1])
    reg <- cv.glmnet(x,y, family = "poisson", alpha = 1)
    reg1 <- glmnet(x,y, family = "poisson", alpha = 1)
    **Co <- coef(?reg or reg1?,s=???)**
    summ <- summary(Co)
    c <- data.frame(Name= rownames(Co)[summ$i], Lambda= summ$x)
    c2 <- c[with(c, order(-Lambda)), ]

The beginning imports a large amount of data from my