cross-validation

Selecting SVM parameters using cross validation and F1-scores

前提是你 submitted on 2019-11-30 18:47:15
Question: I need to keep track of the F1-scores while tuning C & sigma in SVM. For example, the following code keeps track of the accuracy; I need to change it to the F1-score, but I was not able to do that… %# read some training data [labels,data] = libsvmread('./heart_scale'); %# grid of parameters folds = 5; [C,gamma] = meshgrid(-5:2:15, -15:2:3); %# grid search, and cross-validation cv_acc = zeros(numel(C),1); for i=1:numel(C) cv_acc(i) = svmtrain(labels, data, ... sprintf('-c %f -g %f -v %d', 2^C(i),
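The excerpt above is MATLAB/libsvm and is cut off; purely as an illustration of the same idea in Python (not the asker's code), a scikit-learn sketch that grid-searches C and gamma while scoring each fold with F1 instead of accuracy could look like this, with placeholder data and a placeholder parameter grid:

# Illustrative sketch: grid search over C and gamma, scored by F1 instead of accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # placeholder binary dataset
param_grid = {"C": [2**k for k in range(-5, 16, 2)],
              "gamma": [2**k for k in range(-15, 4, 2)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)  # best (C, gamma) by mean cross-validated F1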

How to specify a validation holdout set to caret

回眸只為那壹抹淺笑 submitted on 2019-11-30 16:01:10
I really like using caret for at least the early stages of modeling, especially for its really easy-to-use resampling methods. However, I'm working on a model where the training set has a fair number of cases added via semi-supervised self-training, and my cross-validation results are really skewed because of it. My solution to this is using a validation set to measure model performance, but I can't see a way to use a validation set directly within caret - am I missing something, or is this just not supported? I know that I can write my own wrappers to do what caret would normally do for me, but it
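caret is an R package, so the following is only an analogous sketch of the concept in scikit-learn, not a caret answer: a fixed validation holdout can be expressed with PredefinedSplit, where rows marked -1 always stay in the training set and rows marked with a fold index form the held-out validation set.

# Illustrative scikit-learn analogue of a fixed validation holdout (not caret).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, cross_val_score

X, y = make_classification(n_samples=100, random_state=0)  # placeholder data
test_fold = [-1] * 80 + [0] * 20   # -1 = always train; 0 = the single validation fold
ps = PredefinedSplit(test_fold)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=ps)
print(scores)  # one score, computed only on the fixed 20-row holdout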

How to specify train and test indices for xgb.cv in R package XGBoost

孤者浪人 submitted on 2019-11-30 15:19:35
Question: I recently found out about the folds parameter in xgb.cv, which allows one to specify the indices of the validation set. The helper function xgb.cv.mknfold is then invoked within xgb.cv, which takes the remaining indices for each fold to be the indices of the training set for that fold. Question: can I specify both the training and validation indices via any interface in the xgboost package? My primary motivation is performing time-series cross-validation, and I do not
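The question is about the R interface, but as a general illustration of controlling both sides of each fold in Python/scikit-learn (an analogue, not the R API), an explicit list of (train_indices, test_indices) pairs — here produced by TimeSeriesSplit so training rows always precede validation rows — can be passed directly as the cv argument:

# Illustrative sketch: explicit (train, test) index pairs used as the CV folds.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))   # placeholder time-ordered features
y = rng.normal(size=120)        # placeholder target
splits = list(TimeSeriesSplit(n_splits=4).split(X))  # [(train_idx, test_idx), ...]
# Each tuple fully determines which rows train and which rows validate in that fold.
scores = cross_val_score(GradientBoostingRegressor(), X, y, cv=splits)
print(scores)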

Using sklearn cross_val_score and kfolds to fit and help predict model

我与影子孤独终老i submitted on 2019-11-30 07:16:22
I'm trying to understand k-fold cross-validation from the sklearn Python module. I understand the basic flow: instantiate a model, e.g. model = LogisticRegression(); fit the model, e.g. model.fit(xtrain, ytrain); predict, e.g. model.predict(ytest); use e.g. cross_val_score to test the fitted model's accuracy. Where I'm confused is using sklearn KFold with cross_val_score. As I understand it, the cross_val_score function will fit the model and predict on the k folds, giving you an accuracy score for each fold, e.g. using code like this: kf = KFold(n=data.shape[0], n_folds=5, shuffle=True,
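The excerpt is cut off, but a minimal sketch of the pattern it describes would be the following, using placeholder data and the current scikit-learn API (where KFold takes n_splits rather than the older n and n_folds arguments shown above):

# Minimal sketch: cross_val_score fits and scores the model on each KFold split.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)        # placeholder dataset
kf = KFold(n_splits=5, shuffle=True, random_state=1)
model = LogisticRegression(max_iter=5000)
scores = cross_val_score(model, X, y, cv=kf)       # one accuracy score per fold
print(scores, scores.mean())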

R: Cross validation on a dataset with factors

大兔子大兔子 submitted on 2019-11-30 05:09:11
Often I want to run cross-validation on a dataset that contains some factor variables, and after running for a while the cross-validation routine fails with the error: factor x has new levels Y. For example, using package boot: library(boot) d <- data.frame(x=c('A', 'A', 'B', 'B', 'C', 'C'), y=c(1, 2, 3, 4, 5, 6)) m <- glm(y ~ x, data=d) m.cv <- cv.glm(d, m, K=2) # Sometimes succeeds m.cv <- cv.glm(d, m, K=2) # Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : # factor x has new levels B Update: This is a toy example. The same problem occurs
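The question concerns R's cv.glm, but the underlying problem — a fold whose held-out rows contain a factor level never seen in that fold's training data — arises in other toolkits too. Purely as an illustrative scikit-learn sketch (not an R fix), one way to keep such folds from erroring is to encode the factor with an encoder that tolerates unseen levels:

# Illustrative sketch: tolerate categories unseen during a fold's training.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

d = pd.DataFrame({"x": ["A", "A", "B", "B", "C", "C"], "y": [1, 2, 3, 4, 5, 6]})
encode = ColumnTransformer([("x", OneHotEncoder(handle_unknown="ignore"), ["x"])])
model = make_pipeline(encode, LinearRegression())
# A level unseen in a fold's training data becomes an all-zero column instead of an error.
scores = cross_val_score(model, d[["x"]], d["y"], cv=KFold(n_splits=2, shuffle=True, random_state=0))
print(scores)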

Evaluating Logistic regression with cross validation

好久不见. submitted on 2019-11-30 02:26:40
I would like to use cross-validation to train/test my dataset and evaluate the performance of the logistic regression model on the entire dataset, not only on the test set (e.g. 25%). These concepts are totally new to me and I am not very sure if I am doing it right. I would be grateful if anyone could advise me on the right steps to take where I have gone wrong. Part of my code is shown below. Also, how can I plot ROCs for "y2" and "y3" on the same graph with the current one? Thank you. import pandas as pd Data=pd.read_csv ('C:\\Dataset.csv',index_col='SNo') feature_cols=['A','B','C','D','E'] X
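The excerpt is truncated, but the general approach it is reaching for — scoring the model on every row via out-of-fold predictions and overlaying several ROC curves on one axis — could be sketched roughly as follows; the data and the target columns y1, y2, y3 are hypothetical placeholders, not taken from the question's dataset:

# Rough sketch: out-of-fold probabilities for whole-dataset evaluation plus overlaid ROC curves.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

X, _ = make_classification(n_samples=300, n_features=5, random_state=0)  # stand-in for columns A..E
rng = np.random.default_rng(0)
targets = {name: (X[:, i] + rng.normal(size=300) > 0).astype(int)        # hypothetical y1..y3
           for i, name in enumerate(["y1", "y2", "y3"])}

for name, y in targets.items():
    # Every row is predicted by a model that never saw that row during fitting.
    proba = cross_val_predict(LogisticRegression(), X, y, cv=5, method="predict_proba")[:, 1]
    fpr, tpr, _ = roc_curve(y, proba)
    plt.plot(fpr, tpr, label=f"{name} (AUC={roc_auc_score(y, proba):.2f})")

plt.plot([0, 1], [0, 1], linestyle="--")  # chance line
plt.xlabel("False positive rate"); plt.ylabel("True positive rate"); plt.legend()
plt.show()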

Cross-validation for Sklearn 0.20+?

梦想的初衷 submitted on 2019-11-29 17:06:52
I am trying to do cross-validation and I am running into an error that says: 'Found input variables with inconsistent numbers of samples: [18, 1]'. I am using different columns in a pandas DataFrame (df) as the features, with the last column as the label. The data is derived from the UC Irvine Machine Learning Repository. When importing the cross-validation package that I have used in the past, I am getting an error that it may have been deprecated. I am going to be running a decision tree, SVM, and k-NN. My code is as such: feature = [df['age'], df['job'], df['marital'], df['education'], df[
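The excerpt stops mid-list, but two things it touches on can be sketched with some assumptions: in scikit-learn 0.20+ the old sklearn.cross_validation module is replaced by sklearn.model_selection, and a feature matrix is normally built by indexing the DataFrame with a list of column names (one 2-D object of shape (n_samples, n_features)) rather than a Python list of separate Series. The feature column names below come from the excerpt; the file path and the label column 'y' are placeholder assumptions.

# Sketch: modern imports plus a 2-D feature matrix instead of a list of Series.
import pandas as pd
from sklearn.model_selection import cross_val_score  # replaces the old sklearn.cross_validation
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("bank.csv")                            # placeholder path to the UCI data
feature_cols = ["age", "job", "marital", "education"]   # columns named in the question
X = pd.get_dummies(df[feature_cols])                    # df[cols] keeps shape (n_samples, n_features)
y = df["y"]                                             # assumed label column name
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print(scores)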