cross-validation

Why, when I use GridSearchCV with roc_auc scoring, is the score different for grid_search.score(X,y) and roc_auc_score(y, y_predict)?

我的梦境 submitted on 2019-12-04 01:30:46
Question: I am using stratified 10-fold cross-validation to find the model that predicts y (a binary outcome) from X (34 features) with the highest AUC. I set up the GridSearchCV:

    log_reg = LogisticRegression()
    parameter_grid = {'penalty': ["l1", "l2"], 'C': np.arange(0.1, 3, 0.1)}
    cross_validation = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
    grid_search = GridSearchCV(log_reg, param_grid=parameter_grid,
                               scoring='roc_auc', cv=cross_validation)

And then run the cross-validation: grid…
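
A likely explanation, added for context rather than taken from the post: with scoring='roc_auc', grid_search.score(X, y) computes the AUC from the classifier's continuous scores (decision_function or predict_proba), whereas roc_auc_score(y, y_predict) on hard 0/1 predictions collapses the ranking to two values and generally yields a different number. A minimal sketch on synthetic data:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=500, n_features=34, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # AUC from continuous scores -- what a 'roc_auc' scorer uses internally
    auc_from_scores = roc_auc_score(y, clf.predict_proba(X)[:, 1])

    # AUC from hard 0/1 predictions -- what roc_auc_score(y, y_predict) computes
    auc_from_labels = roc_auc_score(y, clf.predict(X))

    print(auc_from_scores, auc_from_labels)  # the two values generally differ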

Calculate cross-validation for a Generalized Linear Model in Matlab

流过昼夜 submitted on 2019-12-03 20:50:51
I am doing a regression using a Generalized Linear Model. I was caught off guard by the crossval function. My implementation so far:

    x = 'Some dataset, containing the input and the output'
    X = x(:,1:7);
    Y = x(:,8);
    cvpart = cvpartition(Y,'holdout',0.3);
    Xtrain = X(training(cvpart),:);
    Ytrain = Y(training(cvpart),:);
    Xtest = X(test(cvpart),:);
    Ytest = Y(test(cvpart),:);
    mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson');
    Ypred = predict(mdl,Xtest);
    res = (Ypred - Ytest);
    RMSE_test = sqrt(mean(res.^2));

The code below is for calculating cross-validation for multiple…
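
A rough Python analogue of the same per-fold RMSE computation, added for comparison (the 7-input/1-output column layout mirrors the Matlab snippet; the synthetic data is an assumption):

    import numpy as np
    from sklearn.linear_model import PoissonRegressor
    from sklearn.model_selection import KFold

    # Assumed layout: 7 input columns, target in column 8
    rng = np.random.default_rng(0)
    x = rng.poisson(3.0, size=(200, 8)).astype(float)
    X, Y = x[:, :7], x[:, 7]

    rmses = []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        # cf. GeneralizedLinearModel.fit(..., 'distr', 'poisson')
        mdl = PoissonRegressor().fit(X[tr], Y[tr])
        res = mdl.predict(X[te]) - Y[te]
        rmses.append(np.sqrt(np.mean(res ** 2)))  # per-fold test RMSE

    print(np.mean(rmses))  # average RMSE across folds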

Is cv.glmnet overfitting the data by using the full lambda sequence?

拈花ヽ惹草 submitted on 2019-12-03 20:15:08
cv.glmnet has been used by most research papers and companies. While building a similar function to cv.glmnet for glmnet.cr (a comparable package that implements the lasso for continuation-ratio ordinal regression), I came across this problem in cv.glmnet. cv.glmnet first fits the model:

    glmnet.object = glmnet(x, y, weights = weights, offset = offset, lambda = lambda, ...)

After the glmnet object is created with the complete data, the next step is as follows: the lambda sequence is extracted from the model fitted on the complete data:

    lambda = glmnet.object$lambda

Now they make sure the number of folds is more than…
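
A schematic of the procedure being questioned, written in Python purely for illustration (this is not the glmnet source; lasso_path and Lasso stand in for glmnet): the regularization path is computed once on the complete data, and every fold is then refit and scored along that shared sequence:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, lasso_path
    from sklearn.model_selection import KFold

    X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

    # Step 1: the alpha (lambda) sequence comes from a fit on the *complete* data
    alphas, _, _ = lasso_path(X, y)

    # Step 2: each fold is refit and scored along that same shared sequence
    mse = np.zeros((5, len(alphas)))
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    for k, (tr, te) in enumerate(folds.split(X)):
        for j, a in enumerate(alphas):
            pred = Lasso(alpha=a).fit(X[tr], y[tr]).predict(X[te])
            mse[k, j] = np.mean((pred - y[te]) ** 2)

    best_alpha = alphas[mse.mean(axis=0).argmin()]  # analogue of lambda.min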

Difference between glmnet() and cv.glmnet() in R?

和自甴很熟 submitted on 2019-12-03 16:56:26
Question: I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet() package, specifically its Poisson feature. Here's my code:

    # de <- data imported from sql connection
    x <- model.matrix(~., data = de[,2:7])
    y <- (de[,1])
    reg <- cv.glmnet(x, y, family = "poisson", alpha = 1)
    reg1 <- glmnet(x, y, family = "poisson", alpha = 1)
    Co <- coef(?reg or reg1?, s = ???)   # which model, and which s, should go here?
    summ <- summary(Co)
    c <- data.frame(Name = rownames(Co)[summ$i], Lambda=…
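
Background on the distinction, added here rather than taken from the post: glmnet() fits the whole coefficient path over a lambda sequence, while cv.glmnet() additionally cross-validates along that path and stores lambda.min and lambda.1se, which is what makes coef(reg, s = "lambda.min") meaningful. A loose scikit-learn analogy, assuming a Gaussian rather than Poisson family:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV, lasso_path

    X, y = make_regression(n_samples=150, n_features=10, noise=3.0, random_state=1)

    # glmnet()-like: the full coefficient path, no model selection
    alphas, coefs, _ = lasso_path(X, y)

    # cv.glmnet()-like: cross-validation picks one penalty (cf. lambda.min)
    reg = LassoCV(cv=10).fit(X, y)
    print(reg.alpha_)  # the CV-selected penalty
    print(reg.coef_)   # coefficients at that penalty, cf. coef(reg, s = "lambda.min")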

Scikit - Combining scale and grid search

一个人想着一个人 submitted on 2019-12-03 16:21:55
I am new to scikit-learn and have two slight issues combining data scaling with grid search.

Efficient scaler: considering cross-validation with k folds, I would like the data scaler (for instance preprocessing.StandardScaler()) to be fit only on the K-1 training folds each time, and then applied to the remaining fold. My impression is that the following code will fit the scaler on the entire dataset, and I would therefore like to modify it to behave as described previously:

    classifier = svm.SVC(C=1)
    clf = make_pipeline(preprocessing.StandardScaler(), classifier)
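
The standard remedy, sketched here as an assumption about what the asker needs: keep the scaler and the model in one Pipeline and hand the pipeline itself to GridSearchCV, so the scaler is refit on the K-1 training folds at every split; pipeline hyperparameters are addressed with the step__param naming:

    from sklearn import preprocessing, svm
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=200, random_state=0)

    # The whole pipeline is refit per fold, so StandardScaler never sees a test fold
    clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC())

    # Step names are the lower-cased class names, hence 'svc__C'
    grid = GridSearchCV(clf, {'svc__C': [0.1, 1, 10]}, cv=5).fit(X, y)
    print(grid.best_params_)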

Best parameters found by Hyperopt are unsuitable

好久不见. submitted on 2019-12-03 15:58:53
I used hyperopt to search for the best parameters of an SVM classifier, but Hyperopt says the best 'kernel' is '0'. {'kernel': '0'} is obviously unsuitable. Does anyone know whether this is caused by a mistake of mine or a bug in hyperopt? The code is below.

    from hyperopt import fmin, tpe, hp, rand
    import numpy as np
    from sklearn.metrics import accuracy_score
    from sklearn import svm
    from sklearn.cross_validation import StratifiedKFold

    parameter_space_svc = {
        'C': hp.loguniform("C", np.log(1), np.log(100)),
        'kernel': hp.choice('kernel', ['rbf', 'poly']),
        'gamma': hp.loguniform("gamma", np.log(0.001), np.log(0.1)),
    }

    from…
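
The usual explanation, added for context: for hp.choice parameters, fmin returns the index into the options list rather than the chosen option, and hyperopt.space_eval maps the result back. A minimal sketch with a dummy objective:

    from hyperopt import fmin, hp, space_eval, tpe

    space = {'kernel': hp.choice('kernel', ['rbf', 'poly'])}

    # Dummy objective, just to demonstrate the return format of fmin
    best = fmin(lambda p: 0.0 if p['kernel'] == 'rbf' else 1.0,
                space, algo=tpe.suggest, max_evals=10)

    print(best)                     # e.g. {'kernel': 0} -- an index, not a name
    print(space_eval(space, best))  # {'kernel': 'rbf'} -- the actual parameter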

How to implement SMOTE in cross-validation and GridSearchCV

不羁的心 submitted on 2019-12-03 12:45:35
Question: I'm relatively new to Python. Can you help me improve my implementation of SMOTE into a proper pipeline? What I want is to apply the over- and under-sampling to the training set of every k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced left-out piece. The problem is that when I do that, I cannot use the familiar sklearn interface for evaluation and grid search. Is it possible to make something similar to model_selection.RandomizedSearchCV? My…
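
One common answer, stated under the assumption that the imbalanced-learn package is acceptable: imblearn's Pipeline applies samplers such as SMOTE during fit only, i.e. only to the training folds, so it plugs straight into RandomizedSearchCV:

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # imblearn's Pipeline, not sklearn's
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

    # SMOTE runs only on the training folds; validation folds stay imbalanced
    pipe = Pipeline([('smote', SMOTE(random_state=0)),
                     ('clf', RandomForestClassifier(random_state=0))])

    search = RandomizedSearchCV(pipe, {'clf__n_estimators': [50, 100, 200]},
                                n_iter=3, cv=5, scoring='roc_auc', random_state=0)
    search.fit(X, y)
    print(search.best_score_)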

Caret Package: Stratified Cross Validation in Train Function

回眸只為那壹抹淺笑 submitted on 2019-12-03 12:34:39
Question: Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion about this topic but no real definitive answer. Thanks in advance.

Answer 1: There is a parameter called index which lets the user specify the indices used for cross-validation:

    folds <- 4
    cvIndex <- createFolds(factor(training$Y), folds, returnTrain…
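
For readers outside R, an aside not taken from the thread: the scikit-learn counterpart of handing caret explicit stratified fold indices is passing a StratifiedKFold splitter as cv:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=300, weights=[0.95, 0.05], random_state=0)

    # Each fold preserves the (imbalanced) class proportions of y
    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=skf, scoring='roc_auc')
    print(scores)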

Using statsmodel estimations with scikit-learn cross validation, is it possible?

亡梦爱人 submitted on 2019-12-03 12:22:01
I posted this question to the Cross Validated forum and later realized that it might find a more appropriate audience on Stack Overflow instead. I am looking for a way to use the fit object (result) obtained from Python statsmodels to feed into cross_val_score of scikit-learn's cross_validation module. The attached link suggests that it may be possible, but I have not succeeded. I am getting the following error:

    estimator should be an estimator implementing 'fit' method, statsmodels.discrete.discrete_model.BinaryResultsWrapper object at 0x7fa6e801c590 was passed

Refer to this link. Indeed, you cannot use…
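
The standard workaround, sketched as an illustration rather than the thread's exact answer: scikit-learn requires an estimator object exposing fit/predict, not a statsmodels results wrapper, so adapt the statsmodels model in a small BaseEstimator subclass (the SMLogitWrapper name below is hypothetical):

    import numpy as np
    import statsmodels.api as sm
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.model_selection import cross_val_score

    class SMLogitWrapper(BaseEstimator, ClassifierMixin):
        """Hypothetical adapter exposing statsmodels Logit through the sklearn API."""
        def fit(self, X, y):
            self.result_ = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
            return self

        def predict(self, X):
            return (self.result_.predict(sm.add_constant(X)) > 0.5).astype(int)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200) > 0).astype(int)

    print(cross_val_score(SMLogitWrapper(), X, y, cv=5))  # sklearn now accepts it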