cross-validation

Why, when I use GridSearchCV with roc_auc scoring, is the score different for grid_search.score(X,y) and roc_auc_score(y, y_predict)?

我的梦境 submitted on 2019-12-04 01:30:46
Question: I am using stratified 10-fold cross-validation to find the model that predicts y (a binary outcome) from X (34 features) with the highest AUC. I set up the GridSearchCV:

    log_reg = LogisticRegression()
    parameter_grid = {'penalty': ["l1", "l2"], 'C': np.arange(0.1, 3, 0.1)}
    cross_validation = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
    grid_search = GridSearchCV(log_reg, param_grid=parameter_grid,
                               scoring='roc_auc', cv=cross_validation)

And then run the cross-validation: grid…
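
A likely explanation, added for context rather than taken from the post: with scoring='roc_auc', grid_search.score(X, y) computes the AUC from the classifier's continuous scores (decision_function or predict_proba), whereas roc_auc_score(y, y_predict) on hard 0/1 predictions collapses the ranking to two values and generally yields a different number. A minimal sketch on synthetic data:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=500, n_features=34, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # AUC from continuous scores -- what a 'roc_auc' scorer uses internally
    auc_from_scores = roc_auc_score(y, clf.predict_proba(X)[:, 1])

    # AUC from hard 0/1 predictions -- what roc_auc_score(y, y_predict) computes
    auc_from_labels = roc_auc_score(y, clf.predict(X))

    print(auc_from_scores, auc_from_labels)  # the two values generally differ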

Calculate cross-validation for a Generalized Linear Model in Matlab

流过昼夜 submitted on 2019-12-03 20:50:51
I am doing a regression using a Generalized Linear Model. I was caught off guard by the crossval function. My implementation so far:

    x = 'Some dataset, containing the input and the output'
    X = x(:,1:7);
    Y = x(:,8);
    cvpart = cvpartition(Y,'holdout',0.3);
    Xtrain = X(training(cvpart),:);
    Ytrain = Y(training(cvpart),:);
    Xtest = X(test(cvpart),:);
    Ytest = Y(test(cvpart),:);
    mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson');
    Ypred = predict(mdl,Xtest);
    res = (Ypred - Ytest);
    RMSE_test = sqrt(mean(res.^2));

The code below is for calculating cross-validation for multiple…
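
A rough Python analogue of the same per-fold RMSE computation, added for comparison (the 7-input/1-output column layout mirrors the Matlab snippet; the synthetic data is an assumption):

    import numpy as np
    from sklearn.linear_model import PoissonRegressor
    from sklearn.model_selection import KFold

    # Assumed layout: 7 input columns, target in column 8
    rng = np.random.default_rng(0)
    x = rng.poisson(3.0, size=(200, 8)).astype(float)
    X, Y = x[:, :7], x[:, 7]

    rmses = []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        # cf. GeneralizedLinearModel.fit(..., 'distr', 'poisson')
        mdl = PoissonRegressor().fit(X[tr], Y[tr])
        res = mdl.predict(X[te]) - Y[te]
        rmses.append(np.sqrt(np.mean(res ** 2)))  # per-fold test RMSE

    print(np.mean(rmses))  # average RMSE across folds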

Is cv.glmnet overfitting the data by using the full lambda sequence?

拈花ヽ惹草 submitted on 2019-12-03 20:15:08
cv.glmnet has been used by most research papers and companies. While building a similar function to cv.glmnet for glmnet.cr (a comparable package that implements the lasso for continuation-ratio ordinal regression), I came across this problem in cv.glmnet. cv.glmnet first fits the model:

    glmnet.object = glmnet(x, y, weights = weights, offset = offset, lambda = lambda, ...)

After the glmnet object is created with the complete data, the next step is as follows: the lambda sequence is extracted from the model fitted on the complete data:

    lambda = glmnet.object$lambda

Now they make sure the number of folds is more than…
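
A schematic of the procedure being questioned, written in Python purely for illustration (this is not the glmnet source; lasso_path and Lasso stand in for glmnet): the regularization path is computed once on the complete data, and every fold is then refit and scored along that shared sequence:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, lasso_path
    from sklearn.model_selection import KFold

    X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

    # Step 1: the alpha (lambda) sequence comes from a fit on the *complete* data
    alphas, _, _ = lasso_path(X, y)

    # Step 2: each fold is refit and scored along that same shared sequence
    mse = np.zeros((5, len(alphas)))
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    for k, (tr, te) in enumerate(folds.split(X)):
        for j, a in enumerate(alphas):
            pred = Lasso(alpha=a).fit(X[tr], y[tr]).predict(X[te])
            mse[k, j] = np.mean((pred - y[te]) ** 2)

    best_alpha = alphas[mse.mean(axis=0).argmin()]  # analogue of lambda.min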

Difference between glmnet() and cv.glmnet() in R?

和自甴很熟 submitted on 2019-12-03 16:56:26
Question: I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet() package, specifically its Poisson feature. Here's my code:

    # de <- data imported from sql connection
    x <- model.matrix(~., data = de[,2:7])
    y <- (de[,1])
    reg <- cv.glmnet(x, y, family = "poisson", alpha = 1)
    reg1 <- glmnet(x, y, family = "poisson", alpha = 1)
    Co <- coef(?reg or reg1?, s = ???)   # which model, and which s, should go here?
    summ <- summary(Co)
    c <- data.frame(Name = rownames(Co)[summ$i], Lambda=…
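
Background on the distinction, added here rather than taken from the post: glmnet() fits the whole coefficient path over a lambda sequence, while cv.glmnet() additionally cross-validates along that path and stores lambda.min and lambda.1se, which is what makes coef(reg, s = "lambda.min") meaningful. A loose scikit-learn analogy, assuming a Gaussian rather than Poisson family:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV, lasso_path

    X, y = make_regression(n_samples=150, n_features=10, noise=3.0, random_state=1)

    # glmnet()-like: the full coefficient path, no model selection
    alphas, coefs, _ = lasso_path(X, y)

    # cv.glmnet()-like: cross-validation picks one penalty (cf. lambda.min)
    reg = LassoCV(cv=10).fit(X, y)
    print(reg.alpha_)  # the CV-selected penalty
    print(reg.coef_)   # coefficients at that penalty, cf. coef(reg, s = "lambda.min")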

Scikit - Combining scale and grid search

一个人想着一个人 submitted on 2019-12-03 16:21:55
I am new to scikit-learn and have two slight issues combining data scaling with grid search.

Efficient scaler: considering cross-validation with k folds, I would like the data scaler (for instance preprocessing.StandardScaler()) to be fit only on the K-1 training folds each time, and then applied to the remaining fold. My impression is that the following code will fit the scaler on the entire dataset, and I would therefore like to modify it to behave as described previously:

    classifier = svm.SVC(C=1)
    clf = make_pipeline(preprocessing.StandardScaler(), classifier)
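
The standard remedy, sketched here as an assumption about what the asker needs: keep the scaler and the model in one Pipeline and hand the pipeline itself to GridSearchCV, so the scaler is refit on the K-1 training folds at every split; pipeline hyperparameters are addressed with the step__param naming:

    from sklearn import preprocessing, svm
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=200, random_state=0)

    # The whole pipeline is refit per fold, so StandardScaler never sees a test fold
    clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC())

    # Step names are the lower-cased class names, hence 'svc__C'
    grid = GridSearchCV(clf, {'svc__C': [0.1, 1, 10]}, cv=5).fit(X, y)
    print(grid.best_params_)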

Best parameters found by Hyperopt are unsuitable

好久不见. submitted on 2019-12-03 15:58:53
I used hyperopt to search for the best parameters of an SVM classifier, but Hyperopt says the best 'kernel' is '0'. {'kernel': '0'} is obviously unsuitable. Does anyone know whether this is caused by a mistake of mine or a bug in hyperopt? The code is below.

    from hyperopt import fmin, tpe, hp, rand
    import numpy as np
    from sklearn.metrics import accuracy_score
    from sklearn import svm
    from sklearn.cross_validation import StratifiedKFold

    parameter_space_svc = {
        'C': hp.loguniform("C", np.log(1), np.log(100)),
        'kernel': hp.choice('kernel', ['rbf', 'poly']),
        'gamma': hp.loguniform("gamma", np.log(0.001), np.log(0.1)),
    }

    from…
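
The usual explanation, added for context: for hp.choice parameters, fmin returns the index into the options list rather than the chosen option, and hyperopt.space_eval maps the result back. A minimal sketch with a dummy objective:

    from hyperopt import fmin, hp, space_eval, tpe

    space = {'kernel': hp.choice('kernel', ['rbf', 'poly'])}

    # Dummy objective, just to demonstrate the return format of fmin
    best = fmin(lambda p: 0.0 if p['kernel'] == 'rbf' else 1.0,
                space, algo=tpe.suggest, max_evals=10)

    print(best)                     # e.g. {'kernel': 0} -- an index, not a name
    print(space_eval(space, best))  # {'kernel': 'rbf'} -- the actual parameter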

How to implement SMOTE in cross-validation and GridSearchCV

不羁的心 submitted on 2019-12-03 12:45:35
Question: I'm relatively new to Python. Can you help me improve my implementation of SMOTE into a proper pipeline? What I want is to apply the over- and under-sampling to the training set of every k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced left-out piece. The problem is that when I do that, I cannot use the familiar sklearn interface for evaluation and grid search. Is it possible to make something similar to model_selection.RandomizedSearchCV? My…
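
One common answer, stated under the assumption that the imbalanced-learn package is acceptable: imblearn's Pipeline applies samplers such as SMOTE during fit only, i.e. only to the training folds, so it plugs straight into RandomizedSearchCV:

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # imblearn's Pipeline, not sklearn's
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

    # SMOTE runs only on the training folds; validation folds stay imbalanced
    pipe = Pipeline([('smote', SMOTE(random_state=0)),
                     ('clf', RandomForestClassifier(random_state=0))])

    search = RandomizedSearchCV(pipe, {'clf__n_estimators': [50, 100, 200]},
                                n_iter=3, cv=5, scoring='roc_auc', random_state=0)
    search.fit(X, y)
    print(search.best_score_)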

Caret Package: Stratified Cross Validation in Train Function

回眸只為那壹抹淺笑 submitted on 2019-12-03 12:34:39
Question: Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion about this topic but no real definitive answer. Thanks in advance.

Answer 1: There is a parameter called index which lets the user specify the indices used for cross-validation:

    folds <- 4
    cvIndex <- createFolds(factor(training$Y), folds, returnTrain…
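
For readers outside R, an aside not taken from the thread: the scikit-learn counterpart of handing caret explicit stratified fold indices is passing a StratifiedKFold splitter as cv:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=300, weights=[0.95, 0.05], random_state=0)

    # Each fold preserves the (imbalanced) class proportions of y
    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=skf, scoring='roc_auc')
    print(scores)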

Using statsmodel estimations with scikit-learn cross validation, is it possible?

亡梦爱人 submitted on 2019-12-03 12:22:01
I posted this question to the Cross Validated forum and later realized that it might find a more appropriate audience on Stack Overflow instead. I am looking for a way to use the fit object (result) obtained from Python statsmodels to feed into cross_val_score of scikit-learn's cross_validation module. The attached link suggests that it may be possible, but I have not succeeded. I am getting the following error:

    estimator should be an estimator implementing 'fit' method, statsmodels.discrete.discrete_model.BinaryResultsWrapper object at 0x7fa6e801c590 was passed

Refer to this link. Indeed, you cannot use…
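
The standard workaround, sketched as an illustration rather than the thread's exact answer: scikit-learn requires an estimator object exposing fit/predict, not a statsmodels results wrapper, so adapt the statsmodels model in a small BaseEstimator subclass (the SMLogitWrapper name below is hypothetical):

    import numpy as np
    import statsmodels.api as sm
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.model_selection import cross_val_score

    class SMLogitWrapper(BaseEstimator, ClassifierMixin):
        """Hypothetical adapter exposing statsmodels Logit through the sklearn API."""
        def fit(self, X, y):
            self.result_ = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
            return self

        def predict(self, X):
            return (self.result_.predict(sm.add_constant(X)) > 0.5).astype(int)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200) > 0).astype(int)

    print(cross_val_score(SMLogitWrapper(), X, y, cv=5))  # sklearn now accepts it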