logistic-regression

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

元气小坏坏 submitted on 2019-12-06 05:21:23
I'm working through my MATLAB code from Andrew Ng's Coursera course and converting it into Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some googling, I found a couple of options. They both return the same results, but those results do not match Andrew Ng's expected output. Others seem to have gotten this to work correctly, so I'm wondering why my specific code returns the desired result for the cost and gradient, yet not when using the scipy.optimize functions.
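For illustration, a minimal sketch (not the asker's code) of how the pieces typically fit together with scipy.optimize.minimize, which plays the role of fminunc. A frequent cause of the mismatch described above is a shape problem: minimize passes theta as a flat (n,) array, so a cost or gradient written for (n,1) column vectors can silently misbehave even though each function checks out in isolation. The toy data here is made up.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # theta arrives as a flat (n,) array -- keep everything 1-D.
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)  # also shape (n,), matching theta

# Made-up, non-separable toy data; the first column is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0],
              [1.0, 2.5], [1.0, 3.0], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
theta0 = np.zeros(X.shape[1])

res = minimize(cost, theta0, args=(X, y), jac=grad, method="BFGS")
print(res.x, res.fun)
```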

Using cross validation and AUC-ROC for a logistic regression model in sklearn

亡梦爱人 submitted on 2019-12-06 05:17:48
I'm using the sklearn package to build a logistic regression model and then evaluate it. Specifically, I want to do so using cross-validation, but can't figure out the right way to do it with the cross_val_score function. According to the documentation and some examples I saw, I need to pass the function the model, the features, the outcomes, and a scoring method. However, the AUC doesn't need predictions, it needs probabilities, so that it can try different threshold values and calculate the ROC
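A minimal sketch of the usual resolution, on made-up data: passing scoring="roc_auc" makes cross_val_score obtain probabilities (via predict_proba or decision_function) internally, so no manual thresholding is needed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
model = LogisticRegression(max_iter=1000)

# The "roc_auc" scorer feeds class probabilities, not hard 0/1
# predictions, into the AUC computation for each fold.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean(), scores.std())
```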

Why can't statsmodels reproduce my R logistic regression results?

笑着哭i submitted on 2019-12-06 03:52:19
I'm confused about why my logistic regression models in R and statsmodels do not agree. I prepare some data in R with

    # From https://courses.edx.org/c4x/MITx/15.071x/asset/census.csv
    library(caTools)  # for sample.split
    census = read.csv("census.csv")
    set.seed(2000)
    split = sample.split(census$over50k, SplitRatio = 0.6)
    censusTrain = subset(census, split==TRUE)
    censusTest = subset(census, split==FALSE)

and then run a logistic regression with

    CensusLog1 = glm(over50k ~ ., data=censusTrain,
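For comparison, a minimal statsmodels sketch of the same kind of fit via the formula API; the data frame and column names below are made-up stand-ins, since the census schema isn't shown in the excerpt. When R and statsmodels disagree on identical data, the usual suspects are differing categorical (dummy) encodings, a missing intercept when using statsmodels' array interface (the formula interface adds one automatically, like R), or perfect separation in the training data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the census data.
df = pd.DataFrame({
    "over50k": [0, 1, 0, 1, 1, 0, 1, 0],
    "age":     [22, 30, 35, 40, 25, 32, 45, 50],
    "sex":     ["M", "M", "M", "M", "F", "F", "F", "F"],
})

# C(...) applies treatment (dummy) coding, which is also R's default
# for factors, so coefficients should line up when the levels match.
fit = smf.logit("over50k ~ age + C(sex)", data=df).fit()
print(fit.params)
```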

glmnet error for logistic regression/binomial

时光毁灭记忆、已成空白 submitted on 2019-12-06 03:33:15
I get this error when trying to fit glmnet() with family="binomial" for a logistic regression fit:

    > data <- read.csv("DAFMM_HE16_matrix.csv", header=F)
    > x <- as.data.frame(data[,1:3])
    > x <- model.matrix(~., data=x)
    > y <- data[,4]
    > train = sample(1:dim(x)[1], 287, replace=FALSE)
    > xTrain = x[train,]
    > xTest = x[-train,]
    > yTrain = y[train]
    > yTest = y[-train]
    > fit = glmnet(xTrain, yTrain, family="binomial")
    Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
      one multinomial or binomial class has 1 or 0 observations; not allowed

Any help would be greatly appreciated - I've searched
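The error text itself points at the cause: after the random subsetting, one of the two classes ended up with at most one observation in yTrain (or y isn't really two-class to begin with). A stratified split keeps both classes represented; a minimal sketch of that idea, shown in Python with made-up data since the concept carries over directly:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up, heavily imbalanced labels: a plain random 287-row sample
# could easily miss the minority class entirely.
X = np.random.rand(300, 3)
y = np.array([0] * 295 + [1] * 5)

# stratify=y preserves the class ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=287, stratify=y, random_state=0)
print(np.bincount(y_train))  # both classes present in the training set
```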

Why is the AUC so different for logistic regression in sklearn and R

这一生的挚爱 submitted on 2019-12-06 02:58:37
I use the same dataset to train a logistic regression model in both R and Python's sklearn. The dataset is unbalanced, and I find that the AUCs are quite different. This is the Python code:

    model_logistic = linear_model.LogisticRegression()  # auc 0.623
    model_logistic.fit(train_x, train_y)
    pred_logistic = model_logistic.predict(test_x)  # mean: 0.0235, var: 0.023
    print "logistic auc: ", sklearn.metrics.roc_auc_score(test_y, pred_logistic)

This is the R code:

    glm_fit <- glm(label ~ watch_cnt_7 + bid_cnt_7 + vi_cnt_itm_1 + ITEM_PRICE +
                   add_to_cart_cnt_7 + offer_cnt_7 + dwell_dlta_4to2 +
                   vi_cnt_itm_2 + asq
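One likely explanation visible in the excerpt: the Python side scores hard 0/1 labels from predict(), which collapses the ROC curve to a single operating point, whereas the R workflow typically scores fitted probabilities. A minimal sketch of the difference, on made-up imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder imbalanced data standing in for the question's dataset.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(train_x, train_y)

# Hard labels give a degenerate single-threshold AUC...
auc_from_labels = roc_auc_score(test_y, model.predict(test_x))
# ...while probabilities let the AUC sweep over all thresholds.
auc_from_probs = roc_auc_score(test_y, model.predict_proba(test_x)[:, 1])
print(auc_from_labels, auc_from_probs)
```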

Using categorical data as features in sklearn LogisticRegression

我的未来我决定 submitted on 2019-12-05 17:49:21
I'm trying to understand how to use categorical data as features in sklearn.linear_model's LogisticRegression. I understand, of course, that I need to encode it. What I don't understand is how to pass the encoded feature to the logistic regression so that it is processed as a categorical feature, rather than having the int value it got during encoding interpreted as a standard quantifiable feature. (Less important) Can somebody explain the difference between using preprocessing.LabelEncoder(), DictVectorizer
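The standard answer is one-hot encoding: each category becomes its own 0/1 column, so the model never sees an artificial ordering the way a single integer code implies. A minimal sketch with a made-up frame:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up data: one categorical column, one numeric, a binary target.
df = pd.DataFrame({
    "city":   ["NY", "SF", "NY", "LA", "SF", "LA"],
    "income": [50, 80, 55, 60, 90, 65],
    "label":  [0, 1, 0, 0, 1, 1],
})

# One-hot encode "city"; pass "income" through unchanged.
pre = ColumnTransformer([("cat", OneHotEncoder(), ["city"])],
                        remainder="passthrough")
clf = Pipeline([("pre", pre), ("lr", LogisticRegression())])
clf.fit(df[["city", "income"]], df["label"])
print(clf.predict(df[["city", "income"]]))
```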

No zeros predicted from zeroinfl object in R?

北城以北 submitted on 2019-12-05 17:43:06
I created a zero-inflated negative binomial model and want to investigate how many of the zeros were partitioned out to sampling zeros versus structural zeros. How do I implement this in R? The example code on the zeroinfl page is not clear to me.

    data("bioChemists", package = "pscl")
    fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
    table(round(predict(fm_zinb2, type = "zero")))
    >   0   1
    > 891  24
    table(round(bioChemists$art))
    >   0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
    > 275 246 178  84  67  27  17  12   1   2   1   1   2   1   1

What is this telling me? When I do the same for my data I get a read-out that just
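A conceptual sketch of the interpretation (in Python, with made-up numbers): predict(..., type = "zero") returns each observation's estimated probability of coming from the structural-zero component, so rounding it only counts observations where that probability exceeds 0.5, while summing the probabilities instead estimates the expected number of structural zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up per-observation structural-zero probabilities, standing in
# for the vector returned by predict(fm_zinb2, type = "zero").
p_structural = rng.beta(1, 8, size=915)

print(p_structural.sum())          # expected count of structural zeros
print((p_structural > 0.5).sum())  # what table(round(...)) effectively counts
```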

Reproducing drc::plot.drc with ggplot2

我与影子孤独终老i submitted on 2019-12-05 15:52:40
I want to reproduce the following drc::plot.drc graphs with ggplot2.

    df1 <- structure(list(TempV = structure(c(
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
      7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
      9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
      13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
      11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
      6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
      4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
      8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
      10L, 10L, 10L, 10L, 10L, 10L
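The usual route for this kind of reproduction (whatever the plotting library) is: fit the dose-response model, predict it over a fine grid, then draw the observed points plus the predicted curve. A minimal Python sketch of that pattern, with a hand-rolled four-parameter log-logistic fit and made-up data; it is not the drc/ggplot2 answer itself, just the same grid-predict-plot idea:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def ll4(x, b, c, d, e):
    # Four-parameter log-logistic curve: lower asymptote c, upper d,
    # slope b, midpoint e (the LL.4 shape drc uses).
    return c + (d - c) / (1.0 + (x / e) ** b)

# Made-up dose/response observations.
dose = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
resp = np.array([0.95, 0.90, 0.70, 0.40, 0.15, 0.05])

params, _ = curve_fit(ll4, dose, resp, p0=[1.0, 0.0, 1.0, 1.0])
grid = np.geomspace(dose.min(), dose.max(), 200)

plt.scatter(dose, resp)
plt.plot(grid, ll4(grid, *params))
plt.xscale("log")
plt.xlabel("dose")
plt.ylabel("response")
plt.show()
```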

Vowpal Wabbit Logistic Regression

纵然是瞬间 submitted on 2019-12-05 05:53:04
I am performing logistic regression using Vowpal Wabbit on a dataset with 25 features and 48 million instances. I have a question about the current predict values: should they be within 0 and 1?

    average    since       example  example   current  current  current
    loss       last        counter  weight    label    predict  features
    0.693147   0.693147    1        1.0       -1.0000   0.0000  24
    0.419189   0.145231    2        2.0       -1.0000  -1.8559  24
    0.235457   0.051725    4        4.0       -1.0000  -2.7588  23
    6.371911   12.508365   8        8.0       -1.0000  -3.7784  24
    3.485084   0.598258    16       16.0      -1.0000  -2.2767  24
    1.765249   0.045413    32       32.0      -1.0000  -2.8924  24
    1.017911   0.270573    64       64.0      -1.0000  -3.0438  25
    0
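As the log shows, with logistic loss VW's "current predict" column is the raw linear score, which is unbounded; running vw with --link logistic maps it through the sigmoid so predictions land in (0, 1). Doing the same mapping by hand is a one-liner, sketched here on scores taken from the log above:

```python
import math

def to_probability(raw_score: float) -> float:
    # Sigmoid: the same mapping vw applies when run with --link logistic.
    return 1.0 / (1.0 + math.exp(-raw_score))

for raw in (0.0000, -1.8559, -2.7588, -3.7784):
    print(f"{raw:+.4f} -> {to_probability(raw):.4f}")
```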

Multi-Class Logistic Regression in SciKit Learn

不想你离开。 submitted on 2019-12-05 05:41:24
I am having trouble with the proper call of scikit-learn's LogisticRegression for the multi-class case. I am using the lbfgs solver, and I do have the multi_class parameter set to multinomial. It is unclear to me how to pass the true class labels when fitting the model. I had assumed it was similar to the multi-class random forest classifier, where you pass an [n_samples, m_classes] dataframe. However, in doing this, I get an error that the data is of a bad shape: ValueError: bad input
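The resolution is that fit() expects y as a 1-D array with one class label per sample, not a one-hot [n_samples, n_classes] matrix; the estimator derives the set of classes itself. A minimal sketch on the built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
print(y.shape)  # (150,) -- one integer label (0/1/2) per sample

# With the lbfgs solver, recent scikit-learn releases fit a multinomial
# model by default (the explicit multi_class parameter from the
# question's era is deprecated in newer versions).
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(X, y)
print(clf.predict_proba(X[:3]).shape)  # (3, 3): one column per class
```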