logistic-regression

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

元气小坏坏 submitted on 2019-12-06 05:21:23
I'm working through my MATLAB code from Andrew Ng's Coursera course and converting it into Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some googling, I found a couple of options. They both return the same results, but those results do not match Andrew Ng's expected output. Others seem to have gotten this to work correctly, so I'm wondering why my specific code returns the desired result for the cost and gradient, yet not when using the scipy.optimize functions.
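For illustration, a minimal sketch (not the asker's code) of how the pieces typically fit together with scipy.optimize.minimize, which plays the role of fminunc. A frequent cause of the mismatch described above is a shape problem: minimize passes theta as a flat (n,) array, so a cost or gradient written for (n,1) column vectors can silently misbehave even though each function checks out in isolation. The toy data here is made up.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # theta arrives as a flat (n,) array -- keep everything 1-D.
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)  # also shape (n,), matching theta

# Made-up, non-separable toy data; the first column is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0],
              [1.0, 2.5], [1.0, 3.0], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
theta0 = np.zeros(X.shape[1])

res = minimize(cost, theta0, args=(X, y), jac=grad, method="BFGS")
print(res.x, res.fun)
```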

Using cross validation and AUC-ROC for a logistic regression model in sklearn

亡梦爱人 submitted on 2019-12-06 05:17:48
I'm using the sklearn package to build a logistic regression model and then evaluate it. Specifically, I want to do so using cross-validation, but can't figure out the right way to do it with the cross_val_score function. According to the documentation and some examples I saw, I need to pass the function the model, the features, the outcomes, and a scoring method. However, the AUC doesn't need predictions, it needs probabilities, so that it can try different threshold values and calculate the ROC
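A minimal sketch of the usual resolution, on made-up data: passing scoring="roc_auc" makes cross_val_score obtain probabilities (via predict_proba or decision_function) internally, so no manual thresholding is needed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
model = LogisticRegression(max_iter=1000)

# The "roc_auc" scorer feeds class probabilities, not hard 0/1
# predictions, into the AUC computation for each fold.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean(), scores.std())
```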

Why can't statsmodels reproduce my R logistic regression results?

笑着哭i submitted on 2019-12-06 03:52:19
I'm confused about why my logistic regression models in R and statsmodels do not agree. I prepare some data in R with

    # From https://courses.edx.org/c4x/MITx/15.071x/asset/census.csv
    library(caTools)  # for sample.split
    census = read.csv("census.csv")
    set.seed(2000)
    split = sample.split(census$over50k, SplitRatio = 0.6)
    censusTrain = subset(census, split==TRUE)
    censusTest = subset(census, split==FALSE)

and then run a logistic regression with

    CensusLog1 = glm(over50k ~ ., data=censusTrain,
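For comparison, a minimal statsmodels sketch of the same kind of fit via the formula API; the data frame and column names below are made-up stand-ins, since the census schema isn't shown in the excerpt. When R and statsmodels disagree on identical data, the usual suspects are differing categorical (dummy) encodings, a missing intercept when using statsmodels' array interface (the formula interface adds one automatically, like R), or perfect separation in the training data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the census data.
df = pd.DataFrame({
    "over50k": [0, 1, 0, 1, 1, 0, 1, 0],
    "age":     [22, 30, 35, 40, 25, 32, 45, 50],
    "sex":     ["M", "M", "M", "M", "F", "F", "F", "F"],
})

# C(...) applies treatment (dummy) coding, which is also R's default
# for factors, so coefficients should line up when the levels match.
fit = smf.logit("over50k ~ age + C(sex)", data=df).fit()
print(fit.params)
```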

glmnet error for logistic regression/binomial

时光毁灭记忆、已成空白 submitted on 2019-12-06 03:33:15
I get this error when trying to fit glmnet() with family="binomial" for a logistic regression fit:

    > data <- read.csv("DAFMM_HE16_matrix.csv", header=F)
    > x <- as.data.frame(data[,1:3])
    > x <- model.matrix(~., data=x)
    > y <- data[,4]
    > train = sample(1:dim(x)[1], 287, replace=FALSE)
    > xTrain = x[train,]
    > xTest = x[-train,]
    > yTrain = y[train]
    > yTest = y[-train]
    > fit = glmnet(xTrain, yTrain, family="binomial")
    Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
      one multinomial or binomial class has 1 or 0 observations; not allowed

Any help would be greatly appreciated - I've searched
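The error text itself points at the cause: after the random subsetting, one of the two classes ended up with at most one observation in yTrain (or y isn't really two-class to begin with). A stratified split keeps both classes represented; a minimal sketch of that idea, shown in Python with made-up data since the concept carries over directly:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up, heavily imbalanced labels: a plain random 287-row sample
# could easily miss the minority class entirely.
X = np.random.rand(300, 3)
y = np.array([0] * 295 + [1] * 5)

# stratify=y preserves the class ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=287, stratify=y, random_state=0)
print(np.bincount(y_train))  # both classes present in the training set
```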

Why is the AUC so different for logistic regression in sklearn and R

这一生的挚爱 submitted on 2019-12-06 02:58:37
I use the same dataset to train a logistic regression model in both R and Python's sklearn. The dataset is unbalanced, and I find that the AUCs are quite different. This is the Python code:

    model_logistic = linear_model.LogisticRegression()  # auc 0.623
    model_logistic.fit(train_x, train_y)
    pred_logistic = model_logistic.predict(test_x)  # mean: 0.0235, var: 0.023
    print "logistic auc: ", sklearn.metrics.roc_auc_score(test_y, pred_logistic)

This is the R code:

    glm_fit <- glm(label ~ watch_cnt_7 + bid_cnt_7 + vi_cnt_itm_1 + ITEM_PRICE +
                   add_to_cart_cnt_7 + offer_cnt_7 + dwell_dlta_4to2 +
                   vi_cnt_itm_2 + asq
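One likely explanation visible in the excerpt: the Python side scores hard 0/1 labels from predict(), which collapses the ROC curve to a single operating point, whereas the R workflow typically scores fitted probabilities. A minimal sketch of the difference, on made-up imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder imbalanced data standing in for the question's dataset.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(train_x, train_y)

# Hard labels give a degenerate single-threshold AUC...
auc_from_labels = roc_auc_score(test_y, model.predict(test_x))
# ...while probabilities let the AUC sweep over all thresholds.
auc_from_probs = roc_auc_score(test_y, model.predict_proba(test_x)[:, 1])
print(auc_from_labels, auc_from_probs)
```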

Using categorical data as features in sklearn LogisticRegression

我的未来我决定 submitted on 2019-12-05 17:49:21
I'm trying to understand how to use categorical data as features in sklearn.linear_model's LogisticRegression. I understand, of course, that I need to encode it. What I don't understand is how to pass the encoded feature to the logistic regression so that it is processed as a categorical feature, rather than having the int value it got during encoding interpreted as a standard quantifiable feature. (Less important) Can somebody explain the difference between using preprocessing.LabelEncoder(), DictVectorizer
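The standard answer is one-hot encoding: each category becomes its own 0/1 column, so the model never sees an artificial ordering the way a single integer code implies. A minimal sketch with a made-up frame:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up data: one categorical column, one numeric, a binary target.
df = pd.DataFrame({
    "city":   ["NY", "SF", "NY", "LA", "SF", "LA"],
    "income": [50, 80, 55, 60, 90, 65],
    "label":  [0, 1, 0, 0, 1, 1],
})

# One-hot encode "city"; pass "income" through unchanged.
pre = ColumnTransformer([("cat", OneHotEncoder(), ["city"])],
                        remainder="passthrough")
clf = Pipeline([("pre", pre), ("lr", LogisticRegression())])
clf.fit(df[["city", "income"]], df["label"])
print(clf.predict(df[["city", "income"]]))
```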

No zeros predicted from zeroinfl object in R?

北城以北 submitted on 2019-12-05 17:43:06
I created a zero-inflated negative binomial model and want to investigate how many of the zeros were partitioned out to sampling zeros versus structural zeros. How do I implement this in R? The example code on the zeroinfl page is not clear to me.

    data("bioChemists", package = "pscl")
    fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
    table(round(predict(fm_zinb2, type = "zero")))
    >   0   1
    > 891  24
    table(round(bioChemists$art))
    >   0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
    > 275 246 178  84  67  27  17  12   1   2   1   1   2   1   1

What is this telling me? When I do the same for my data I get a read-out that just
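A conceptual sketch of the interpretation (in Python, with made-up numbers): predict(..., type = "zero") returns each observation's estimated probability of coming from the structural-zero component, so rounding it only counts observations where that probability exceeds 0.5, while summing the probabilities instead estimates the expected number of structural zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up per-observation structural-zero probabilities, standing in
# for the vector returned by predict(fm_zinb2, type = "zero").
p_structural = rng.beta(1, 8, size=915)

print(p_structural.sum())          # expected count of structural zeros
print((p_structural > 0.5).sum())  # what table(round(...)) effectively counts
```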

Reproducing drc::plot.drc with ggplot2

我与影子孤独终老i submitted on 2019-12-05 15:52:40
I want to reproduce the following drc::plot.drc graphs with ggplot2.

    df1 <- structure(list(TempV = structure(c(
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
      7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
      9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
      13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
      11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
      6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
      4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
      8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
      10L, 10L, 10L, 10L, 10L, 10L
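The usual route for this kind of reproduction (whatever the plotting library) is: fit the dose-response model, predict it over a fine grid, then draw the observed points plus the predicted curve. A minimal Python sketch of that pattern, with a hand-rolled four-parameter log-logistic fit and made-up data; it is not the drc/ggplot2 answer itself, just the same grid-predict-plot idea:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def ll4(x, b, c, d, e):
    # Four-parameter log-logistic curve: lower asymptote c, upper d,
    # slope b, midpoint e (the LL.4 shape drc uses).
    return c + (d - c) / (1.0 + (x / e) ** b)

# Made-up dose/response observations.
dose = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
resp = np.array([0.95, 0.90, 0.70, 0.40, 0.15, 0.05])

params, _ = curve_fit(ll4, dose, resp, p0=[1.0, 0.0, 1.0, 1.0])
grid = np.geomspace(dose.min(), dose.max(), 200)

plt.scatter(dose, resp)
plt.plot(grid, ll4(grid, *params))
plt.xscale("log")
plt.xlabel("dose")
plt.ylabel("response")
plt.show()
```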

Vowpal Wabbit Logistic Regression

纵然是瞬间 submitted on 2019-12-05 05:53:04
I am performing logistic regression using Vowpal Wabbit on a dataset with 25 features and 48 million instances. I have a question about the current predict values: should they be within 0 and 1?

    average    since       example  example   current  current  current
    loss       last        counter  weight    label    predict  features
    0.693147   0.693147    1        1.0       -1.0000   0.0000  24
    0.419189   0.145231    2        2.0       -1.0000  -1.8559  24
    0.235457   0.051725    4        4.0       -1.0000  -2.7588  23
    6.371911   12.508365   8        8.0       -1.0000  -3.7784  24
    3.485084   0.598258    16       16.0      -1.0000  -2.2767  24
    1.765249   0.045413    32       32.0      -1.0000  -2.8924  24
    1.017911   0.270573    64       64.0      -1.0000  -3.0438  25
    0
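As the log shows, with logistic loss VW's "current predict" column is the raw linear score, which is unbounded; running vw with --link logistic maps it through the sigmoid so predictions land in (0, 1). Doing the same mapping by hand is a one-liner, sketched here on scores taken from the log above:

```python
import math

def to_probability(raw_score: float) -> float:
    # Sigmoid: the same mapping vw applies when run with --link logistic.
    return 1.0 / (1.0 + math.exp(-raw_score))

for raw in (0.0000, -1.8559, -2.7588, -3.7784):
    print(f"{raw:+.4f} -> {to_probability(raw):.4f}")
```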

Multi-Class Logistic Regression in SciKit Learn

不想你离开。 submitted on 2019-12-05 05:41:24
I am having trouble with the proper call of scikit-learn's LogisticRegression for the multi-class case. I am using the lbfgs solver, and I do have the multi_class parameter set to multinomial. It is unclear to me how to pass the true class labels when fitting the model. I had assumed it was similar to the multi-class random forest classifier, where you pass an [n_samples, m_classes] dataframe. However, in doing this, I get an error that the data is of a bad shape: ValueError: bad input
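The resolution is that fit() expects y as a 1-D array with one class label per sample, not a one-hot [n_samples, n_classes] matrix; the estimator derives the set of classes itself. A minimal sketch on the built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
print(y.shape)  # (150,) -- one integer label (0/1/2) per sample

# With the lbfgs solver, recent scikit-learn releases fit a multinomial
# model by default (the explicit multi_class parameter from the
# question's era is deprecated in newer versions).
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(X, y)
print(clf.predict_proba(X[:3]).shape)  # (3, 3): one column per class
```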