logistic-regression

Calculate residual deviance from scikit-learn logistic regression model

Question: Is there any way to calculate the residual deviance of a scikit-learn logistic regression model? This is a standard output of R model summaries, but I couldn't find it in any of sklearn's documentation. Answer 1: Actually, you can. Deviance is closely related to cross-entropy, which is available as sklearn.metrics.log_loss. Deviance is just 2*(loglikelihood_of_saturated_model - loglikelihood_of_fitted_model). Scikit-learn can (without larger tweaks) only handle classification of individual instances, so that…
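A minimal sketch of the computation the answer describes, assuming 0/1 responses (for which the saturated model's log-likelihood is zero, so the residual deviance reduces to twice the summed log loss). The toy data is hypothetical:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    # Hypothetical toy data
    X = np.arange(10).reshape(-1, 1)
    y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

    # Very large C ~ effectively no regularization, closer to R's glm()
    model = LogisticRegression(C=1e9, max_iter=1000).fit(X, y)
    p = model.predict_proba(X)[:, 1]

    # For 0/1 responses the saturated log-likelihood is 0, hence
    # residual deviance = -2 * loglik_fitted = 2 * summed per-row log loss
    residual_deviance = 2 * log_loss(y, p, normalize=False)
    print(residual_deviance)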

Different coefficients: scikit-learn vs statsmodels (logistic regression)

Question: When running a logistic regression, the coefficients I get using statsmodels are correct (I verified them against some course material). However, I am unable to get the same coefficients with sklearn. I've tried preprocessing the data, to no avail. This is my code: Statsmodels: import statsmodels.api as sm X_const = sm.add_constant(X) model = sm.Logit(y, X_const) results = model.fit() print(results.summary()) The relevant output is: coef std err z P>|z| [0.025 0.975] …
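The usual culprit in this comparison is that sklearn's LogisticRegression applies L2 regularization by default (C=1.0), while statsmodels' Logit fits an unpenalized maximum-likelihood model. A hedged sketch of the fix, reusing the question's X and y:

    from sklearn.linear_model import LogisticRegression

    # penalty=None requires scikit-learn >= 1.2; on older versions use
    # penalty='none', or simply pass a very large C (e.g. C=1e9)
    clf = LogisticRegression(penalty=None, solver="lbfgs", max_iter=1000)
    clf.fit(X, y)
    print(clf.intercept_, clf.coef_)  # should now agree with statsmodels' Logit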

Why are the logistic regression results different between statsmodels and R?

Question: I am trying to compare the logistic regression implementations in Python's statsmodels and R. Python version: import statsmodels.api as sm import pandas as pd import pylab as pl import numpy as np df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv") df.columns = list(df.columns)[:3] + ["prestige"] # df.hist() # pl.show() dummy_ranks = pd.get_dummies(df["prestige"], prefix="prestige") cols_to_keep = ["admit", "gre", "gpa"] data = df[cols_to_keep].join(dummy_ranks.ix[:, "prestige_2"…
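One way to take the manual dummy-coding (and the long-deprecated .ix indexer) out of the comparison is to let statsmodels build the design matrix the same way R's glm() does, via the formula API. A sketch assuming a local copy of the UCLA admissions data (admit, gre, gpa, rank), since the URL above no longer resolves:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("binary.csv")  # assumed local copy of the UCLA dataset
    df = df.rename(columns={"rank": "prestige"})

    # C(prestige) dummy-codes the factor just like factor(rank) in R's glm()
    fit = smf.logit("admit ~ gre + gpa + C(prestige)", data=df).fit()
    print(fit.summary())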

Calculating standard error of estimate, Wald-Chi Square statistic, p-value with logistic regression in Spark

Question: I was trying to build a logistic regression model on some sample data. The output I can get from the model is the weights of the features used to build it. I could not find a Spark API for the standard error of the estimates, the Wald chi-square statistic, p-values, etc. I am pasting my code below as an example: import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS import org.apache.spark.mllib.evaluation.{BinaryClassificationMetrics, MulticlassMetrics} import org.apache.spark.mllib.linalg…
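The RDD-based MLlib API in the question does not expose inferential statistics, but Spark ML's GeneralizedLinearRegression with a binomial family does. A hedged PySpark sketch, assuming a DataFrame train_df with the usual 'features' and 'label' columns; the summary attributes shown exist in Spark 2.0 and later:

    from pyspark.ml.regression import GeneralizedLinearRegression

    # family="binomial" with a logit link is logistic regression
    glr = GeneralizedLinearRegression(family="binomial", link="logit")
    model = glr.fit(train_df)  # train_df: assumed DataFrame with 'features'/'label'

    summary = model.summary
    print(summary.coefficientStandardErrors)  # standard errors of the estimates
    print(summary.tValues)                    # Wald statistics
    print(summary.pValues)                    # corresponding p-values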

Logistic regression: how to try every combination of predictors in R?

Question: This is a duplicate of https://stats.stackexchange.com/questions/293988/logistic-regression-how-to-try-every-combination-of-predictors. I want to perform a logistic regression: I have 1 dependent variable and ~10 predictors. I want to perform an exhaustive search trying every combination, such as changing the order and adding/deleting predictors. For example:
y ~ x1 + x2 + x3 + x4 + x5
y ~ x2 + x1 + x3 + x4 + x5
y ~ x1 + x2 + x3
y ~ x5 + x1 + x2 + x3 + x4
y ~ x4 + x2
…and so on.
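Note that reordering terms in a formula does not change the fitted model, so an exhaustive search only needs every non-empty subset of predictors: 2^10 - 1 = 1023 models for 10 predictors. A minimal Python sketch (the question uses R; this is an analogue with itertools and statsmodels, assuming a DataFrame df with columns y and x1..x10):

    from itertools import combinations
    import statsmodels.formula.api as smf

    predictors = [f"x{i}" for i in range(1, 11)]  # hypothetical column names
    aic = {}
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            formula = "y ~ " + " + ".join(subset)
            fit = smf.logit(formula, data=df).fit(disp=0)  # df assumed to hold the data
            aic[formula] = fit.aic

    best_formula = min(aic, key=aic.get)  # e.g. rank candidate models by AIC
    print(best_formula, aic[best_formula])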

Is ridge binomial regression available in Python?

Question: I am new to Python and I would like to fit a ridge binomial regression. I know that binomial regression is available in statsmodels: http://statsmodels.sourceforge.net/devel/glm.html I also know that logistic regression with an L2 penalty can be fitted with sklearn.linear_model: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html Since a binomial is a sum of Bernoullis, I could use scikit-learn after transforming my binomially structured data into Bernoulli structure, by changing…
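Rather than literally exploding each binomial row into individual Bernoulli rows, the same transformation can be expressed with sample weights: one y=1 copy of each row weighted by the successes and one y=0 copy weighted by the failures. A sketch on hypothetical grouped data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical grouped binomial data: one covariate, successes out of trials
    X_g = np.array([[0.0], [1.0], [2.0]])
    successes = np.array([2, 5, 9])
    trials = np.array([10, 10, 10])

    # Weighted Bernoulli representation: a y=1 and a y=0 copy of every group
    X = np.vstack([X_g, X_g])
    y = np.concatenate([np.ones(len(X_g)), np.zeros(len(X_g))])
    w = np.concatenate([successes, trials - successes])

    ridge_logit = LogisticRegression(penalty="l2", C=1.0)  # C = 1/penalty strength
    ridge_logit.fit(X, y, sample_weight=w)
    print(ridge_logit.coef_, ridge_logit.intercept_)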

Scikit F-score metric error

Question: I am trying to predict a set of labels using logistic regression from scikit-learn. My data is really imbalanced (there are many more '0' than '1' labels), so I have to use the F1 score metric during the cross-validation step to "balance" the result. [Input] X_training, y_training, X_test, y_test = generate_datasets(df_X, df_y, 0.6) logistic = LogisticRegressionCV( Cs=50, cv=4, penalty='l2', fit_intercept=True, scoring='f1' ) logistic.fit(X_training, y_training) print('Predicted: %s' % str(logistic…
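A hedged variant of the question's setup that also reweights the minority class. Note that for LogisticRegressionCV, .score() uses the scoring option passed at construction (here F1), not plain accuracy; X_training and friends are assumed to come from the question's generate_datasets helper:

    from sklearn.linear_model import LogisticRegressionCV

    logistic = LogisticRegressionCV(
        Cs=50, cv=4, penalty="l2", fit_intercept=True,
        scoring="f1",               # select C by F1 across the CV folds
        class_weight="balanced",    # upweight the rare '1' class
        max_iter=1000,
    )
    logistic.fit(X_training, y_training)
    print(logistic.score(X_test, y_test))  # F1 on the held-out set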

Model runs with glm but not bigglm

Question: I was trying to run a logistic regression on 320,000 rows of data (6 variables). Stepwise model selection on a sample of the data (10,000 rows) gives a rather complex model with 5 interaction terms: Y ~ X1 + X2*X3 + X2*X4 + X2*X5 + X3*X6 + X4*X5. The glm() function could fit this model with 10,000 rows of data, but not with the whole dataset (320,000). Using bigglm to read the data chunk by chunk from a SQL server resulted in an error, and I couldn't make sense of the results from traceback(): fit <- bigglm…
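The question itself concerns R's bigglm, but the underlying technique (fitting a logistic regression on chunks that stream through memory) has a Python analogue via SGDClassifier.partial_fit, sketched here assuming a CSV export of the table. The file name and column names are hypothetical, and the interaction terms would have to be constructed per chunk, e.g. with PolynomialFeatures:

    import pandas as pd
    from sklearn.linear_model import SGDClassifier

    # loss="log_loss" makes SGDClassifier a logistic regression
    # (scikit-learn >= 1.1; older versions spell it loss="log")
    clf = SGDClassifier(loss="log_loss")

    for chunk in pd.read_csv("data.csv", chunksize=10_000):  # hypothetical export
        X = chunk[["X1", "X2", "X3", "X4", "X5", "X6"]].to_numpy()
        y = chunk["Y"].to_numpy()
        clf.partial_fit(X, y, classes=[0, 1])  # incremental fit, chunk by chunk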

Creating a sklearn.linear_model.LogisticRegression instance from existing coefficients

Question: Can one create such an instance based on existing coefficients which were calculated, say, in a different implementation (e.g. Java)? I tried creating an instance and then setting coef_ and intercept_ directly, and it seems to work, but I'm not sure if there's a downside here or whether I might be breaking something. Answer 1: Yes, it works okay: import numpy as np from scipy.stats import norm from sklearn.linear_model import LogisticRegression import json x = np.arange(10)[:, np.newaxis] y = np.array([0,0,0…
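A self-contained sketch of the pattern the answer endorses; the coefficient values below are hypothetical stand-ins for numbers computed elsewhere. Besides coef_ and intercept_, classes_ must also be set before predict or predict_proba will work:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical coefficients imported from another implementation (e.g. Java)
    coef = np.array([[0.8]])
    intercept = np.array([-3.5])

    clf = LogisticRegression()
    clf.coef_ = coef
    clf.intercept_ = intercept
    clf.classes_ = np.array([0, 1])  # needed: predict() maps outputs through classes_

    print(clf.predict_proba(np.array([[4.0], [5.0]])))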

Can MICE pool complete GLM output for binary logistic regression?

Question: I am running a logistic regression with a binary outcome variable on data that has been multiply imputed using MICE. It seems straightforward to pool the coefficients of the glm model: imp=mice(nhanes2, print=F) imp$meth fit0=with(data=imp, glm(hyp~age, family = binomial)) fit1=with(data=imp, glm(hyp~age+chl, family = binomial)) summary(pool(fit1)) However, I can't figure out a way to pool the other output generated by glm. For instance, the glm function produces AIC, Null deviance and…
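For reference, statsmodels ships a MICE implementation whose fit() pools coefficient estimates and standard errors across imputations with Rubin's rules, much like R's pool(); per-imputation quantities such as AIC or deviance are not pooled this way and are usually inspected on each fitted model separately. A hedged sketch, assuming an nhanes-style DataFrame df with missing values (the exact fit() keyword names may differ across statsmodels versions):

    import statsmodels.api as sm
    from statsmodels.imputation import mice

    imp = mice.MICEData(df)  # df: assumed DataFrame with missing values
    mi = mice.MICE("hyp ~ age + chl", sm.GLM, imp,
                   init_kwds={"family": sm.families.Binomial()})
    results = mi.fit(n_burnin=10, n_imputations=10)
    print(results.summary())  # Rubin-pooled coefficients and standard errors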