logistic-regression

Interpreting coefficients from Logistic Regression in R

Question: I ran a logistic regression on a set of variables, both categorical and continuous, with a binary event as the dependent variable. Post-modelling, I observe that some categorical variables have a negative coefficient, which I take to mean that when such a categorical variable occurs often, the probability of the dependent event is low. But when I look at the raw percentage of occurrence of that independent variable, I see the reverse trend. Hence the result seems to
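
A minimal sketch of how such coefficients are usually read (all names here -- event, cat_var, cont_var, dat -- are hypothetical stand-ins, not from the question): a categorical predictor's coefficient is conditional on every other predictor in the model, so it can be negative even when the event looks frequent alongside that category in the raw (marginal) percentages.

    # Hypothetical names; exponentiate coefficients to read them as odds ratios.
    fit <- glm(event ~ cat_var + cont_var, family = binomial, data = dat)
    exp(coef(fit))     # odds ratios: values below 1 correspond to negative coefficients
    exp(confint(fit))  # profile-likelihood confidence intervals on the same scale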

R: Calculate and interpret odds ratio in logistic regression

I am having trouble interpreting the results of a logistic regression. My outcome variable is Decision and is binary (0 or 1: not take or take a product, respectively). My predictor variable is Thoughts; it is continuous, can be positive or negative, and is rounded to two decimal places. I want to know how the probability of taking the product changes as Thoughts changes. The logistic regression call is:

    glm(Decision ~ Thoughts, family = binomial, data = data)

According to this model, Thoughts has a significant effect on the probability of Decision (b = 0.72, p = .02). To determine the
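
A short sketch of the standard odds-ratio reading, assuming the fit above is stored in an object (here called fit, a name not in the question):

    # Exponentiating a logit coefficient converts it to an odds ratio.
    fit <- glm(Decision ~ Thoughts, family = binomial, data = data)
    exp(coef(fit)["Thoughts"])  # exp(0.72) ~ 2.05: each 1-unit increase in
                                # Thoughts multiplies the odds of taking the
                                # product by about 2.05
    exp(confint(fit))           # confidence intervals on the odds-ratio scale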

R - Getting Column of Dataframe from String [duplicate]

This question already has an answer here: Dynamically select data frame columns using $ and a vector of column names (8 answers)

I am trying to create a function that converts selected columns of a data frame to a categorical data type (factor) before running a regression analysis. The question is how to slice a particular column from a data frame using a string (character). Example:

    strColumnNames <- "Admit,Rank"
    strDelimiter <- ","
    strSplittedColumnNames <- strsplit(strColumnNames, strDelimiter)
    for (strColName in strSplittedColumnNames[[1]]) {
      dfData$as.name(strColName) <- factor
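
The usual fix (the approach in the linked duplicate): $ cannot take a computed name, but [[ accepts a character string. A sketch, assuming dfData has the columns named in strColumnNames:

    # [[ indexes a data frame column by a string, so the loop body becomes:
    for (strColName in strSplittedColumnNames[[1]]) {
      dfData[[strColName]] <- factor(dfData[[strColName]])
    }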

Does Quasi Separation matter in R binomial GLM?

I am learning how quasi-separation affects a binomial GLM in R, and I am starting to think that it does not matter in some circumstances. In my understanding, data show quasi-separation when some linear combination of factor levels can completely identify failure/non-failure. So I created an artificial dataset with quasi-separation in R:

    fail <- c(100,100,100,100)
    nofail <- c(100,100,0,100)
    x1 <- c(1,0,1,0)
    x2 <- c(0,0,1,1)
    data <- data.frame(fail,nofail,x1,x2)
    rownames(data) <- paste("obs",1:4)

Then, when x1 = 1 and x2 = 1 (obs 3), the outcome is always a failure (nofail = 0). In this data, my
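
A hedged sketch of how the separation shows up in the fit (the model formula below is an assumption; the truncated question does not show it):

    # Fit the aggregated binomial GLM; the coefficient involving the
    # separating combination (x1 = 1, x2 = 1) gets an extreme estimate
    # and a huge standard error -- the classic quasi-separation symptom.
    fit <- glm(cbind(fail, nofail) ~ x1 * x2, family = binomial, data = data)
    summary(fit)$coefficients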

glmnet: How do I know which factor level of my response is coded as 1 in logistic regression

I have a logistic regression model that I made using the glmnet package. My response variable was coded as a factor, the levels of which I will refer to as "a" and "b". The mathematics of logistic regression label one of the two classes as "0" and the other as "1". The feature coefficients of a logistic regression model are either positive, negative, or zero. If a feature "f"'s coefficient is positive, then increasing the value of "f" for a test observation x increases the probability that the model classifies x as being of class "1". My question is: Given a glmnet model, how do you know how
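
A sketch of the usual answer, based on glmnet's documented convention for a two-level factor response (the second factor level is treated as the "1" class; the toy data below is illustrative only):

    library(glmnet)
    set.seed(1)
    y <- factor(sample(c("a", "b"), 20, replace = TRUE))
    x <- matrix(rnorm(20 * 3), ncol = 3)  # toy predictor matrix
    fit <- glmnet(x, y, family = "binomial")
    levels(y)[2]  # "b": the level coded as 1, so positive coefficients push
                  # predictions toward "b"; predict(..., type = "response")
                  # returns P(y = "b")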

Multiple Logistic Regression with Interaction between Quantitative and Qualitative Explanatory Variables

As a follow-up to this question, I fitted the multiple logistic regression with an interaction between quantitative and qualitative explanatory variables. The MWE is given below:

    Type <- rep(x=LETTERS[1:3], each=5)
    Conc <- rep(x=seq(from=0, to=40, by=10), times=3)
    Total <- 50
    Kill <- c(10, 30, 40, 45, 38, 5, 25, 35, 40, 32, 0, 32, 38, 47, 40)
    df <- data.frame(Type, Conc, Total, Kill)
    fm1 <- glm(
        formula = Kill/Total ~ Type*Conc
      , family  = binomial(link="logit")
      , data    = df
      , weights = Total
    )
    summary(fm1)

    Call:
    glm(formula = Kill/Total ~ Type * Conc, family = binomial(link = "logit"), data = df,
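
A sketch of how the interaction terms combine under R's default treatment contrasts (Type A is the reference level), assuming fm1 from the MWE above:

    # Per-Type slope of Conc on the log-odds scale: the TypeB:Conc and
    # TypeC:Conc terms are offsets from the reference slope for Type A.
    b <- coef(fm1)
    slopes <- c(A = unname(b["Conc"]),
                B = unname(b["Conc"] + b["TypeB:Conc"]),
                C = unname(b["Conc"] + b["TypeC:Conc"]))
    slopes  # change in log-odds of Kill per unit of Conc, by Type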

feature selection using logistic regression

I am performing feature selection (on a dataset with 1,930,388 rows and 88 features) using logistic regression. If I test the model on held-out data, the accuracy is just above 60%. The response variable is equally distributed. My question is: if the model's performance is not good, can I treat the features it selects as genuinely important? Or should I try to improve the accuracy of the model, even though my end goal is not accuracy but only the important features?

sklearn's GridSearchCV has some pretty neat methods to give you the best feature set. For example, consider
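
The answer excerpt is truncated and uses sklearn; as a language-consistent illustration of the same general idea (ranking features by the magnitude of regularized coefficients), here is a hypothetical R sketch with glmnet, not the answer's actual code:

    # L1-penalized logistic regression; glmnet standardizes predictors by
    # default, so coefficient magnitudes are comparable across features.
    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(1000 * 10), ncol = 10)           # toy stand-in data
    y <- rbinom(1000, 1, plogis(X[, 1] - 2 * X[, 2]))  # driven by features 1-2
    cvfit <- cv.glmnet(X, y, family = "binomial")
    coefs <- coef(cvfit, s = "lambda.1se")[-1, 1]      # drop the intercept
    sort(abs(coefs), decreasing = TRUE)                # larger = more influential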