logistic-regression

Interpreting coefficients from Logistic Regression in R

Question: I ran a logistic regression on a set of variables, both categorical and continuous, with a binary event as the dependent variable. Post-modelling, I observe that some categorical variables have a negative coefficient, which I take to mean that when such a categorical variable occurs often, the probability of the dependent event is low. But when I look at the raw percentage of occurrence of that independent variable, I see the reverse trend. Hence the result seems to
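
A minimal sketch of how such coefficients are usually read (all names here -- event, cat_var, cont_var, dat -- are hypothetical stand-ins, not from the question): a categorical predictor's coefficient is conditional on every other predictor in the model, so it can be negative even when the event looks frequent alongside that category in the raw (marginal) percentages.

    # Hypothetical names; exponentiate coefficients to read them as odds ratios.
    fit <- glm(event ~ cat_var + cont_var, family = binomial, data = dat)
    exp(coef(fit))     # odds ratios: values below 1 correspond to negative coefficients
    exp(confint(fit))  # profile-likelihood confidence intervals on the same scale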

R: Calculate and interpret odds ratio in logistic regression

I am having trouble interpreting the results of a logistic regression. My outcome variable is Decision and is binary (0 or 1: not take or take a product, respectively). My predictor variable is Thoughts; it is continuous, can be positive or negative, and is rounded to two decimal places. I want to know how the probability of taking the product changes as Thoughts changes. The logistic regression call is:

    glm(Decision ~ Thoughts, family = binomial, data = data)

According to this model, Thoughts has a significant effect on the probability of Decision (b = 0.72, p = .02). To determine the
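
A short sketch of the standard odds-ratio reading, assuming the fit above is stored in an object (here called fit, a name not in the question):

    # Exponentiating a logit coefficient converts it to an odds ratio.
    fit <- glm(Decision ~ Thoughts, family = binomial, data = data)
    exp(coef(fit)["Thoughts"])  # exp(0.72) ~ 2.05: each 1-unit increase in
                                # Thoughts multiplies the odds of taking the
                                # product by about 2.05
    exp(confint(fit))           # confidence intervals on the odds-ratio scale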

R - Getting Column of Dataframe from String [duplicate]

This question already has an answer here: Dynamically select data frame columns using $ and a vector of column names (8 answers)

I am trying to create a function that converts selected columns of a data frame to a categorical data type (factor) before running a regression analysis. The question is how to slice a particular column from a data frame using a string (character). Example:

    strColumnNames <- "Admit,Rank"
    strDelimiter <- ","
    strSplittedColumnNames <- strsplit(strColumnNames, strDelimiter)
    for (strColName in strSplittedColumnNames[[1]]) {
      dfData$as.name(strColName) <- factor
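
The usual fix (the approach in the linked duplicate): $ cannot take a computed name, but [[ accepts a character string. A sketch, assuming dfData has the columns named in strColumnNames:

    # [[ indexes a data frame column by a string, so the loop body becomes:
    for (strColName in strSplittedColumnNames[[1]]) {
      dfData[[strColName]] <- factor(dfData[[strColName]])
    }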

Does Quasi Separation matter in R binomial GLM?

I am learning how quasi-separation affects a binomial GLM in R, and I am starting to think that it does not matter in some circumstances. In my understanding, data show quasi-separation when some linear combination of factor levels can completely identify failure/non-failure. So I created an artificial dataset with quasi-separation in R:

    fail <- c(100,100,100,100)
    nofail <- c(100,100,0,100)
    x1 <- c(1,0,1,0)
    x2 <- c(0,0,1,1)
    data <- data.frame(fail,nofail,x1,x2)
    rownames(data) <- paste("obs",1:4)

Then, when x1 = 1 and x2 = 1 (obs 3), the outcome is always a failure (nofail = 0). In this data, my
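
A hedged sketch of how the separation shows up in the fit (the model formula below is an assumption; the truncated question does not show it):

    # Fit the aggregated binomial GLM; the coefficient involving the
    # separating combination (x1 = 1, x2 = 1) gets an extreme estimate
    # and a huge standard error -- the classic quasi-separation symptom.
    fit <- glm(cbind(fail, nofail) ~ x1 * x2, family = binomial, data = data)
    summary(fit)$coefficients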

glmnet: How do I know which factor level of my response is coded as 1 in logistic regression

I have a logistic regression model that I made using the glmnet package. My response variable was coded as a factor, the levels of which I will refer to as "a" and "b". The mathematics of logistic regression label one of the two classes as "0" and the other as "1". The feature coefficients of a logistic regression model are either positive, negative, or zero. If a feature "f"'s coefficient is positive, then increasing the value of "f" for a test observation x increases the probability that the model classifies x as being of class "1". My question is: Given a glmnet model, how do you know how
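
A sketch of the usual answer, based on glmnet's documented convention for a two-level factor response (the second factor level is treated as the "1" class; the toy data below is illustrative only):

    library(glmnet)
    set.seed(1)
    y <- factor(sample(c("a", "b"), 20, replace = TRUE))
    x <- matrix(rnorm(20 * 3), ncol = 3)  # toy predictor matrix
    fit <- glmnet(x, y, family = "binomial")
    levels(y)[2]  # "b": the level coded as 1, so positive coefficients push
                  # predictions toward "b"; predict(..., type = "response")
                  # returns P(y = "b")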

Multiple Logistic Regression with Interaction between Quantitative and Qualitative Explanatory Variables

As a follow-up to this question, I fitted the multiple logistic regression with an interaction between quantitative and qualitative explanatory variables. The MWE is given below:

    Type <- rep(x=LETTERS[1:3], each=5)
    Conc <- rep(x=seq(from=0, to=40, by=10), times=3)
    Total <- 50
    Kill <- c(10, 30, 40, 45, 38, 5, 25, 35, 40, 32, 0, 32, 38, 47, 40)
    df <- data.frame(Type, Conc, Total, Kill)
    fm1 <- glm(
        formula = Kill/Total ~ Type*Conc
      , family  = binomial(link="logit")
      , data    = df
      , weights = Total
    )
    summary(fm1)

    Call:
    glm(formula = Kill/Total ~ Type * Conc, family = binomial(link = "logit"), data = df,
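
A sketch of how the interaction terms combine under R's default treatment contrasts (Type A is the reference level), assuming fm1 from the MWE above:

    # Per-Type slope of Conc on the log-odds scale: the TypeB:Conc and
    # TypeC:Conc terms are offsets from the reference slope for Type A.
    b <- coef(fm1)
    slopes <- c(A = unname(b["Conc"]),
                B = unname(b["Conc"] + b["TypeB:Conc"]),
                C = unname(b["Conc"] + b["TypeC:Conc"]))
    slopes  # change in log-odds of Kill per unit of Conc, by Type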

feature selection using logistic regression

I am performing feature selection (on a dataset with 1,930,388 rows and 88 features) using logistic regression. If I test the model on held-out data, the accuracy is just above 60%. The response variable is equally distributed. My question is: if the model's performance is not good, can I treat the features it selects as genuinely important? Or should I try to improve the accuracy of the model, even though my end goal is not accuracy but only the important features?

sklearn's GridSearchCV has some pretty neat methods to give you the best feature set. For example, consider
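
The answer excerpt is truncated and uses sklearn; as a language-consistent illustration of the same general idea (ranking features by the magnitude of regularized coefficients), here is a hypothetical R sketch with glmnet, not the answer's actual code:

    # L1-penalized logistic regression; glmnet standardizes predictors by
    # default, so coefficient magnitudes are comparable across features.
    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(1000 * 10), ncol = 10)           # toy stand-in data
    y <- rbinom(1000, 1, plogis(X[, 1] - 2 * X[, 2]))  # driven by features 1-2
    cvfit <- cv.glmnet(X, y, family = "binomial")
    coefs <- coef(cvfit, s = "lambda.1se")[-1, 1]      # drop the intercept
    sort(abs(coefs), decreasing = TRUE)                # larger = more influential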