glm

How to plot interaction effects from extremely large data sets (esp. from rxGlm output)

点点圈 submitted on 2019-12-21 03:42:10
Question: I am currently computing glm models on a huge data set. Both glm and even speedglm take days to compute. I currently have around 3M observations and altogether 400 variables, only some of which are used in the regression. In my regression I use 4 integer independent variables (iv1, iv2, iv3, iv4), 1 binary independent variable as a factor (iv5), and the interaction term (x * y, where x is an integer and y is a binary dummy variable as a factor). Finally, I have fixed effects along
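To plot an interaction you only need the fitted coefficients, not the 3M rows. One option is to evaluate the model over a grid of x values for each level of the dummy y and draw one line per level. The sketch below is a minimal Python version under several assumptions. It assumes a logit link. The coefficient names (b0, b_x, b_y, b_xy) are hypothetical. All other covariates, including the fixed effects, are held at fixed values that are folded into the intercept. The coefficients would come from your fitted model object.

```python
import math

def inv_logit(eta):
    """Logistic inverse link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def interaction_curves(coefs, x_grid):
    """Predicted probabilities over x for y = 0 and y = 1.

    coefs uses hypothetical keys 'b0', 'b_x', 'b_y', 'b_xy'; other covariates
    are assumed held fixed and already folded into 'b0'.
    """
    curves = {}
    for y in (0, 1):
        curves[y] = [
            inv_logit(coefs["b0"] + coefs["b_x"] * x
                      + coefs["b_y"] * y + coefs["b_xy"] * x * y)
            for x in x_grid
        ]
    return curves

# Hypothetical coefficients for illustration only (not from the question).
coefs = {"b0": -1.0, "b_x": 0.2, "b_y": 0.5, "b_xy": -0.3}
curves = interaction_curves(coefs, range(0, 11))
```

Each list in `curves` can be passed straight to any plotting tool. Because the grid is tiny, this stays fast however large the original data set is.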

R probit regression marginal effects

断了今生、忘了曾经 submitted on 2019-12-21 02:48:08
Question: I am using R to replicate a study and obtain mostly the same results the author reported. At one point, however, I calculate marginal effects that seem unrealistically small. I would greatly appreciate it if you could look at my reasoning and the code below and see whether I am mistaken somewhere. My sample contains 24535 observations, the dependent variable "x028bin" is a binary variable taking the values 0 and 1, and there are a further 10 explanatory variables. Nine
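In a probit model, the marginal effect of a continuous regressor k for observation i is φ(x_i'β)·β_k, where φ is the standard normal density. The average marginal effect (AME) averages this over the sample. Since φ never exceeds about 0.399, marginal effects are always much smaller than the raw coefficients. In R, a comparable quantity for continuous regressors is `mean(dnorm(predict(fit, type = "link"))) * coef(fit)`. Below is a minimal Python sketch, assuming a design matrix whose first column is the intercept. Binary regressors are usually handled with a discrete change in probability instead, which this sketch does not cover.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_ame(X, beta, k):
    """Average marginal effect of continuous regressor k in a probit model.

    X: list of rows, each starting with 1.0 for the intercept.
    beta: coefficients aligned with the columns of X.
    """
    total = 0.0
    for row in X:
        eta = sum(b * x for b, x in zip(beta, row))  # linear predictor x_i'beta
        total += norm_pdf(eta) * beta[k]             # phi(x_i'beta) * beta_k
    return total / len(X)

# Hypothetical data: intercept plus one regressor.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
beta = [0.0, 0.5]
ame = probit_ame(X, beta, 1)
```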

Warning: non-integer #successes in a binomial glm! (survey packages)

眉间皱痕 submitted on 2019-12-20 08:56:35
Question: I am using the twang package to create propensity scores, which are used as weights in a binomial glm using survey::svyglm. The code looks something like this:

    pscore <- ps(ppci ~ var1+var2+.........., data=dt....)
    dt$w <- get.weights(pscore, stop.method="es.mean")
    design.ps <- svydesign(ids=~1, weights=~w, data=dt)
    glm1 <- svyglm(m30 ~ ppci, design=design.ps, family=binomial)

This produces the following warning:

    Warning message:
    In eval(expr, envir, enclos) : non-integer #successes in a
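The warning seems to come from a sanity check in R's binomial family. The family appears to treat weights * y as a count of successes and to warn when that product is not close to an integer; 0.001 is my recollection of the tolerance. Propensity weights are fractional, so the check fires. The weighted log-likelihood is still a well-defined objective, and using `family = quasibinomial()` in `svyglm` is commonly suggested because it gives the same point estimates without the warning. The Python sketch below mimics the check and the objective. It is an illustration, not R's actual code.

```python
import math

def nonint_successes(y, w, tol=0.001):
    """True if any weight * outcome is not (nearly) an integer count,
    which is the condition that appears to trigger R's binomial warning."""
    return any(abs(wi * yi - round(wi * yi)) > tol for yi, wi in zip(y, w))

def weighted_bernoulli_loglik(p, y, w):
    """Weighted Bernoulli log-likelihood; fine with fractional weights."""
    return sum(wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
               for pi, yi, wi in zip(p, y, w))

y = [1, 0, 1]
propensity_w = [1.5, 2.0, 0.7]   # hypothetical fractional weights
integer_w = [1, 2, 3]
```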

statsmodels in Python: how exactly are duplicated features handled?

纵然是瞬间 submitted on 2019-12-20 07:42:29
Question: I am a heavy R user and have recently been learning Python. I have a question about how statsmodels.api handles duplicated features. In my understanding, this function is a Python version of glm in R, so I expect it to return the maximum likelihood estimates (MLE). My question is: which algorithm does statsmodels employ to obtain the MLE? In particular, how does the algorithm handle duplicated features? To clarify my question, I generate a sample of size 50 from
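With an exact duplicate column, the Gram matrix X'X is singular, so the MLE is not unique. Any pair with b1 + b2 = b fits equally well. A pseudoinverse or minimum-norm least-squares solver picks the pair with the smallest norm, which splits the effect equally (b/2, b/2). The Python sketch below shows this for plain least squares, which is what each IRLS step solves. Whether a given statsmodels fit method uses such a solver is an assumption; check it against the statsmodels `GLM.fit` documentation.

```python
def gram_det_2x2(a, b):
    """Determinant of the 2x2 Gram matrix [[a.a, a.b], [b.a, b.b]]."""
    aa = sum(x * x for x in a)
    ab = sum(x * y for x, y in zip(a, b))
    bb = sum(y * y for y in b)
    return aa * bb - ab * ab

def min_norm_duplicate_fit(x, y):
    """Least squares for y ~ x1 + x2 (no intercept) where x2 is a copy of x1.

    Every (b1, b2) with b1 + b2 = b fits equally well; the minimum-norm
    solution, as returned by a pseudoinverse, is (b/2, b/2).
    """
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return b / 2.0, b / 2.0

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
```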

How to get probability from GLM output

六眼飞鱼酱① submitted on 2019-12-19 11:04:53
Question: I'm extremely stuck at the moment trying to figure out how to calculate the probability from my glm output in R. I know the estimates are not significant, but I would really love to be shown how to get the probability from output like this. I was thinking of trying inv.logit() but didn't know what variables to put within the brackets. The data are from an occupancy study. I'm assessing the success of a hair trap method versus a camera trap in detecting 3 species (red squirrel, pine marten
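For a logit glm, the probability is the inverse logit of the linear predictor: the intercept plus each coefficient times the value of its variable. For a factor such as trap method, the value is 1 for the level named in the coefficient and 0 otherwise. In R, `plogis(eta)` computes the inverse logit, and `predict(fit, newdata, type = "response")` does the whole calculation. Below is a minimal Python sketch. The coefficient name `methodcamera` and its values are hypothetical.

```python
import math

def prob_from_logit(intercept, coefs, values):
    """P(Y = 1) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))) for a logit glm."""
    eta = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical output: intercept for hair traps, dummy for camera traps.
coefs = {"methodcamera": 1.2}
p_hair = prob_from_logit(-0.8, coefs, {"methodcamera": 0})
p_camera = prob_from_logit(-0.8, coefs, {"methodcamera": 1})
```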

Selecting the statistically significant variables in an R glm model

…衆ロ難τιáo~ submitted on 2019-12-18 10:45:00
Question: I have an outcome variable, say Y, and a list of 100 dimensions that could affect Y (say X1...X100). After running my glm and viewing a summary of my model, I see which variables are statistically significant. I would like to select those variables, run another model, and compare performance. Is there a way to parse the model summary and select only the significant ones? Answer 1: You can access the p-values of the glm result through the function "summary". The last
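In R, the coefficient table is `summary(fit)$coefficients`, and its fourth column holds the p-values. `names(p)[p < 0.05]` then gives the significant terms, which can be pasted into a new formula. The Python sketch below shows the same filtering step, assuming the p-values are already in a name-to-value mapping. The term names and values are hypothetical.

```python
def significant_terms(pvalues, alpha=0.05, drop_intercept=True):
    """Names of terms whose p-value is below alpha, in their original order."""
    return [name for name, p in pvalues.items()
            if p < alpha and not (drop_intercept and name == "(Intercept)")]

def build_formula(response, terms):
    """Assemble an R-style formula string from the selected terms."""
    return f"{response} ~ " + " + ".join(terms)

pvalues = {"(Intercept)": 0.001, "X1": 0.03, "X2": 0.40, "X3": 0.0001}
selected = significant_terms(pvalues)
formula = build_formula("Y", selected)
```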

How to update `lm` or `glm` model on same subset of data?

筅森魡賤 submitted on 2019-12-18 09:17:27
Question: I am trying to fit two nested models and then test them against each other using the anova function. The commands used are:

    probit <- glm(grad ~ afqt1 + fhgc + mhgc + hisp + black + male, data=dt, family=binomial(link = "probit"))
    nprobit <- update(probit, . ~ . - afqt1)
    anova(nprobit, probit, test="Rao")

However, the variable afqt1 apparently contains NAs, and because the update call does not use the same subset of data, anova() returns the error Error in anova.glmlist(c(list(object), dotargs),
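One way to ensure both models use the same rows is to drop every row with a missing value in any variable of the larger model, then fit both models on that subset. In R this could look like `dt2 <- dt[complete.cases(dt[, vars]), ]`, where `vars` holds the larger model's variable names. The Python sketch below shows the filtering step on rows stored as dicts, with None standing in for NA. The rows are hypothetical.

```python
def complete_cases(rows, variables):
    """Keep rows with no missing (None) value in any of the given variables."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]

# Variables of the larger (full) model, taken from the question.
full_vars = ["grad", "afqt1", "fhgc", "mhgc", "hisp", "black", "male"]

rows = [
    {"grad": 1, "afqt1": 55.0, "fhgc": 12, "mhgc": 12, "hisp": 0, "black": 0, "male": 1},
    {"grad": 0, "afqt1": None, "fhgc": 10, "mhgc": 11, "hisp": 1, "black": 0, "male": 0},
    {"grad": 1, "afqt1": 70.0, "fhgc": 16, "mhgc": 14, "hisp": 0, "black": 1, "male": 1},
]
common = complete_cases(rows, full_vars)
```

Both `glm` calls would then use the same `common` subset, so anova() compares models fit on the same observations.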

predict() with arbitrary coefficients in R

巧了我就是萌 submitted on 2019-12-17 19:36:54
Question: I've got some coefficients for a logit model set by a non-R user. I'd like to import those coefficients into R and generate some goodness-of-fit estimates on the same dataset (ROC and confusion matrix) versus my own model. My first thought was to coerce the coefficients into an existing GLM object using something like summary(fit)$coefficients[,1] <- y or summary(fit)$coefficients <- x, where y and x are matrices containing the coefficients I'm trying to use to predict and fit is a previously
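Instead of editing a fitted object, you can compute the predictions directly. Multiply the design matrix by the external coefficients and apply the inverse logit. In R this would be roughly `plogis(model.matrix(formula, data) %*% beta)`. The resulting probabilities can feed an ROC routine or a thresholded confusion matrix. The Python sketch below assumes the coefficient order matches the design-matrix columns, and it uses a 0.5 threshold as an arbitrary choice. The data are hypothetical.

```python
import math

def predict_logit(X, beta):
    """Probabilities from a design matrix and externally supplied coefficients."""
    return [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for row in X]

def confusion_matrix(y_true, probs, threshold=0.5):
    """Counts of true/false positives/negatives at the given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts

X = [[1.0, -2.0], [1.0, 0.0], [1.0, 2.0], [1.0, 3.0]]  # intercept + one predictor
beta = [0.0, 1.0]                                       # hypothetical external coefficients
y_true = [0, 0, 1, 0]
probs = predict_logit(X, beta)
cm = confusion_matrix(y_true, probs)
```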

R: Pass argument to glm inside an R function

谁说我不能喝 submitted on 2019-12-17 07:52:14
Question: I am trying to get used to scoping issues in R. I'd like to call the function glm() inside a function, but it does not work, apparently for scoping reasons that I did not manage to fix with the functions assign() or eval(). Here is a simplified version:

    ao <- function (y, x, phi = seq (0,1,0.1), dataset, weights) {
      logLikvector <- rep(0,length(phi)) # vector of zeros to be replaced thereafter
      for (i in 1:length(phi)) { # loop to use glm()
        fit <- glm (y ~ x, data = dataset, family = binomial,

R Logistic Regression Missing Coefficients

早过忘川 submitted on 2019-12-14 02:19:30
Question: I am trying to assess the odds of people staying in a program given their backgrounds, following these instructions. One of the variables I am looking at is age, which I split into five groups. I have run a test using the formula:

    mylogit15 <- glm(Stay_in_Progams ~ Age.Group + Prior_Experience, data = mydata, family = "binomial")

The results of the test are clear enough, except that I am missing the first and third age groups. This is what they look like: Coefficients: Estimate Std. Error z-value Pr
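Under R's default treatment coding, a factor with k levels gets k - 1 indicator columns. The first level is the reference, absorbed into the intercept, so it never gets its own coefficient. A second level can also go missing if its column is aliased, for example when a level has no observations in the data used for fitting. R then reports NA "because of singularities". `relevel()` changes the reference, and `alias(fit)` lists aliased terms. The Python sketch below illustrates both cases with hypothetical levels. It is not R's `model.matrix`.

```python
def treatment_dummies(values, levels=None):
    """Treatment (reference) coding: k levels -> k-1 indicator columns.

    The first level is the reference and gets no column; its effect is
    absorbed into the intercept.
    """
    if levels is None:
        levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    columns = {lvl: [1 if v == lvl else 0 for v in values] for lvl in others}
    return reference, columns

def empty_columns(columns):
    """Levels whose indicator is all zeros: their coefficient cannot be estimated."""
    return [lvl for lvl, col in columns.items() if not any(col)]

# Hypothetical age groups; level "C" is declared but has no observations.
ref, cols = treatment_dummies(["A", "B", "D", "A"], levels=["A", "B", "C", "D"])
```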