glm | 易学教程

How to plot interaction effects from extremely large data sets (esp. from rxGlm output)

Question: I am currently fitting glm models on a huge data set. Both glm and even speedglm take days to compute. I currently have around 3M observations and altogether 400 variables, only some of which are used for the regression. In my regression I use 4 integer independent variables (iv1, iv2, iv3, iv4), 1 binary independent variable as a factor (iv5), and an interaction term (x * y, where x is an integer and y is a binary dummy variable as a factor). Finally, I have fixed effects along

R probit regression marginal effects

Question: I am using R to replicate a study and obtain mostly the same results the author reported. At one point, however, I calculate marginal effects that seem to be unrealistically small. I would greatly appreciate it if you could have a look at my reasoning and the code below and see whether I am mistaken at one point or another. My sample contains 24535 observations, the dependent variable "x028bin" is a binary variable taking on the values 0 and 1, and there are furthermore 10 explanatory variables. Nine

Warning: non-integer #successes in a binomial glm! (survey packages)

Question: I am using the twang package to create propensity scores, which are used as weights in a binomial glm using survey::svyglm. The code looks something like this:

    pscore <- ps(ppci ~ var1+var2+.........., data=dt....)
    dt$w <- get.weights(pscore, stop.method="es.mean")
    design.ps <- svydesign(ids=~1, weights=~w, data=dt)
    glm1 <- svyglm(m30 ~ ppci, design=design.ps, family=binomial)

This produces the following warning:

    Warning message: In eval(expr, envir, enclos) : non-integer #successes in a
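The warning comes from the binomial family's check that weights times outcome is a whole number. The survey package documentation suggests `family = quasibinomial()` for `svyglm` to avoid it. The following is a minimal sketch of that behaviour using base `glm` on simulated data so that it runs without twang or survey installed; the data frame `d` and its columns are hypothetical and not taken from the question.

```r
# Simulated data; every name here is hypothetical.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 * d$x))
d$w <- runif(n, 0.5, 2)  # non-integer weights, similar in kind to propensity-score weights

# Capture the warning raised by the binomial family (NA if none is raised).
warn_bin <- tryCatch(
  { glm(y ~ x, data = d, weights = w, family = binomial); NA_character_ },
  warning = function(w) conditionMessage(w)
)

# Same model fitted twice: binomial (warning suppressed) and quasibinomial.
fit_bin   <- suppressWarnings(glm(y ~ x, data = d, weights = w, family = binomial))
fit_quasi <- glm(y ~ x, data = d, weights = w, family = quasibinomial)

# The point estimates agree; only the integer check differs.
print(warn_bin)
print(all.equal(coef(fit_bin), coef(fit_quasi)))
```

In plain `glm` the quasibinomial standard errors are scaled by an estimated dispersion, so they differ from the binomial ones; with `svyglm` the standard errors are design-based either way.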

statsmodels package in Python: how exactly are duplicated features handled?

Question: I am a heavy R user and have recently started learning Python. I have a question about how statsmodels.api handles duplicated features. As I understand it, this function is a Python counterpart of glm in R, so I expect it to return the maximum likelihood estimates (MLE). My question is: which algorithm does statsmodels employ to obtain the MLE? In particular, how does the algorithm handle the situation with duplicated features? To clarify my question, I generate a sample of size 50 from

How to get probability from GLM output

Question: I'm extremely stuck at the moment as I am trying to figure out how to calculate the probability from my glm output in R. I know the data is very insignificant, but I would really love to be shown how to get the probability from an output like this. I was thinking of trying inv.logit() but didn't know what variables to put within the brackets. The data is from an occupancy study. I'm assessing the success of a hair trap method versus a camera trap in detecting 3 species (red squirrel, pine marten
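For a logistic glm, a probability is the inverse logit of the linear predictor, that is, the intercept plus the relevant coefficients. A minimal sketch on simulated data follows; the variables `method` and `detected` are hypothetical stand-ins, and base R's `plogis()` is used as the inverse logit in place of `inv.logit()`.

```r
# Simulated detection data; names and probabilities are hypothetical.
set.seed(42)
d <- data.frame(method = factor(rep(c("camera", "hair"), each = 50)))
d$detected <- rbinom(100, 1, ifelse(d$method == "camera", 0.6, 0.3))

fit <- glm(detected ~ method, data = d, family = binomial)
b <- coef(fit)

# By hand: inverse logit of the linear predictor for each level.
p_camera <- plogis(b[["(Intercept)"]])                      # reference level
p_hair   <- plogis(b[["(Intercept)"]] + b[["methodhair"]])  # intercept + effect

# Same thing via predict(), which applies the inverse link itself.
newd <- data.frame(method = factor(c("camera", "hair"), levels = levels(d$method)))
p_pred <- predict(fit, newdata = newd, type = "response")

print(c(camera = p_camera, hair = p_hair))
print(p_pred)
```

The argument to the inverse logit is therefore a sum of coefficients on the log-odds scale, not an individual p-value or standard error from the summary table.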

Selecting the statistically significant variables in an R glm model

Question: I have an outcome variable, say Y, and a list of 100 dimensions that could affect Y (say X1...X100). After running my glm and viewing a summary of my model, I see which variables are statistically significant. I would like to be able to select those variables, run another model, and compare performance. Is there a way I can parse the model summary and select only the ones that are significant?

Answer 1: You can access the p-values of the glm result through the function "summary". The last
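A minimal sketch of that idea, assuming the p-values sit in the last column of `summary(model)$coefficients` and that all predictors are numeric. The data, the 0.05 cut-off, and the names `full`, `reduced`, and `sig` are illustrative assumptions.

```r
# Simulated data with five predictors, two of which truly matter.
set.seed(123)
n <- 500
X <- as.data.frame(matrix(rnorm(n * 5), n, 5))
names(X) <- paste0("X", 1:5)
X$Y <- rbinom(n, 1, plogis(1.5 * X$X1 - 1.5 * X$X3))

full <- glm(Y ~ ., data = X, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, p-value (last column).
coefs <- summary(full)$coefficients
pvals <- coefs[, ncol(coefs)]

# Names of terms below the cut-off, excluding the intercept.
sig <- setdiff(names(pvals)[pvals < 0.05], "(Intercept)")

# Refit using only those terms and compare the two models.
reduced <- glm(reformulate(sig, response = "Y"), data = X, family = binomial)
print(sig)
print(AIC(full, reduced))
```

With factor predictors the row names are dummy-variable names such as `groupB` rather than variable names, so they would need mapping back to the original columns before being reused in a formula.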

How to update `lm` or `glm` model on same subset of data?

Question: I am trying to fit two nested models and then test them against each other using the anova function. The commands used are:

    probit <- glm(grad ~ afqt1 + fhgc + mhgc + hisp + black + male, data=dt, family=binomial(link = "probit"))
    nprobit <- update(probit, . ~ . - afqt1)
    anova(nprobit, probit, test="Rao")

However, the variable afqt1 apparently contains NAs, and because the update call does not take the same subset of data, anova() returns the error

    Error in anova.glmlist(c(list(object), dotargs),
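One possible way to keep both fits on the same rows is to refit the smaller model on the model frame stored in the larger one, since that frame already has the incomplete rows dropped. This is a sketch under assumptions, not necessarily the answer given in the original thread: the data are simulated, only three of the predictors are used, and the formula contains no transformations such as `log(x)`.

```r
# Simulated data with missing values in afqt1; values are hypothetical.
set.seed(7)
n <- 300
dt <- data.frame(afqt1 = rnorm(n), fhgc = rnorm(n), male = rbinom(n, 1, 0.5))
dt$grad <- rbinom(n, 1, pnorm(0.5 * dt$afqt1 + 0.3 * dt$fhgc))
dt$afqt1[sample(n, 30)] <- NA

probit <- glm(grad ~ afqt1 + fhgc + male, data = dt,
              family = binomial(link = "probit"))

# probit$model is the model frame actually used (rows with NA removed),
# so the reduced model is fitted on exactly the same observations.
nprobit <- update(probit, . ~ . - afqt1, data = probit$model)

print(c(nobs(nprobit), nobs(probit)))
print(anova(nprobit, probit, test = "Rao"))
```

An alternative with the same effect is to build a complete-case data set first, for example with `na.omit()` on the columns used by the larger model, and fit both models on it.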

predict() with arbitrary coefficients in R

Question: I've got some coefficients for a logit model set by a non-R user. I'd like to import those coefficients into R and generate some goodness-of-fit estimates on the same dataset (ROC and confusion matrix) versus my own model. My first thought was to coerce the coefficients into an existing GLM object using something like summary(fit)$coefficients[,1] <- y or summary(fit)$coefficients <- x, where y and x are matrices containing the coefficients I'm trying to use to predict and fit is a previously
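Rather than modifying a fitted object, predictions for a logit model can be computed directly as the inverse logit of the design matrix times the coefficient vector. A minimal sketch follows; the data and the external coefficients `beta_ext` are made up for illustration.

```r
# Simulated data; names and values are hypothetical.
set.seed(99)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1))

# Externally supplied coefficients, named to match the design matrix columns.
beta_ext <- c("(Intercept)" = -0.4, x1 = 0.9, x2 = 0.1)

# Design matrix, then probabilities as plogis(X %*% beta).
X <- model.matrix(~ x1 + x2, data = d)
p_ext <- as.vector(plogis(X %*% beta_ext[colnames(X)]))

# Confusion matrix at a 0.5 threshold.
print(table(observed = d$y, predicted = as.integer(p_ext > 0.5)))

# Sanity check: the same formula applied to a fitted model's own
# coefficients reproduces its fitted values.
fit <- glm(y ~ x1 + x2, data = d, family = binomial)
p_own <- as.vector(plogis(X %*% coef(fit)[colnames(X)]))
print(all.equal(p_own, unname(fitted(fit))))
```

Indexing `beta_ext` by `colnames(X)` guards against the coefficients arriving in a different order from the columns of the design matrix.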

R: Pass argument to glm inside an R function

Question: I am trying to get used to scoping issues in R. I'd like to call the function glm() inside a function, but it does not work, apparently for scoping reasons that I did not manage to fix with the functions assign() or eval(). Here is a simplified version:

    ao <- function (y, x, phi = seq (0,1,0.1), dataset, weights) {
      logLikvector <- rep(0,length(phi)) # vector of zeros to be replaced thereafter
      for (i in 1:length(phi)) { # loop to use glm()
        fit <- glm (y ~ x, data = dataset, family = binomial,

R Logistic Regression Missing Coefficients

Question: I am trying to assess the odds of people staying in a program given their backgrounds, following these instructions. One of the variables I am looking at is age, which I split into five groups. I have run a test using the formula:

    mylogit15 <- glm(Stay_in_Progams ~ Age.Group + Prior_Experience, data = mydata, family = "binomial")

The results of the test are clear enough, except that I am missing the first and third age groups. This is what they look like:

    Coefficients: Estimate Std. Error z-value Pr