glm | 易学教程

predict.glm() with three new categories in the test data (r)(error)

Question: I have a data set called data with 481,092 rows. I split data into two equal halves: the first half (rows 1 to 240,546) is called train and was used for the glm(); the second half (rows 240,547 to 481,092) is called test and should be used to validate the model. Then I started the regression: testreg <- glm(train$returnShipment ~ train$size + train$color + train$price + train$manufacturerID + train$salutation + train$state + train$age + train$deliverytime, family=binomial(link="logit"),

h2o.glm lambda search not appearing to iterate over all lambdas

Please consider the following basic reproducible example: library(h2o) h2o.init() data("iris") iris.hex = as.h2o(iris, "iris.hex") mod = h2o.glm(y = "Sepal.Length", x = setdiff(colnames(iris), "Sepal.Length"), training_frame = iris.hex, nfolds = 2, seed = 100, lambda_search = T, early_stopping = F, family = "gamma", nlambdas = 100) When I run the above, I expect that h2o will iterate over 100 different values of lambda. However, running length(mod@allparameters$lambda) will show that only 79 values of lambda were actually tested. These 79 values are the first 79 values in the sequence:

Regression for a Rate variable in R

Question: I was tasked with developing a regression model looking at student enrollment in different programs. This is a very nice, clean data set where the enrollment counts follow a Poisson distribution well. I fit a model in R (using both GLM and zero-inflated Poisson). The resulting residuals seemed reasonable. However, I was then instructed to change the count of students to a "rate", calculated as students / school_population (each school has its own population). This is now no longer a

Saving a single object within a function in R: RData file size is very large

Question: I am trying to save trimmed-down GLM objects in R (i.e. with all the "non-essential" components set to NULL, e.g. residuals, prior.weights, qr$qr). As an example, looking at the smallest object that I need to do this with: print(object.size(glmObject)) reports 168992 bytes, and I save it with save(glmObject, file = "FileName.RData"). Assigning this object in the global environment and saving leads to an RData file of about 6 KB. However, I effectively need to create and save the glm object within a function, which is in

Why is caret train taking up so much memory?

When I train using just glm, everything works, and I don't even come close to exhausting memory. But when I run train(..., method='glm'), I run out of memory. Is this because train is storing a lot of data for each iteration of the cross-validation (or whatever the trControl procedure is)? I'm looking at trainControl and I can't find how to prevent this. Any hints? I only care about the performance summary and maybe the predicted responses. (I know it's not related to storing data from each iteration of the parameter-tuning grid search because, I believe, there's no grid for glm.) The

Why is it inadvisable to get statistical summary information for regression coefficients from glmnet model?

I have a regression model with a binary outcome. I fitted the model with glmnet and got the selected variables and their coefficients. Since glmnet doesn't calculate variable importance, I would like to feed the exact output (selected variables and their coefficients) to glm to get the additional information (standard errors, etc.). I searched the R documentation, and it seems I can use the "method" option in glm to specify a user-defined function, but I failed to do so. Could someone help me with this? Answer 1: "It is a very natural question to ask for standard errors of regression coefficients or other estimated quantities. In

How to update `lm` or `glm` model on same subset of data?

I am trying to fit two nested models and then test them against each other using the anova function. The commands used are: probit <- glm(grad ~ afqt1 + fhgc + mhgc + hisp + black + male, data=dt, family=binomial(link = "probit")) nprobit <- update(probit, . ~ . - afqt1) anova(nprobit, probit, test="Rao") However, the variable afqt1 apparently contains NAs, and because the update call does not use the same subset of data, anova() returns the error: Error in anova.glmlist(c(list(object), dotargs), dispersion = dispersion, : models were not all fitted to the same size of dataset. Is there a simple way

model.matrix(): why do I lose control of contrast in this case

Suppose we have a toy data frame:

    x <- data.frame(x1 = gl(3, 2, labels = letters[1:3]), x2 = gl(3, 2, labels = LETTERS[1:3]))

I would like to construct a model matrix

    #   x1b x1c x2B x2C
    # 1   0   0   0   0
    # 2   0   0   0   0
    # 3   1   0   1   0
    # 4   1   0   1   0
    # 5   0   1   0   1
    # 6   0   1   0   1

by:

    model.matrix(~ x1 + x2 - 1, data = x, contrasts.arg = list(x1 = contr.treatment(letters[1:3]), x2 = contr.treatment(LETTERS[1:3])))

but actually I get:

    #   x1a x1b x1c x2B x2C
    # 1   1   0   0   0   0
    # 2   1   0   0   0   0
    # 3   0   1   0   1   0
    # 4   0   1   0   1   0
    # 5   0   0   1   0   1
    # 6   0   0   1   0   1
    # attr(,"assign")
    # [1] 1 1 1 2 2
    # attr(,"contrasts")
    # attr(,"contrasts")$x1
    #
