glm

Difference in GLM results between IPython and R

邮差的信 submitted 2019-12-04 09:41:32
I'm trying to get to grips with performing regression analyses in R. Below is some random dummy data that I generated in R and fitted with a logistic glm. I saved the data to a test file, read it into Python with IPython (the IPython notebook is awesome, by the way; I've only just started using it!), and then tried to run the same analysis in Python. The results are very similar, but they are different, and I would have expected them to be the same. Have I done something wrong, is there a parameter I am missing, or is the difference due to some underlying calculation? Any help appreciated! EDIT: I don
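For reference, when the data and parameterization match exactly, R's glm and Python's statsmodels GLM should agree to numerical precision; small discrepancies usually come from different IRLS convergence tolerances or from how categorical variables are coded. A minimal sketch of the R side, with hypothetical dummy data since the original file isn't shown:

```r
# Hypothetical dummy data standing in for the question's test file.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)
coef(fit)
# The Python side would be statsmodels' GLM(y, X, family=Binomial());
# given identical data and an intercept column in X, the coefficients
# should match to roughly machine precision.
```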

How do I extract lmer fixed effects by observation?

一笑奈何 submitted 2019-12-04 08:20:47
Question: I have an lme object, constructed from some repeated-measures nutrient intake data (two 24-hour intake periods per RespondentID): Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID), data = Male.Data, weights = SampleWeight) and I can successfully retrieve the random effects by RespondentID using ranef(Male.lme1) . I would also like to collect the result of the fixed effects by RespondentID . coef(Male.lme1) does not provide exactly what I need, as I show below. >
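One way to get a per-observation fixed-effects value is to multiply the fixed-effects design matrix by fixef(). A sketch using a built-in lme4 dataset, since Male.Data isn't available:

```r
library(lme4)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
X <- model.matrix(fit)            # fixed-effects design matrix, one row per observation
fixed.by.obs <- X %*% fixef(fit)  # fixed-effects part of each fitted value
head(fixed.by.obs)
# coef(fit), by contrast, returns per-group coefficients
# (fixed and random effects combined), not per-observation values.
```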

caret train() predicts very differently than predict.glm()

让人想犯罪 __ submitted 2019-12-04 07:23:30
Question: I am trying to estimate a logistic regression using 10-fold cross-validation. #import libraries library(car); library(caret); library(e1071); library(verification) #data import and preparation data(Chile) chile <- na.omit(Chile) #remove NAs chile <- chile[chile$vote == "Y" | chile$vote == "N" , ] #only "Y" and "N" required chile$vote <- factor(chile$vote) #required to remove unwanted levels chile$income <- factor(chile$income) # treat income as a factor The goal is to estimate a glm -
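A common cause of large discrepancies here is comparing different kinds of predictions: caret's predict() returns class labels by default, while predict.glm() defaults to the link scale unless type = "response" is given. A sketch on the Chile data (the predictor subset here is hypothetical):

```r
library(car); library(caret)
data(Chile)
chile <- na.omit(Chile)
chile <- chile[chile$vote %in% c("Y", "N"), ]
chile$vote <- factor(chile$vote)

# Probabilities from the caret model (class labels are its default output):
fit.caret <- train(vote ~ age + income, data = chile, method = "glm",
                   family = binomial,
                   trControl = trainControl(method = "cv", number = 10))
p.caret <- predict(fit.caret, newdata = chile, type = "prob")[, "Y"]

# Probabilities from plain glm (type = "response", not the default link scale):
fit.glm <- glm(vote ~ age + income, data = chile, family = binomial)
p.glm <- predict(fit.glm, newdata = chile, type = "response")
# With the same data and formula, these two probability vectors should agree closely;
# comparing classes to link-scale values is what makes them look "very different".
```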

How do I use a custom link function in glm?

依然范特西╮ submitted 2019-12-04 07:09:17
I don't want to use the standard log link in glm for Poisson regression, since I have zeros. Consider the following code: foo = 0:10 bar = 2 * foo glm(bar ~ foo, family = poisson(link = "identity")) I get the error: Error: no valid set of coefficients has been found: please supply starting values I'm not certain what this means. Is the "identity" link function what I think it is (i.e. it doesn't transform the data at all)? What does this error mean and how can I resolve it? You can get an answer if you start somewhere other than the default (0,0) starting point. The start parameter is a vector
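Concretely, passing start (one value per coefficient, intercept first) gets the identity-link fit going; c(1, 2) is one possible choice that keeps the fitted means positive throughout the iterations:

```r
foo <- 0:10
bar <- 2 * foo
glm(bar ~ foo, family = poisson(link = "identity"), start = c(1, 2))
# The identity link really does apply no transformation: mu = b0 + b1*foo.
# That is exactly why the default (0, 0) start fails -- IRLS needs a
# starting point where every fitted mean mu is strictly positive.
```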

How can I generate marginal effects for a logit model when using survey weights?

≡放荡痞女 submitted 2019-12-04 07:07:38
I normally generate logit model marginal effects using the mfx package and the logitmfx function. However, the current survey I am using has weights (which have a large effect on the proportion of the DV in the sample because of oversampling in some populations) and logitmfx doesn't appear to have any way to include weights. I have fitted the model with svyglm as follows: library(survey) survey.design <- svydesign(ids = combined.survey$id, weights = combined.survey$weight, data = combined.survey) vote.pred.1 <- svyglm(formula = turnout ~ gender + age.group + education + income, design = survey
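One workaround is to compute an average marginal effect by hand from the svyglm fit: predict with a covariate set to each of its levels and take the weighted mean difference (the margins package is also reported to accept svyglm objects). A sketch continuing from the question's objects; the gender level names are hypothetical:

```r
d1 <- d0 <- combined.survey
d1$gender <- "female"          # hypothetical level name
d0$gender <- "male"            # hypothetical level name
p1 <- predict(vote.pred.1, newdata = d1, type = "response")
p0 <- predict(vote.pred.1, newdata = d0, type = "response")
# Average marginal effect of gender, weighted by the survey weights:
weighted.mean(as.numeric(p1) - as.numeric(p0), w = combined.survey$weight)
```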

What do you need to watch out for when using cross-validation with GLM lambda search?

情到浓时终转凉″ submitted 2019-12-04 06:59:01
Question: Regarding h2o.glm lambda search not appearing to iterate over all lambdas: I read that question as complaining that lambda was too high; they tried setting early_stopping=F in the hope that it might fix the "bug". Isn't it the case that the original behaviour was a feature, not a bug? And if that is correct, then you should always use early_stopping=T when using cross-validation with GLM, otherwise the error estimate from cross-validation is useless; you also risk over-fitting. (My main question
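For concreteness, this is the setup under discussion: with nfolds set, leaving early_stopping = TRUE stops the lambda search once the cross-validated error stops improving, rather than walking the full lambda path. A sketch with hypothetical file and column names:

```r
library(h2o)
h2o.init()
train <- h2o.importFile("train.csv")               # hypothetical file
fit <- h2o.glm(x = setdiff(names(train), "y"), y = "y",
               training_frame = train, family = "binomial",
               nfolds = 5, lambda_search = TRUE,
               early_stopping = TRUE)               # the default; made explicit here
fit@model$lambda_best                               # lambda chosen by the search
```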

Model runs with glm but not bigglm

北城以北 submitted 2019-12-04 05:31:52
I was trying to run a logistic regression on 320,000 rows of data (6 variables). Stepwise model selection on a sample of the data (10000) gives a rather complex model with 5 interaction terms: Y~X1+ X2*X3+ X2*X4+ X2*X5+ X3*X6+ X4*X5 . The glm() function could fit this model with 10000 rows of data, but not with the whole dataset (320,000). Using bigglm to read data chunk by chunk from a SQL server resulted in an error, and I couldn't make sense of the results from traceback() : fit <- bigglm(Y~X1+ X2*X3+ X2*X4+ X2*X5+ X3*X6+ X4*X5, data=sqlQuery(myconn,train_dat),family=binomial(link="logit"),
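If the SQL pull itself fits in memory, bigglm from the biglm package can still fit the model in chunks; interaction-heavy logistic fits also often need more IRLS passes than bigglm's default maxit = 8, which is one common source of cryptic failures. A sketch with the question's variable names:

```r
library(biglm)
train_dat <- sqlQuery(myconn, "SELECT * FROM train")  # hypothetical query via RODBC
fit <- bigglm(Y ~ X1 + X2*X3 + X2*X4 + X2*X5 + X3*X6 + X4*X5,
              data = train_dat, family = binomial(link = "logit"),
              chunksize = 10000,   # rows processed per pass
              maxit = 50)          # raise from the default 8 iterations
summary(fit)
```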

Calculate cross validation for Generalized Linear Model in Matlab

流过昼夜 submitted 2019-12-03 20:50:51
I am doing a regression using a Generalized Linear Model. I was caught off guard by the crossval function. My implementation so far: x = 'Some dataset, containing the input and the output' X = x(:,1:7); Y = x(:,8); cvpart = cvpartition(Y,'holdout',0.3); Xtrain = X(training(cvpart),:); Ytrain = Y(training(cvpart),:); Xtest = X(test(cvpart),:); Ytest = Y(test(cvpart),:); mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson'); Ypred = predict(mdl,Xtest); res = (Ypred - Ytest); RMSE_test = sqrt(mean(res.^2)); The code below is for calculating cross validation for multiple
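The same k-fold RMSE computation, sketched in R (the language used by the other entries here), with simulated stand-ins for the Matlab x(:,1:7) and x(:,8):

```r
set.seed(1)
X <- as.data.frame(matrix(rnorm(700), ncol = 7))   # stands in for x(:,1:7)
y <- rpois(100, lambda = exp(0.3 * X$V1))          # stands in for x(:,8)
folds <- sample(rep(1:10, length.out = nrow(X)))   # random 10-fold assignment
rmse <- sapply(1:10, function(k) {
  fit  <- glm(y[folds != k] ~ ., data = X[folds != k, ], family = poisson)
  pred <- predict(fit, newdata = X[folds == k, ], type = "response")
  sqrt(mean((pred - y[folds == k])^2))             # held-out RMSE for fold k
})
mean(rmse)                                         # cross-validated RMSE
```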

How to save glm result without data, or only with coefficients, for prediction?

一个人想着一个人 submitted 2019-12-03 17:47:33
Question: When I use the following R code, model_glm=glm(V1~. , data=xx,family="binomial"); save(file="modelfile",model_glm); the size of modelfile will be as large as the data, about 1 GB in my case. How can I remove the data part from the result of model_glm, so that I only save a small file? Answer 1: Setting model = FALSE in your call to glm should prevent the model.frame from being returned. Also setting y = FALSE will prevent the response vector from being returned. x = FALSE is the default
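Putting those flags together, plus clearing the environment that R attaches to the terms object (a capture that can drag the calling workspace into the saved file); the xx and new_xx objects are the question's hypothetical data:

```r
model_glm <- glm(V1 ~ ., data = xx, family = "binomial",
                 model = FALSE, y = FALSE)   # x = FALSE is already the default
model_glm$data <- NULL                       # drop the stored reference to xx
attr(model_glm$terms, ".Environment") <- globalenv()  # avoid serialising the calling env
save(model_glm, file = "modelfile")
# Prediction still works, provided newdata is supplied:
# predict(model_glm, newdata = new_xx, type = "response")
```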

Difference between glmnet() and cv.glmnet() in R?

和自甴很熟 submitted 2019-12-03 16:56:26
Question: I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet() package, specifically its Poisson feature. Here's my code: # de <- data imported from sql connection x <- model.matrix(~.,data = de[,2:7]) y <- (de[,1]) reg <- cv.glmnet(x,y, family = "poisson", alpha = 1) reg1 <- glmnet(x,y, family = "poisson", alpha = 1) **Co <- coef(?reg or reg1?,s=???)** summ <- summary(Co) c <- data.frame(Name= rownames(Co)[summ$i], Lambda=
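The coefficients are normally pulled from the cv.glmnet object, since it carries the cross-validated lambda choices; the plain glmnet fit needs an explicit s. A self-contained sketch with simulated data in place of the SQL import:

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 5), ncol = 5)      # stands in for model.matrix(~., de[,2:7])
y <- rpois(100, exp(0.4 * x[, 1]))         # stands in for de[,1]
reg  <- cv.glmnet(x, y, family = "poisson", alpha = 1)
reg1 <- glmnet(x, y, family = "poisson", alpha = 1)
Co <- coef(reg, s = "lambda.min")          # CV-selected lambda ("lambda.1se" is sparser)
coef(reg1, s = reg$lambda.min)             # same value, supplied by hand to the plain fit
```

The key difference: cv.glmnet() runs glmnet() repeatedly across folds to estimate out-of-sample error at each lambda, so only it can resolve the symbolic s values "lambda.min" and "lambda.1se".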