glm

Difference in GLM results between IPython and R

邮差的信 submitted 2019-12-04 09:41:32
I'm trying to get to grips with performing regression analyses in R. Below is some random dummy data that I generated in R and fitted with a logistic glm. I saved the data to a test file, read it into Python with IPython (the IPython notebook is awesome, by the way; I've only just started using it!), and then tried to run the same analysis in Python. The results are very similar, but they are different, and I would have expected them to be the same. Have I done something wrong, is there a parameter I am missing, or is the difference due to some underlying calculation? Any help appreciated! EDIT: I don
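For reference, when the data and parameterization match exactly, R's glm and Python's statsmodels GLM should agree to numerical precision; small discrepancies usually come from different IRLS convergence tolerances or from how categorical variables are coded. A minimal sketch of the R side, with hypothetical dummy data since the original file isn't shown:

```r
# Hypothetical dummy data standing in for the question's test file.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)
coef(fit)
# The Python side would be statsmodels' GLM(y, X, family=Binomial());
# given identical data and an intercept column in X, the coefficients
# should match to roughly machine precision.
```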

How do I extract lmer fixed effects by observation?

一笑奈何 submitted 2019-12-04 08:20:47
Question: I have an lme object, constructed from some repeated-measures nutrient intake data (two 24-hour intake periods per RespondentID): Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID), data = Male.Data, weights = SampleWeight) and I can successfully retrieve the random effects by RespondentID using ranef(Male.lme1) . I would also like to collect the result of the fixed effects by RespondentID . coef(Male.lme1) does not provide exactly what I need, as I show below. >
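One way to get a per-observation fixed-effects value is to multiply the fixed-effects design matrix by fixef(). A sketch using a built-in lme4 dataset, since Male.Data isn't available:

```r
library(lme4)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
X <- model.matrix(fit)            # fixed-effects design matrix, one row per observation
fixed.by.obs <- X %*% fixef(fit)  # fixed-effects part of each fitted value
head(fixed.by.obs)
# coef(fit), by contrast, returns per-group coefficients
# (fixed and random effects combined), not per-observation values.
```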

caret train() predicts very differently than predict.glm()

让人想犯罪 __ submitted 2019-12-04 07:23:30
Question: I am trying to estimate a logistic regression using 10-fold cross-validation. #import libraries library(car); library(caret); library(e1071); library(verification) #data import and preparation data(Chile) chile <- na.omit(Chile) #remove NAs chile <- chile[chile$vote == "Y" | chile$vote == "N" , ] #only "Y" and "N" required chile$vote <- factor(chile$vote) #required to remove unwanted levels chile$income <- factor(chile$income) # treat income as a factor The goal is to estimate a glm -
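A common cause of large discrepancies here is comparing different kinds of predictions: caret's predict() returns class labels by default, while predict.glm() defaults to the link scale unless type = "response" is given. A sketch on the Chile data (the predictor subset here is hypothetical):

```r
library(car); library(caret)
data(Chile)
chile <- na.omit(Chile)
chile <- chile[chile$vote %in% c("Y", "N"), ]
chile$vote <- factor(chile$vote)

# Probabilities from the caret model (class labels are its default output):
fit.caret <- train(vote ~ age + income, data = chile, method = "glm",
                   family = binomial,
                   trControl = trainControl(method = "cv", number = 10))
p.caret <- predict(fit.caret, newdata = chile, type = "prob")[, "Y"]

# Probabilities from plain glm (type = "response", not the default link scale):
fit.glm <- glm(vote ~ age + income, data = chile, family = binomial)
p.glm <- predict(fit.glm, newdata = chile, type = "response")
# With the same data and formula, these two probability vectors should agree closely;
# comparing classes to link-scale values is what makes them look "very different".
```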

How do I use a custom link function in glm?

依然范特西╮ submitted 2019-12-04 07:09:17
I don't want to use the standard log link in glm for Poisson regression, since I have zeros. Consider the following code: foo = 0:10 bar = 2 * foo glm(bar ~ foo, family = poisson(link = "identity")) I get the error: Error: no valid set of coefficients has been found: please supply starting values I'm not certain what this means. Is the "identity" link function what I think it is (i.e. it doesn't transform the data at all)? What does this error mean and how can I resolve it? You can get an answer if you start somewhere other than the default (0,0) starting point. The start parameter is a vector
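Concretely, passing start (one value per coefficient, intercept first) gets the identity-link fit going; c(1, 2) is one possible choice that keeps the fitted means positive throughout the iterations:

```r
foo <- 0:10
bar <- 2 * foo
glm(bar ~ foo, family = poisson(link = "identity"), start = c(1, 2))
# The identity link really does apply no transformation: mu = b0 + b1*foo.
# That is exactly why the default (0, 0) start fails -- IRLS needs a
# starting point where every fitted mean mu is strictly positive.
```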

How can I generate marginal effects for a logit model when using survey weights?

≡放荡痞女 submitted 2019-12-04 07:07:38
I normally generate logit model marginal effects using the mfx package and the logitmfx function. However, the current survey I am using has weights (which have a large effect on the proportion of the DV in the sample because of oversampling in some populations) and logitmfx doesn't appear to have any way to include weights. I have fitted the model with svyglm as follows: library(survey) survey.design <- svydesign(ids = combined.survey$id, weights = combined.survey$weight, data = combined.survey) vote.pred.1 <- svyglm(formula = turnout ~ gender + age.group + education + income, design = survey
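One workaround is to compute an average marginal effect by hand from the svyglm fit: predict with a covariate set to each of its levels and take the weighted mean difference (the margins package is also reported to accept svyglm objects). A sketch continuing from the question's objects; the gender level names are hypothetical:

```r
d1 <- d0 <- combined.survey
d1$gender <- "female"          # hypothetical level name
d0$gender <- "male"            # hypothetical level name
p1 <- predict(vote.pred.1, newdata = d1, type = "response")
p0 <- predict(vote.pred.1, newdata = d0, type = "response")
# Average marginal effect of gender, weighted by the survey weights:
weighted.mean(as.numeric(p1) - as.numeric(p0), w = combined.survey$weight)
```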

What do you need to watch out for when using cross-validation with GLM lambda search?

情到浓时终转凉″ submitted 2019-12-04 06:59:01
Question: Regarding h2o.glm lambda search not appearing to iterate over all lambdas: I read that question as complaining that lambda was too high; they tried setting early_stopping=F in the hope that it might fix the "bug". Isn't it the case that the original behaviour was a feature, not a bug? And if that is correct, then you should always use early_stopping=T when using cross-validation with GLM, otherwise the error estimate from cross-validation is useless; you also risk over-fitting. (My main question
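For concreteness, this is the setup under discussion: with nfolds set, leaving early_stopping = TRUE stops the lambda search once the cross-validated error stops improving, rather than walking the full lambda path. A sketch with hypothetical file and column names:

```r
library(h2o)
h2o.init()
train <- h2o.importFile("train.csv")               # hypothetical file
fit <- h2o.glm(x = setdiff(names(train), "y"), y = "y",
               training_frame = train, family = "binomial",
               nfolds = 5, lambda_search = TRUE,
               early_stopping = TRUE)               # the default; made explicit here
fit@model$lambda_best                               # lambda chosen by the search
```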

Model runs with glm but not bigglm

北城以北 submitted 2019-12-04 05:31:52
I was trying to run a logistic regression on 320,000 rows of data (6 variables). Stepwise model selection on a sample of the data (10000) gives a rather complex model with 5 interaction terms: Y~X1+ X2*X3+ X2*X4+ X2*X5+ X3*X6+ X4*X5 . The glm() function could fit this model with 10000 rows of data, but not with the whole dataset (320,000). Using bigglm to read data chunk by chunk from a SQL server resulted in an error, and I couldn't make sense of the results from traceback() : fit <- bigglm(Y~X1+ X2*X3+ X2*X4+ X2*X5+ X3*X6+ X4*X5, data=sqlQuery(myconn,train_dat),family=binomial(link="logit"),
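If the SQL pull itself fits in memory, bigglm from the biglm package can still fit the model in chunks; interaction-heavy logistic fits also often need more IRLS passes than bigglm's default maxit = 8, which is one common source of cryptic failures. A sketch with the question's variable names:

```r
library(biglm)
train_dat <- sqlQuery(myconn, "SELECT * FROM train")  # hypothetical query via RODBC
fit <- bigglm(Y ~ X1 + X2*X3 + X2*X4 + X2*X5 + X3*X6 + X4*X5,
              data = train_dat, family = binomial(link = "logit"),
              chunksize = 10000,   # rows processed per pass
              maxit = 50)          # raise from the default 8 iterations
summary(fit)
```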

Calculate cross validation for Generalized Linear Model in Matlab

流过昼夜 submitted 2019-12-03 20:50:51
I am doing a regression using a Generalized Linear Model. I was caught off guard by the crossval function. My implementation so far: x = 'Some dataset, containing the input and the output' X = x(:,1:7); Y = x(:,8); cvpart = cvpartition(Y,'holdout',0.3); Xtrain = X(training(cvpart),:); Ytrain = Y(training(cvpart),:); Xtest = X(test(cvpart),:); Ytest = Y(test(cvpart),:); mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson'); Ypred = predict(mdl,Xtest); res = (Ypred - Ytest); RMSE_test = sqrt(mean(res.^2)); The code below is for calculating cross validation for multiple
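The same k-fold RMSE computation, sketched in R (the language used by the other entries here), with simulated stand-ins for the Matlab x(:,1:7) and x(:,8):

```r
set.seed(1)
X <- as.data.frame(matrix(rnorm(700), ncol = 7))   # stands in for x(:,1:7)
y <- rpois(100, lambda = exp(0.3 * X$V1))          # stands in for x(:,8)
folds <- sample(rep(1:10, length.out = nrow(X)))   # random 10-fold assignment
rmse <- sapply(1:10, function(k) {
  fit  <- glm(y[folds != k] ~ ., data = X[folds != k, ], family = poisson)
  pred <- predict(fit, newdata = X[folds == k, ], type = "response")
  sqrt(mean((pred - y[folds == k])^2))             # held-out RMSE for fold k
})
mean(rmse)                                         # cross-validated RMSE
```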

How to save glm result without data, or only with coefficients, for prediction?

一个人想着一个人 submitted 2019-12-03 17:47:33
Question: When I use the following R code, model_glm=glm(V1~. , data=xx,family="binomial"); save(file="modelfile",model_glm); the size of modelfile will be as large as the data, about 1 GB in my case. How can I remove the data part from the result of model_glm, so that I only save a small file? Answer 1: Setting model = FALSE in your call to glm should prevent the model.frame from being returned. Also setting y = FALSE will prevent the response vector from being returned. x = FALSE is the default
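Putting those flags together, plus clearing the environment that R attaches to the terms object (a capture that can drag the calling workspace into the saved file); the xx and new_xx objects are the question's hypothetical data:

```r
model_glm <- glm(V1 ~ ., data = xx, family = "binomial",
                 model = FALSE, y = FALSE)   # x = FALSE is already the default
model_glm$data <- NULL                       # drop the stored reference to xx
attr(model_glm$terms, ".Environment") <- globalenv()  # avoid serialising the calling env
save(model_glm, file = "modelfile")
# Prediction still works, provided newdata is supplied:
# predict(model_glm, newdata = new_xx, type = "response")
```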

Difference between glmnet() and cv.glmnet() in R?

和自甴很熟 submitted 2019-12-03 16:56:26
Question: I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet() package, specifically its Poisson feature. Here's my code: # de <- data imported from sql connection x <- model.matrix(~.,data = de[,2:7]) y <- (de[,1]) reg <- cv.glmnet(x,y, family = "poisson", alpha = 1) reg1 <- glmnet(x,y, family = "poisson", alpha = 1) **Co <- coef(?reg or reg1?,s=???)** summ <- summary(Co) c <- data.frame(Name= rownames(Co)[summ$i], Lambda=
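The coefficients are normally pulled from the cv.glmnet object, since it carries the cross-validated lambda choices; the plain glmnet fit needs an explicit s. A self-contained sketch with simulated data in place of the SQL import:

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 5), ncol = 5)      # stands in for model.matrix(~., de[,2:7])
y <- rpois(100, exp(0.4 * x[, 1]))         # stands in for de[,1]
reg  <- cv.glmnet(x, y, family = "poisson", alpha = 1)
reg1 <- glmnet(x, y, family = "poisson", alpha = 1)
Co <- coef(reg, s = "lambda.min")          # CV-selected lambda ("lambda.1se" is sparser)
coef(reg1, s = reg$lambda.min)             # same value, supplied by hand to the plain fit
```

The key difference: cv.glmnet() runs glmnet() repeatedly across folds to estimate out-of-sample error at each lambda, so only it can resolve the symbolic s values "lambda.min" and "lambda.1se".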