glm | 易学教程

glm starting values not accepted log-link

Question: I want to run a Gaussian GLM with a log link and an offset. The following problems arise:

    y <- c(1,1,0,0)
    t <- c(5,3,2,4)

No problem:

    exp(coef(glm(y~1 + offset(log(t)), family=poisson)))

With family=gaussian, starting values need to be specified. It works here:

    exp(coef(glm(y~1, family=gaussian(link=log), start=0)))

but does not work here:

    exp(coef(glm(y~1 + offset(log(t)), family=gaussian(link=log), start=0)))
    Error in eval(expr, envir, enclos) : cannot find valid starting values: please

Why is the AUC so different between logistic regression in sklearn and R

I use the same dataset to train a logistic regression model in both R and Python sklearn. The dataset is unbalanced, and I find that the AUC is quite different. This is the Python code:

    model_logistic = linear_model.LogisticRegression()  # auc 0.623
    model_logistic.fit(train_x, train_y)
    pred_logistic = model_logistic.predict(test_x)  # mean: 0.0235, var: 0.023
    print("logistic auc: ", sklearn.metrics.roc_auc_score(test_y, pred_logistic))

This is the R code:

    glm_fit <- glm(label ~ watch_cnt_7 + bid_cnt_7 + vi_cnt_itm_1 + ITEM_PRICE + add_to_cart_cnt_7 + offer_cnt_7 + dwell_dlta_4to2 + vi_cnt_itm_2 + asq

GLM fit (logistic regression) to SQL

We frequently score data directly in the database for simple models like linear or logistic regression. It is always a little tricky to transfer all coefficients from R to SQL correctly, so I thought I could write an R-to-SQL translation for glm results. For numeric variables this is pretty straightforward:

    library(rpart)
    fit <- glm(Kyphosis ~ ., data = kyphosis, family = binomial())
    coefs <- fit$coef[2:length(fit$coef)]
    expr <- paste0('1/(1 + exp(-(', fit$coef[1], '+', paste0('(', coefs, '*', names(coefs), ')', collapse = '+'), ')))')
    print(expr)
    a <- with(kyphosis, eval(parse(text = expr)))
    b <-

H2o GLM interact only certain predictors

I'm interested in creating interaction terms in h2o.glm(), but I do not want to generate all pairwise interactions. For example, in the mtcars dataset, I want to interact 'mpg' with all the other factors such as 'cyl', 'hp', and 'disp', but I don't want the other factors to interact with each other (so I don't want disp_hp or disp_cyl). How should I best approach this problem using the (interactions = interactions_list) parameter in h2o.glm()? Thank you

According to ?h2o.glm the interactions= parameter takes: A list of predictor column indices to interact. All pairwise combinations will be

R - using glm inside a data.table

I'm trying to fit some GLMs inside a data.table to produce modelled results split by key factors. I've been doing this successfully for:

High level glm:

    glm(modellingDF,formula=Outcome~IntCol + DecCol,family=binomial(link=logit))

Scoped glm with single columns:

    modellingDF[,list(Outcome, fitted=glm(x,formula=Outcome~IntCol ,family=binomial(link=logit))$fitted ), by=variable]

Scoped glm with two integer columns:

    modellingDF[,list(Outcome, fitted=glm(x,formula=Outcome~IntCol + IntCol2 ,family=binomial(link=logit))$fitted ), by=variable]

But when I try to do the high level glm inside the scope with

Missing object error when using step() within a user-defined function

Question: 5 days and still no answer. As can be seen from Simon's comment, this is a reproducible and very strange issue. It seems that the issue only arises when a stepwise regression with very high predictive power is wrapped in a function. I have been struggling with this for a while and any help would be much appreciated. I am trying to write a function that runs several stepwise regressions and outputs all of them to a list. However, R is having trouble reading the dataset that I specify in my

Fractional Response Regression in R

I am trying to model data in which the response variable lies between 0 and 1, so I have decided to use a fractional response model in R. From my current understanding, the fractional response model is similar to logistic regression, but it uses a quasi-likelihood method to estimate the parameters. I am not sure I understand it correctly. So far I have tried frm from the package frm and glm on the following data, which is the same as in this OP:

    library(foreign)
    mydata <- read.dta("k401.dta")

Further, I followed the procedures in this OP in which glm is used. However, with the same dataset with
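A minimal sketch of the quasi-likelihood idea described in the question, assuming simulated data rather than the k401 data set (all variable names here are hypothetical). It fits a logit mean model with `glm` and the `quasibinomial` family, which accepts non-integer responses in [0, 1] and estimates the dispersion from the data:

```r
# Simulated fractional outcome (hypothetical data, not the k401 data set).
set.seed(123)
n  <- 500
x  <- rnorm(n)
mu <- plogis(-0.5 + 0.8 * x)   # true mean on the (0, 1) scale

# Beta-distributed response with mean mu, so every y lies between 0 and 1.
y <- rbeta(n, shape1 = mu * 10, shape2 = (1 - mu) * 10)

# Logit mean model fitted by quasi-likelihood; the dispersion is estimated
# instead of being fixed at 1 as in family = binomial.
fit <- glm(y ~ x, family = quasibinomial(link = "logit"))

coef(fit)                 # estimates on the logit scale
summary(fit)$dispersion   # estimated dispersion parameter
```

The point estimates match what `family = binomial` would give for the same data; the quasi family changes the standard errors and avoids the non-integer response warning.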

MCMCglmm multinomial model in R

Question: I'm trying to create a model using the MCMCglmm package in R. The data are structured as follows, where dyad, focal, other are all random effects, predict1-2 are predictor variables, and response 1-5 are outcome variables that capture # of observed behaviors of different subtypes:

    dyad focal other   r present village resp1 resp2 resp3 resp4 resp5
       1 10101 14302 0.5       3       1     0     0     4     0     5
       2 10405 11301 0.0       5       0     0     0     1     0     1
       …

So a model with only one outcome (teaching) is as follows:

    prior_overdisp_i <- list

Predict.glm not predicting missing values in response

Question: For some reason, when I specify glms (and lm's too, it turns out), R is not predicting missing values of the data. Here is an example:

    y = round(runif(50))
    y = c(y,rep(NA,50))
    x = rnorm(100)
    m = glm(y~x, family=binomial(link="logit"))
    p = predict(m,na.action=na.pass)
    length(p)

    y = round(runif(50))
    y = c(y,rep(NA,50))
    x = rnorm(100)
    m = lm(y~x)
    p = predict(m)
    length(p)

The length of p should be 100, but it's 50. The weird thing is that I have other predicts in the same script that do predict
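One possible way to see what is going on, sketched under the assumption that the default `na.action` option (`na.omit`) is in effect: rows with a missing response are dropped when the model is fitted, so `predict()` without `newdata` only returns values for the rows actually used. Passing the full data as `newdata` scores every row. The data frame `d` and the names `p_fit` and `p_all` are introduced here for illustration only.

```r
set.seed(42)
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)
d <- data.frame(y = y, x = x)   # hypothetical container for the example data

m <- glm(y ~ x, family = binomial(link = "logit"), data = d)

# Without newdata, predictions cover only the rows kept at fit time.
p_fit <- predict(m)
length(p_fit)

# With newdata, every row is scored; only the predictor x is needed,
# so the missing responses do not matter.
p_all <- predict(m, newdata = d)
length(p_all)
```

Here `p_fit` has 50 elements and `p_all` has 100. An alternative is to fit with `na.action = na.exclude`, which pads the predictions for dropped rows with `NA` instead of scoring them.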

How to set the Coefficient Value in Regression; R

I'm looking for a way to specify the value of a predictor variable. When I run a glm with my current data, the coefficient for one of my variables is close to one. I'd like to set it at .8. I know this will give me a lower R^2 value, but I know a priori that the predictive power of the model will be greater. The weights component of glm looks promising, but I haven't figured it out yet. Any help would be greatly appreciated.

I believe you are looking for the offset argument in glm. So for example, you might do something like this:

    glm(y ~ x1, offset = x2,...)

where in this case the
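A small sketch of the offset approach, assuming the goal is to hold the coefficient of one predictor at 0.8 on the linear-predictor scale. The data are simulated and the names `x1`, `x2`, `fit_free`, and `fit_fixed` are illustrative. An offset enters the linear predictor with a coefficient fixed at 1, so multiplying the variable by 0.8 inside `offset()` pins its effect at 0.8:

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 1.0 * x2 + rnorm(n)   # simulated Gaussian response

# Unconstrained fit: the coefficient of x2 is estimated freely.
fit_free <- glm(y ~ x1 + x2)

# Constrained fit: 0.8 * x2 is added to the linear predictor as an offset,
# so no coefficient is estimated for x2.
fit_fixed <- glm(y ~ x1 + offset(0.8 * x2))

coef(fit_free)
coef(fit_fixed)
```

Only the intercept and the `x1` coefficient are estimated in `fit_fixed`. Comparing `deviance(fit_free)` with `deviance(fit_fixed)` shows how much fit is given up by imposing the constraint.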