How to specify log link in glmnet?

允我心安 提交于 2020-06-27 07:30:27

问题


I'm running an elastic net on a generalized linear model with the glmnet and caret packages in R.

My response variable is cost (where cost > $0) and hence I'd like to specify a Gaussian family with a log link for my GLM. However glmnet doesn't seem to allow me to specify (link="log") as follows:

> lasso_fit <- glmnet(x, y, alpha=1, family="gaussian"(link="log"), lambda.min.ratio=.001)

I've tried different variants, with and without quotations, but no luck. The glmnet documentation doesn't discuss how to include a log link.

Am I missing something? Does family="gaussian" already implicitly assume a log link?


回答1:


It is a bit confusing, but the family argument in glmnet and glm are quite different. In glm, you can specify a character like "gaussian", or you can specify a function with some arguments, like gaussian(link="log"). In glmnet, you can only specify the family with a character, like "gaussian", and there is no way to automatically set the link through that argument.

The default link for gaussian is the identity function, that is, no transformation at all. But, remember that a link function is just a transformation of your y variable; you can just specify it yourself:

glmnet(x, log(y), family="gaussian")

Also note that the default link for the poisson family is log, but the objective function will change. See the Details under ?glmnet in the first couple of paragraphs.


Your comments have led me to rethink my answer; I have evidence that it is not correct.

As you point out, there is a difference between E[log(Y)] and log(E[Y]). I think what the above code does is to fit E[log(Y)], which is not what you want. Here is some code to generate data and confirm what you noted in the comments:

# Generate data
set.seed(1)
x <- replicate(3,runif(1000))
y <- exp(2*x[,1] + 3*x[,2] + x[,3] + runif(1000))
df <- data.frame(y,x)

# Run the model you *want*
glm(y~., family=gaussian(link="log"), data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4977746   2.0449443   3.0812333   0.9451073 

# Run the model you *don't want* (in two ways)    
glm(log(y)~., family=gaussian(link='identity'), data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4726745   2.0395798   3.0167274   0.9957110 
lm(log(y)~.,data=df)$coef
# (Intercept)          X1          X2          X3 
#   0.4726745   2.0395798   3.0167274   0.9957110 

# Run the glmnet code that I suggested - getting what you *don't want*.
library(glmnet)
glmnet.model <- glmnet(x,log(y),family="gaussian", thresh=1e-8, lambda=0)
c(glmnet.model$a0, glmnet.model$beta[,1])
#        s0        V1        V2        V3 
# 0.4726745 2.0395798 3.0167274 0.9957110 


来源:https://stackoverflow.com/questions/25206844/how-to-specify-log-link-in-glmnet

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!