I've been struggling with converting scaled and centered model coefficients from a glmer model back to uncentered and unscaled values.
I analysed a dataset using GL
Read in data:
source("SO_unscale.txt")
Separate unscaled and scaled variables (Transmitter.depth doesn't appear to have a scaled variant):
unsc.vars <- subset(dd,select=c(Transmitter.depth,
receiver.depth,water.temperature,
wind.speed,distance))
sc.vars <- subset(dd,select=c(Transmitter.depth,
Receiver.depth,Water.temperature,
Wind.speed,Distance))
I noticed that the means and standard deviations of the scaled variables were not exactly 0/1, perhaps because what's here is a subset of the data. In any case, we will need the means and standard deviations of the original data in order to unscale.
colMeans(sc.vars)
apply(sc.vars,2,sd)
cm <- colMeans(unsc.vars)
csd <- apply(unsc.vars,2,sd)
It is possible to 'unscale' even if the new variables are not exactly centered/scaled (one would just need to enter the actual amount of the shift/scaling done), but it's marginally more complicated, so I'm just going to go ahead and fit with precisely centered/scaled variables.
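As an aside, here is a minimal sketch of the not-exactly-centered case, using simulated data and a plain Poisson glm() rather than the glmer fit. The variables x, y, z and the shift/scale constants a and b are made up for illustration; the only point is that the constants you divide and subtract by must be the ones actually used in the transformation.

```r
## Hypothetical example: predictor shifted by a and divided by b,
## where a and b are NOT the sample mean and SD of x.
set.seed(1)
x <- runif(100, 0, 50)
y <- rpois(100, exp(0.5 + 0.03 * x))
a <- 20; b <- 10
z <- (x - a) / b

f.raw <- glm(y ~ x, family = poisson)  ## fit on the original scale
f.z   <- glm(y ~ z, family = poisson)  ## fit on the transformed scale

## undo the transformation using the actual shift/scale
slope     <- coef(f.z)[["z"]] / b
intercept <- coef(f.z)[["(Intercept)"]] - slope * a
all.equal(c(intercept, slope), unname(coef(f.raw)), tolerance = 1e-6)
```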
## changed data name to dd
library(lme4)
cs. <- function(x) scale(x,center=TRUE,scale=TRUE)
m1 <- glmer(Valid.detections ~ Transmitter.depth +
receiver.depth + water.temperature +
wind.speed + distance + (distance | SUR.ID),
data=dd, family = poisson,
control=glmerControl(optimizer=c("bobyqa","Nelder_Mead")))
## FAILS with bobyqa alone
m1.sc <- glmer(Valid.detections ~ cs.(Transmitter.depth) +
cs.(receiver.depth) + cs.(water.temperature) +
cs.(wind.speed) + cs.(distance) + (cs.(distance) | SUR.ID),
data=dd, family = poisson,
control=glmerControl(optimizer=c("bobyqa","Nelder_Mead")))
An important point is that in this case the very different scaling doesn't seem to do any harm; the scaled and unscaled models get essentially the same goodness of fit (if the scaling were important, we would expect the scaled fit to do better).
logLik(m1)-logLik(m1.sc) ## 1e-7
Here is the rescaling function given in a previous answer:
rescale.coefs <- function(beta,mu,sigma) {
beta2 <- beta ## inherit names etc.
beta2[-1] <- sigma[1]*beta[-1]/sigma[-1]
beta2[1] <- sigma[1]*beta[1]+mu[1]-sum(beta2[-1]*mu[-1])
beta2
}
The parameters do indeed match very closely. (The shifting/scaling vectors include possible scaling/shifting of the response variable, so we start with 0/1 since the response is not scaled [it would rarely make sense to scale a response variable for a GLMM, but this function can be useful for LMMs too].)
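If you want to convince yourself that the function does what it should without refitting the GLMM, a quick check on simulated data works. This is only a sketch: the data frame d and the variables x1, x2, y are invented, and lm() stands in for glmer() to keep it fast. rescale.coefs is repeated from above so that the block runs on its own.

```r
set.seed(101)
d <- data.frame(x1 = rnorm(200, mean = 10, sd = 3),
                x2 = rnorm(200, mean = -5, sd = 0.5))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + rnorm(200)

## same function as above
rescale.coefs <- function(beta, mu, sigma) {
    beta2 <- beta
    beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
    beta2[1]  <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
    beta2
}

fit.raw <- lm(y ~ x1 + x2, data = d)
fit.sc  <- lm(y ~ scale(x1) + scale(x2), data = d)

mu    <- c(0, colMeans(d[c("x1", "x2")]))      ## leading 0: response not shifted
sigma <- c(1, apply(d[c("x1", "x2")], 2, sd))  ## leading 1: response not scaled
back  <- rescale.coefs(coef(fit.sc), mu = mu, sigma = sigma)
all.equal(unname(back), unname(coef(fit.raw)))
```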
(cc <- rescale.coefs(fixef(m1.sc),mu=c(0,cm),sigma=c(1,csd)))
## (Intercept) cs.(Transmitter.depth) cs.(receiver.depth)
## 3.865879406 0.011158402 -0.554392645
## cs.(water.temperature) cs.(wind.speed) cs.(distance)
## -0.050833325 -0.042188495 -0.007231021
fixef(m1)
## (Intercept) Transmitter.depth receiver.depth water.temperature
## 3.865816422 0.011180213 -0.554498582 -0.050830611
## wind.speed distance
## -0.042179333 -0.007231004
Because the two sets of coefficients agree (the unscaled model fits without trouble), we could use either set for this calculation.
ddist <- 1:1000
vals <- cbind(`(Intercept)`=1,Transmitter.depth=0.6067926,
Receiver.depth=-0.1610828,Water.temperature=-0.1128282,
Wind.speed=-0.2959290,distance=ddist)
pred.obs <- exp(cc %*% t(vals))
max(ddist[pred.obs>1])
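Because the model is log-linear in distance, the crossing point can also be found without a grid: the prediction drops below 1 where the linear predictor crosses 0. The sketch below hard-codes the back-transformed coefficients printed above and takes the fixed predictor values exactly as given in the code above; the names beta, x0, eta0 and d.star are mine.

```r
## coefficients as printed above (back-transformed from the scaled fit)
beta <- c(`(Intercept)` = 3.865879406, Transmitter.depth = 0.011158402,
          receiver.depth = -0.554392645, water.temperature = -0.050833325,
          wind.speed = -0.042188495, distance = -0.007231021)
## intercept column plus the four fixed predictor values used above
x0 <- c(1, 0.6067926, -0.1610828, -0.1128282, -0.2959290)

eta0   <- sum(beta[1:5] * x0)           ## linear predictor at distance = 0
d.star <- -eta0 / beta[["distance"]]    ## distance where the log-prediction is 0

## grid search, for comparison with the analytic value
ddist <- 1:1000
pred  <- exp(eta0 + beta[["distance"]] * ddist)
c(analytic = d.star, grid = max(ddist[pred > 1]))
```

The grid answer is just the analytic one rounded down to the grid spacing, so the closed form is handy if you need more resolution than one distance unit.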
Now suppose you want to do similar scaling/unscaling for a model with interactions or other complexities (i.e. the predictor variables, the columns of the fixed-effect model matrix, are not the same as the input variables, which are the variables that appear in the formula):
m2 <- update(m1,. ~ . + wind.speed:distance)
m2.sc <- update(m1.sc,. ~ . + I(cs.(wind.speed*distance)))
logLik(m2)-logLik(m2.sc)
Calculate mean/sd of model matrix, dropping the first (intercept) value:
X <- getME(m2,"X")
cm2 <- colMeans(X)[-1]
csd2 <- apply(X,2,sd)[-1]
(cc2 <- rescale.coefs(fixef(m2.sc),mu=c(0,cm2),sigma=c(1,csd2)))
all.equal(unname(cc2),unname(fixef(m2)),tol=1e-3) ## TRUE
You don't actually have to fit the full unscaled model just to get the scaling parameters: you could use model.matrix([formula],data) to derive the model matrix. That is, if you haven't already fitted m2 and you want X in order to get the column means and standard deviations:
X <- model.matrix(Valid.detections ~ Transmitter.depth + receiver.depth +
water.temperature +
wind.speed + distance +
wind.speed:distance,
data=dd)
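To see what this buys you, here is a small sketch on made-up data (the data frame d and the variables w and s are hypothetical). The interaction column of the model matrix is the elementwise product of the inputs, so its mean and SD have to be computed from that column rather than from the means and SDs of w and s separately.

```r
set.seed(2)
d <- data.frame(w = rnorm(50, mean = 5, sd = 2),
                s = runif(50, 0, 100))

## model matrix without fitting anything; one-sided formula, no response needed
X <- model.matrix(~ w + s + w:s, data = d)
colnames(X)

## scaling parameters, dropping the intercept column
cm2  <- colMeans(X)[-1]
csd2 <- apply(X, 2, sd)[-1]

## the interaction column is the product w * s ...
all.equal(unname(X[, "w:s"]), d$w * d$s)
## ... so its mean is mean(w * s), not mean(w) * mean(s)
c(cm2[["w:s"]], mean(d$w) * mean(d$s))
```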
If you have an LMM or have otherwise scaled the response variable, you should also multiply all of the standard deviations (including the residual error, sigma(fitted_model)) by the original SD of the response variable.
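A minimal illustration of the residual-SD part, using lm() on simulated data as a stand-in for a mixed model (x, y and y.sc are invented; with lmer() the same multiplication would apply to the random-effect standard deviations as well):

```r
set.seed(3)
x <- rnorm(100)
y <- 50 + 8 * x + rnorm(100, sd = 4)
y.sc <- (y - mean(y)) / sd(y)   ## centered and scaled response

fit.raw <- lm(y ~ x)
fit.sc  <- lm(y.sc ~ x)

## residual SD on the original scale = scaled residual SD * SD of the response
all.equal(sigma(fit.sc) * sd(y), sigma(fit.raw))
## the slope is rescaled by the same factor
all.equal(coef(fit.sc)[["x"]] * sd(y), coef(fit.raw)[["x"]])
```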