unscale and uncenter glmer parameters

I've been struggling with converting scaled and centered model coefficients from a glmer model back to uncentered and unscaled values.

I analysed a dataset using GL

1 answer

孤城傲影

2020-12-10 17:44

Read in data:

source("SO_unscale.txt")

Separate the unscaled variables (lowercase names) from the scaled ones (capitalized names). Transmitter.depth doesn't appear to have a scaled variant, so it appears in both sets:

unsc.vars <- subset(dd,select=c(Transmitter.depth,
                       receiver.depth,water.temperature,
                       wind.speed,distance))
sc.vars <- subset(dd,select=c(Transmitter.depth,
                     Receiver.depth,Water.temperature,
                     Wind.speed,Distance))

I noticed that the means and standard deviations of the scaled variables were not exactly 0/1, perhaps because what's here is a subset of the data. In any case, we will need the means and standard deviations of the original data in order to unscale.

colMeans(sc.vars)
apply(sc.vars,2,sd)
cm <- colMeans(unsc.vars)
csd <- apply(unsc.vars,2,sd)

It is possible to 'unscale' even if the new variables are not exactly centered/scaled (one would just need to enter the actual amount of the shift/scaling done), but it's marginally more complicated, so I'm just going to go ahead and fit with precisely centered/scaled variables.
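For reference, the "marginally more complicated" case can be sketched as follows. This is only an illustration on simulated data with an ordinary lm; the data frame sim1 and the shift/scale values are made up. The point is that the same back-transformation works as long as you plug in the amounts that were actually subtracted and divided by, whether or not they equal the sample means and SDs.

```r
## Sketch with simulated data: predictors shifted/scaled by known amounts
## that are NOT the exact sample means/SDs (all values here are hypothetical)
set.seed(101)
sim1 <- data.frame(x1 = rnorm(200, mean = 10, sd = 3),
                   x2 = rnorm(200, mean = -2, sd = 0.5))
sim1$y <- 1 + 0.5 * sim1$x1 - 2 * sim1$x2 + rnorm(200)

shift1 <- c(x1 = 9.5, x2 = -1.8)  # amounts actually subtracted
scl1   <- c(x1 = 2.7, x2 = 0.6)   # amounts actually divided by
sim1$x1.sc <- (sim1$x1 - shift1[["x1"]]) / scl1[["x1"]]
sim1$x2.sc <- (sim1$x2 - shift1[["x2"]]) / scl1[["x2"]]

b.sc1 <- coef(lm(y ~ x1.sc + x2.sc, data = sim1))
slopes1 <- b.sc1[-1] / scl1                      # slopes per original unit
int1    <- b.sc1[1] - sum(slopes1 * shift1)      # undo the shift in the intercept
b.back1 <- c(int1, slopes1)

b.raw1 <- coef(lm(y ~ x1 + x2, data = sim1))     # direct fit on the raw scale
all.equal(unname(b.back1), unname(b.raw1))
```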

## changed data name to dd
library(lme4)
cs. <- function(x) scale(x,center=TRUE,scale=TRUE)
m1 <- glmer(Valid.detections ~ Transmitter.depth +
            receiver.depth + water.temperature + 
            wind.speed + distance + (distance | SUR.ID),
            data=dd, family = poisson,
            control=glmerControl(optimizer=c("bobyqa","Nelder_Mead")))
## FAILS with bobyqa alone
m1.sc <- glmer(Valid.detections ~ cs.(Transmitter.depth) +
               cs.(receiver.depth) + cs.(water.temperature) + 
               cs.(wind.speed) + cs.(distance) + (cs.(distance) | SUR.ID),
               data=dd, family = poisson,
               control=glmerControl(optimizer=c("bobyqa","Nelder_Mead")))

An important point is that in this case the very different scaling doesn't seem to do any harm; the scaled and unscaled models get essentially the same goodness of fit (if scaling were important, we would expect the scaled fit to do better):

logLik(m1)-logLik(m1.sc)  ## 1e-7
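The reason the two log-likelihoods should agree is that centering and scaling is just a linear reparameterization of the same model. A minimal sketch of that idea, using simulated data and a plain Poisson glm rather than glmer (the data frame sim2 and the helper zs2 are hypothetical and not part of the original analysis):

```r
## Sketch: scaling predictors reparameterizes the model without changing the fit
set.seed(1)
n2 <- 300
sim2 <- data.frame(depth = runif(n2, 5, 30), dist = runif(n2, 0, 800))
sim2$count <- rpois(n2, exp(3 - 0.02 * sim2$depth - 0.004 * sim2$dist))

zs2 <- function(x) as.numeric(scale(x))  # hypothetical helper: center and scale

g.raw2 <- glm(count ~ depth + dist, data = sim2, family = poisson)
g.sc2  <- glm(count ~ zs2(depth) + zs2(dist), data = sim2, family = poisson)

## difference in log-likelihood; expected to be numerically negligible
ll.diff2 <- as.numeric(logLik(g.raw2) - logLik(g.sc2))
ll.diff2
```

With a well-behaved optimizer the difference is only numerical noise; a noticeably larger value would suggest that one of the two fits had trouble converging.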

Here is the rescaling function given in a previous answer:

rescale.coefs <- function(beta,mu,sigma) {
    beta2 <- beta ## inherit names etc.
    beta2[-1] <- sigma[1]*beta[-1]/sigma[-1]
    beta2[1]  <- sigma[1]*beta[1]+mu[1]-sum(beta2[-1]*mu[-1])
    beta2
}
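The algebra behind this function can be written out as follows. The notation is mine, not from the original answer: primes mark coefficients from the scaled fit, index 0 refers to the response (mu[1] and sigma[1] in the code), and indices j >= 1 refer to the predictors. For a GLMM the left-hand side is the linear predictor rather than the response, with mu_0 = 0 and sigma_0 = 1.

```latex
\begin{align*}
\frac{y - \mu_0}{\sigma_0}
  &= \beta'_0 + \sum_{j \ge 1} \beta'_j \, \frac{x_j - \mu_j}{\sigma_j} \\
y &= \sigma_0 \beta'_0 + \mu_0
     + \sum_{j \ge 1} \frac{\sigma_0 \beta'_j}{\sigma_j} \, (x_j - \mu_j) \\
\Rightarrow \quad
\beta_j &= \frac{\sigma_0 \beta'_j}{\sigma_j} \quad (j \ge 1), \qquad
\beta_0 = \sigma_0 \beta'_0 + \mu_0 - \sum_{j \ge 1} \beta_j \mu_j
\end{align*}
```

The last line corresponds term by term to the two assignments to beta2[-1] and beta2[1] in the function.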

The parameters do indeed match very closely. (The shifting/scaling vectors include a possible shift/scale of the response variable as their first element, so we start them with 0 and 1 respectively since the response is not scaled. It would rarely make sense to scale a response variable for a GLMM, but this function can be useful for LMMs too.)

(cc <- rescale.coefs(fixef(m1.sc),mu=c(0,cm),sigma=c(1,csd)))
##            (Intercept) cs.(Transmitter.depth)    cs.(receiver.depth) 
##            3.865879406            0.011158402           -0.554392645 
## cs.(water.temperature)        cs.(wind.speed)          cs.(distance) 
##           -0.050833325           -0.042188495           -0.007231021 

fixef(m1)
##  (Intercept) Transmitter.depth    receiver.depth water.temperature 
##  3.865816422       0.011180213      -0.554498582      -0.050830611 
##   wind.speed          distance 
## -0.042179333      -0.007231004

Because the two sets of coefficients are essentially the same (the unscaled model does fit OK), we could use either set for this calculation:

ddist <- 1:1000
vals <- cbind(`(Intercept)`=1,Transmitter.depth=0.6067926,
          Receiver.depth=-0.1610828,Water.temperature=-0.1128282,
          Wind.speed=-0.2959290,distance=ddist)
pred.obs <- exp(cc %*% t(vals))
max(ddist[pred.obs>1])
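As a cross-check on the grid search, the distance at which the predicted count drops to 1 can also be solved for directly, since exp(eta) > 1 is the same as eta > 0 and eta is linear in distance. The sketch below hard-codes the rescaled coefficients printed earlier and the same fixed covariate values; the object names (b.chk, x.fixed, d.star) are mine.

```r
## Rescaled coefficients copied from the printed output above
b.chk <- c(`(Intercept)` = 3.865879406, Transmitter.depth = 0.011158402,
           receiver.depth = -0.554392645, water.temperature = -0.050833325,
           wind.speed = -0.042188495, distance = -0.007231021)
## Intercept column plus the fixed covariate values used in the grid calculation
x.fixed <- c(1, 0.6067926, -0.1610828, -0.1128282, -0.2959290)

eta0   <- sum(b.chk[1:5] * x.fixed)       # linear predictor at distance = 0
d.star <- -eta0 / b.chk[["distance"]]     # distance where the predicted count equals 1

## compare with the grid search over whole-number distances
grid.d    <- 1:1000
pred.grid <- exp(eta0 + b.chk[["distance"]] * grid.d)
max(grid.d[pred.grid > 1]) == floor(d.star)
```

This only works because the distance coefficient is negative, so the predicted count decreases monotonically with distance.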

Now suppose you want to do similar scaling/unscaling for a model with interactions or other complexities, i.e. one where the predictor variables (the columns of the fixed-effect model matrix) are not the same as the input variables (the variables that appear in the formula):

m2 <- update(m1,. ~ . + wind.speed:distance)
m2.sc <- update(m1.sc,. ~ . + I(cs.(wind.speed*distance)))
logLik(m2)-logLik(m2.sc)

Calculate mean/sd of model matrix, dropping the first (intercept) value:

X <- getME(m2,"X")                                        
cm2 <- colMeans(X)[-1]
csd2 <- apply(X,2,sd)[-1]                                            
(cc2 <- rescale.coefs(fixef(m2.sc),mu=c(0,cm2),sigma=c(1,csd2)))
all.equal(unname(cc2),unname(fixef(m2)),tol=1e-3)  ## TRUE
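If you want to convince yourself of the interaction case without the original data, here is a small self-contained sketch on simulated data, using a Poisson glm instead of glmer. Everything in it (sim3, zs3, the coefficient values) is made up for illustration. The key step is the same: take the column means and SDs of the unscaled model matrix, including the interaction column, and use those for the back-transformation.

```r
## Sketch: back-transforming a model whose interaction column was scaled as a whole
set.seed(42)
n3 <- 500
sim3 <- data.frame(w = rnorm(n3, 5, 2), d = runif(n3, 0, 100))
sim3$y <- rpois(n3, exp(1 + 0.1 * sim3$w - 0.01 * sim3$d + 0.001 * sim3$w * sim3$d))

zs3 <- function(x) as.numeric(scale(x))  # hypothetical helper: center and scale

f.raw3 <- glm(y ~ w + d + w:d, data = sim3, family = poisson)
f.sc3  <- glm(y ~ zs3(w) + zs3(d) + zs3(w * d), data = sim3, family = poisson)

X3    <- model.matrix(f.raw3)        # columns: (Intercept), w, d, w:d
mu3   <- colMeans(X3)[-1]            # drop the intercept column
sd3   <- apply(X3, 2, sd)[-1]

b.sc3   <- coef(f.sc3)
slopes3 <- b.sc3[-1] / sd3                       # per-column slopes on the raw scale
back3   <- c(b.sc3[1] - sum(slopes3 * mu3), slopes3)

all.equal(unname(back3), unname(coef(f.raw3)), tolerance = 1e-4)
```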

You don't actually have to fit the full unscaled model just to get the scaling parameters. If you haven't already fitted m2, you can use model.matrix([formula],data) to build X and then take its column means and standard deviations:

X <- model.matrix(Valid.detections ~ Transmitter.depth + receiver.depth +
                      water.temperature + 
                      wind.speed + distance + 
                      wind.speed:distance,
                  data=dd)
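The same pattern on a built-in data set, in case you want to try it without the original data. This is only a sketch using mtcars; the formula and object names are arbitrary. Note that the interaction column gets its own mean and SD, which in general are not products of the main-effect means and SDs.

```r
## Sketch: scaling parameters straight from the model matrix, no model fit needed
X.mt <- model.matrix(mpg ~ wt + hp + wt:hp, data = mtcars)

cm.mt  <- colMeans(X.mt)[-1]       # column means, intercept dropped
csd.mt <- apply(X.mt, 2, sd)[-1]   # column SDs, intercept dropped

cm.mt
csd.mt
```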

If you are fitting an LMM and have scaled the response variable, you should also multiply all of the estimated standard deviations (including the residual error, sigma(fitted_model)) by the original SD of the response variable.
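A minimal sketch of the scaled-response case, using a plain lm on simulated data so that only the residual SD is involved (no random effects). The function rescale.copy is a verbatim copy of the rescaling function under a different name so the block runs on its own; sim4 and zs4 are hypothetical.

```r
## Copy of the rescaling function, renamed so this block is self-contained
rescale.copy <- function(beta, mu, sigma) {
    beta2 <- beta
    beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
    beta2[1]  <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
    beta2
}

set.seed(7)
n4 <- 200
sim4 <- data.frame(x1 = rnorm(n4, 20, 4), x2 = rnorm(n4, 3, 1))
sim4$y <- 5 + 0.3 * sim4$x1 - 1.2 * sim4$x2 + rnorm(n4, sd = 2)

zs4 <- function(x) as.numeric(scale(x))  # hypothetical helper: center and scale

fit.raw4 <- lm(y ~ x1 + x2, data = sim4)
fit.sc4  <- lm(zs4(y) ~ zs4(x1) + zs4(x2), data = sim4)

## first element of each vector now holds the response mean/SD instead of 0/1
mu4 <- c(mean(sim4$y), mean(sim4$x1), mean(sim4$x2))
sg4 <- c(sd(sim4$y), sd(sim4$x1), sd(sim4$x2))

b.back4     <- rescale.copy(coef(fit.sc4), mu = mu4, sigma = sg4)
sigma.back4 <- sigma(fit.sc4) * sd(sim4$y)   # residual SD back on the response scale

all.equal(unname(b.back4), unname(coef(fit.raw4)))
all.equal(sigma.back4, sigma(fit.raw4))
```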
