unscale and uncenter glmer parameters

谎友^ 2020-12-10 17:05

I've been struggling with converting scaled and centered model coefficients from a glmer model back to uncentered and unscaled values.

I analysed a dataset using GL

1 answer
  • 2020-12-10 17:44

    Read in data:

    source("SO_unscale.txt")
    

    Separate the unscaled and scaled variables (Transmitter.depth doesn't appear to have a scaled variant):

    unsc.vars <- subset(dd,select=c(Transmitter.depth,
                           receiver.depth,water.temperature,
                           wind.speed,distance))
    sc.vars <- subset(dd,select=c(Transmitter.depth,
                         Receiver.depth,Water.temperature,
                         Wind.speed,Distance))
    

    I noticed that the means and standard deviations of the scaled variables were not exactly 0/1, perhaps because what's here is a subset of the data. In any case, we will need the means and standard deviations of the original data in order to unscale.

    colMeans(sc.vars)
    apply(sc.vars,2,sd)
    cm <- colMeans(unsc.vars)
    csd <- apply(unsc.vars,2,sd)
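
    If the shift and scale that were actually applied are not known, they can be recovered from any raw column and its scaled counterpart, because the raw values are an exact linear function of the scaled ones. A minimal sketch on simulated data (the vectors raw and scaled, and the constants 12 and 3, are made up for illustration and are not from the dataset above):

    ```r
    # Hypothetical raw column and its pre-scaled version; 12 and 3 stand in
    # for the full-data mean and SD, which a subset no longer matches exactly.
    set.seed(101)
    raw    <- rnorm(50, mean = 12.4, sd = 3.1)
    scaled <- (raw - 12) / 3

    # raw = shift + scale * scaled, so a linear fit returns both constants
    est <- unname(coef(lm(raw ~ scaled)))
    shift_used <- est[1]   # intercept: the centering value that was used
    scale_used <- est[2]   # slope: the scaling value that was used
    c(shift = shift_used, scale = scale_used)
    ```

    The recovered values could then be supplied as the shift and scale vectors in place of the subset means and standard deviations.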
    

    It is possible to 'unscale' even if the new variables are not exactly centered/scaled (one would just need to enter the actual amount of the shift/scaling done), but it's marginally more complicated, so I'm just going to go ahead and fit with precisely centered/scaled variables.

    ## changed data name to dd
    library(lme4)
    cs. <- function(x) scale(x,center=TRUE,scale=TRUE)
    m1 <- glmer(Valid.detections ~ Transmitter.depth +
                receiver.depth + water.temperature + 
                wind.speed + distance + (distance | SUR.ID),
                data=dd, family = poisson,
                control=glmerControl(optimizer=c("bobyqa","Nelder_Mead")))
    ## FAILS with bobyqa alone
    m1.sc <- glmer(Valid.detections ~ cs.(Transmitter.depth) +
                   cs.(receiver.depth) + cs.(water.temperature) + 
                   cs.(wind.speed) + cs.(distance) + (cs.(distance) | SUR.ID),
                   data=dd, family = poisson,
                   control=glmerControl(optimizer=c("bobyqa","Nelder_Mead")))
    

    An important point is that in this case the very different scaling of the predictors doesn't seem to do any harm: the scaled and unscaled models achieve essentially the same goodness of fit (if scaling mattered, we would expect the scaled fit to do better).

    logLik(m1)-logLik(m1.sc)  ## 1e-7
    

    Here is the rescaling function given in a previous answer:

    rescale.coefs <- function(beta,mu,sigma) {
        beta2 <- beta ## inherit names etc.
        beta2[-1] <- sigma[1]*beta[-1]/sigma[-1]
        beta2[1]  <- sigma[1]*beta[1]+mu[1]-sum(beta2[-1]*mu[-1])
        beta2
    }     
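
    The two assignments in rescale.coefs follow from writing out the scaled model and collecting terms. As a sketch, with primes marking coefficients from the scaled fit, index 0 for the response and j >= 1 for the predictors (this notation is introduced here for illustration and the error term is omitted):

    ```latex
    \begin{aligned}
    \frac{y - \mu_0}{\sigma_0}
      &= \beta'_0 + \sum_{j \ge 1} \beta'_j \, \frac{x_j - \mu_j}{\sigma_j} \\
    y &= \underbrace{\sigma_0 \beta'_0 + \mu_0 - \sum_{j \ge 1} \beta_j \mu_j}_{\beta_0}
         + \sum_{j \ge 1} \underbrace{\frac{\sigma_0 \, \beta'_j}{\sigma_j}}_{\beta_j} \, x_j
    \end{aligned}
    ```

    The slope term corresponds to the beta2[-1] line and the intercept term to the beta2[1] line, with mu[1] and sigma[1] holding the response's shift and scale.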
    

    The parameters do indeed match very closely. (The shifting/scaling vectors include possible scaling/shifting of the response variable, so we start with 0/1 since the response is not scaled [it would rarely make sense to scale a response variable for a GLMM, but this function can be useful for LMMs too].)

    (cc <- rescale.coefs(fixef(m1.sc),mu=c(0,cm),sigma=c(1,csd)))
    ##            (Intercept) cs.(Transmitter.depth)    cs.(receiver.depth) 
    ##            3.865879406            0.011158402           -0.554392645 
    ## cs.(water.temperature)        cs.(wind.speed)          cs.(distance) 
    ##           -0.050833325           -0.042188495           -0.007231021 
    
    fixef(m1)
    ##  (Intercept) Transmitter.depth    receiver.depth water.temperature 
    ##  3.865816422       0.011180213      -0.554498582      -0.050830611 
    ##   wind.speed          distance 
    ## -0.042179333      -0.007231004 
    

    Because the two sets of coefficients agree (the unscaled model fits without trouble), we could use either set for this calculation.

    ddist <- 1:1000
    vals <- cbind(`(Intercept)`=1,Transmitter.depth=0.6067926,
              Receiver.depth=-0.1610828,Water.temperature=-0.1128282,
              Wind.speed=-0.2959290,distance=ddist)
    pred.obs <- exp(cc %*% t(vals))
    max(ddist[pred.obs>1])
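
    The grid search can be cross-checked in closed form. With a log link the predicted count exceeds 1 exactly when the linear predictor is positive, so the threshold distance is minus the distance-free part of the linear predictor divided by the distance coefficient. A minimal sketch, with the coefficients typed in from the printed cc output and the covariate values from vals (the names b, x_fixed, eta0, d_star and d_grid are illustrative):

    ```r
    # Back-transformed fixed effects, copied from the printed output of cc
    b <- c(3.865879406, 0.011158402, -0.554392645,
           -0.050833325, -0.042188495, -0.007231021)
    # Intercept column plus the four covariate values held fixed in vals
    x_fixed <- c(1, 0.6067926, -0.1610828, -0.1128282, -0.2959290)

    eta0   <- sum(b[1:5] * x_fixed)   # linear predictor at distance 0
    d_star <- -eta0 / b[6]            # distance at which exp(eta) equals 1

    # Same question answered by brute force over the integer grid
    ddist  <- 1:1000
    d_grid <- max(ddist[exp(eta0 + b[6] * ddist) > 1])
    c(analytic = d_star, grid = d_grid)
    ```

    This relies on the distance coefficient being negative; the grid maximum is then the integer part of the analytic threshold.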
    

    Now suppose you want to do similar scaling/unscaling for a model with interactions or other complexities, i.e. one where the predictor variables (the columns of the fixed-effect model matrix) are not the same as the input variables (the variables that appear in the formula):

    m2 <- update(m1,. ~ . + wind.speed:distance)
    m2.sc <- update(m1.sc,. ~ . + I(cs.(wind.speed*distance)))
    logLik(m2)-logLik(m2.sc)
    

    Calculate mean/sd of model matrix, dropping the first (intercept) value:

    X <- getME(m2,"X")                                        
    cm2 <- colMeans(X)[-1]
    csd2 <- apply(X,2,sd)[-1]                                            
    (cc2 <- rescale.coefs(fixef(m2.sc),mu=c(0,cm2),sigma=c(1,csd2)))
    all.equal(unname(cc2),unname(fixef(m2)),tol=1e-3)  ## TRUE
    

    You don't actually have to fit the full unscaled model just to get the scaling parameters: model.matrix([formula],data) derives the model matrix directly. For example, if you haven't already fitted m2, you can build X, and from it the column means and standard deviations, like this:

    X <- model.matrix(Valid.detections ~ Transmitter.depth + receiver.depth +
                          water.temperature + 
                          wind.speed + distance + 
                          wind.speed:distance,
                      data=dd)
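
    The model-matrix route can be checked end to end without lme4. The following is only a sketch: it swaps in an ordinary lm fit on simulated data (the data frame d and its columns a, b and y are invented for illustration), scales every model-matrix column including the interaction, and confirms that rescale.coefs recovers the unscaled coefficients.

    ```r
    # Rescaling function, repeated from the answer so the block runs on its own
    rescale.coefs <- function(beta, mu, sigma) {
        beta2 <- beta
        beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
        beta2[1]  <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
        beta2
    }

    # Simulated data with an interaction (hypothetical, not the detection data)
    set.seed(1)
    d <- data.frame(a = rnorm(200, 5, 2), b = rnorm(200, 50, 10))
    d$y <- 1 + 0.5 * d$a - 0.2 * d$b + 0.03 * d$a * d$b + rnorm(200)

    # Column means/SDs of the model matrix, dropping the intercept column
    X   <- model.matrix(y ~ a + b + a:b, data = d)
    cm  <- colMeans(X)[-1]
    csd <- apply(X, 2, sd)[-1]

    # Fit on the scaled columns, then on the raw variables
    Xs      <- scale(X[, -1], center = cm, scale = csd)
    fit_sc  <- lm(d$y ~ Xs)
    fit_raw <- lm(y ~ a + b + a:b, data = d)

    cc <- rescale.coefs(coef(fit_sc), mu = c(0, cm), sigma = c(1, csd))
    all.equal(unname(cc), unname(coef(fit_raw)))
    ```

    A linear model is used only because it fits instantly and exactly; the bookkeeping for the means and standard deviations is the same as for the glmer fits.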
    

    If you have an LMM or have otherwise scaled the response variable, you should also multiply all of the standard deviations (including the residual error, sigma(fitted_model)) by the original SD of the response variable.
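
    A small illustration of the scaled-response case, again as a sketch rather than a mixed-model fit: it uses lm on simulated data (the data frame d and the columns x, y, xs and ys are invented here), so only the residual SD is shown, not random-effect SDs.

    ```r
    # Rescaling function, repeated from the answer so the block runs on its own
    rescale.coefs <- function(beta, mu, sigma) {
        beta2 <- beta
        beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
        beta2[1]  <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
        beta2
    }

    set.seed(2)
    d <- data.frame(x = rnorm(100, 10, 4))
    d$y  <- 3 + 2 * d$x + rnorm(100, sd = 5)
    d$xs <- (d$x - mean(d$x)) / sd(d$x)   # centered/scaled predictor
    d$ys <- (d$y - mean(d$y)) / sd(d$y)   # centered/scaled response

    fit_raw <- lm(y ~ x, data = d)
    fit_sc  <- lm(ys ~ xs, data = d)

    # The response's mean and SD go in the first slot of mu and sigma
    cc <- rescale.coefs(coef(fit_sc),
                        mu    = c(mean(d$y), mean(d$x)),
                        sigma = c(sd(d$y), sd(d$x)))

    # Residual SD goes back to the original scale via the response SD
    sigma_back <- summary(fit_sc)$sigma * sd(d$y)

    all.equal(unname(cc), unname(coef(fit_raw)))
    all.equal(sigma_back, summary(fit_raw)$sigma)
    ```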
