I'm trying to understand the function lmer. I've found plenty of information about how to use the command, but not much about what it's actually doing (save for some cryptic
The links in the comments contained the answer. Below I've put what the formulae simplify to in this simple example, since the results are somewhat intuitive.
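lmer fits a model of the form $Y_{ij} = \mu + U_i + W_{ij}$, where $U_i$ and $W_{ij}$ are independent normals with mean zero and variances $\theta\sigma^2$ and $\sigma^2$ respectively. The joint probability distribution of $Y_{ij}$ and $U_i$ is therefore

$$\left(\prod_{i,j} f_{\sigma^2}(y_{ij} - \mu - u_i)\right)\left(\prod_i f_{\theta\sigma^2}(u_i)\right)$$

where

$$f_{\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}.$$

The likelihood is obtained by integrating this with respect to $u_i$ (which isn't observed) to give

$$\frac{1}{\sqrt{(2\pi\sigma^2)^{N} \prod_i (\theta n_i + 1)}} \exp\left(-\frac{1}{2\sigma^2}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2 + \sum_i \frac{n_i(\bar y_i-\mu)^2}{\theta n_i+1}\right)\right)$$

where $n_i$ is the number of observations from group $i$, $N = \sum_i n_i$ is the total number of observations, and $\bar y_i$ is the mean of observations from group $i$. This is somewhat intuitive since the first term captures spread within each group, which should have variance $\sigma^2$, and the second captures the spread between groups. Note that $(\theta + 1/n_i)\sigma^2$ is the variance of $\bar y_i$, so each group mean is weighted by the reciprocal of its variance.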
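To make the marginal likelihood concrete, here is a minimal Python sketch (not lmer's code, and in Python rather than R purely for illustration). It assumes the parametrisation $Y_{ij} = \mu + U_i + W_{ij}$ with $\operatorname{Var}(U_i) = \theta\sigma^2$ and $\operatorname{Var}(W_{ij}) = \sigma^2$; the function name `log_likelihood` and the list-of-lists data layout are my own choices.

```python
import math

def log_likelihood(groups, mu, sigma2, theta):
    """Marginal log likelihood of a one-way random-intercept model.

    Assumed parametrisation: Var(U_i) = theta * sigma2, Var(W_ij) = sigma2.
    `groups` is a list of lists, one inner list of observations per group.
    """
    stats = [(len(g), sum(g) / len(g)) for g in groups]  # (n_i, ybar_i)
    n_total = sum(n for n, _ in stats)
    # Spread within each group around its own mean.
    ssw = sum((y - m) ** 2 for g, (_, m) in zip(groups, stats) for y in g)
    # Spread of group means around mu, weighted by n_i / (theta * n_i + 1).
    between = sum(n * (m - mu) ** 2 / (theta * n + 1) for n, m in stats)
    log_det = sum(math.log(theta * n + 1) for n, _ in stats)
    return -0.5 * (n_total * math.log(2 * math.pi * sigma2)
                   + log_det + (ssw + between) / sigma2)

# Made-up data: two groups of unequal size.
print(log_likelihood([[1.0, 2.0], [3.0]], mu=2.0, sigma2=1.0, theta=0.5))
```

Two easy checks on the formula: with $\theta = 0$ it reduces to the log likelihood of independent normals, and for a single group with one observation it reduces to a normal density with variance $(\theta + 1)\sigma^2$.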
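However, by default (REML=T) lmer maximises not the likelihood but the "REML criterion", obtained by additionally integrating this with respect to $\mu$ to give

$$\frac{1}{\sqrt{(2\pi\sigma^2)^{N-1} \prod_i (\theta n_i + 1) \sum_i \frac{n_i}{\theta n_i+1}}} \exp\left(-\frac{1}{2\sigma^2}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2 + \sum_i \frac{n_i(\bar y_i-\hat\mu)^2}{\theta n_i+1}\right)\right)$$

where $\hat\mu$ is given below.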
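If $\theta$ is fixed, we can explicitly find the $\mu$ and $\sigma^2$ which maximise likelihood. They turn out to be

$$\hat\mu = \frac{\sum_i \frac{n_i}{\theta n_i+1}\,\bar y_i}{\sum_i \frac{n_i}{\theta n_i+1}}, \qquad \hat\sigma^2 = \frac{1}{N}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2 + \sum_i \frac{n_i}{\theta n_i+1}(\bar y_i-\hat\mu)^2\right).$$

Note $\hat\sigma^2$ has two terms for variation within and between groups, and $\hat\mu$ is somewhere between the mean of $y_{ij}$ and the mean of $\bar y_i$ depending on the value of $\theta$ (at $\theta = 0$ each group is weighted by $n_i$, giving the mean of $y_{ij}$; as $\theta \to \infty$ the weights become equal, giving the mean of $\bar y_i$).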
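The behaviour of $\hat\mu$ at the two extremes is easy to see numerically. The following is a small Python sketch under the same assumed parametrisation ($\operatorname{Var}(U_i) = \theta\sigma^2$, $\operatorname{Var}(W_{ij}) = \sigma^2$); `estimates_for_theta` is a hypothetical helper name and the data are made up.

```python
def estimates_for_theta(groups, theta):
    """ML estimates (mu_hat, sigma2_hat) for a fixed variance ratio theta.

    Assumes the model y_ij = mu + u_i + w_ij with Var(u_i) = theta * sigma2.
    """
    stats = [(len(g), sum(g) / len(g)) for g in groups]  # (n_i, ybar_i)
    n_total = sum(n for n, _ in stats)
    weights = [n / (theta * n + 1) for n, _ in stats]
    # Weighted average of the group means.
    mu_hat = sum(w * m for w, (_, m) in zip(weights, stats)) / sum(weights)
    ssw = sum((y - m) ** 2 for g, (_, m) in zip(groups, stats) for y in g)
    between = sum(w * (m - mu_hat) ** 2 for w, (_, m) in zip(weights, stats))
    return mu_hat, (ssw + between) / n_total

# Made-up unbalanced data: one group of four, one group of one.
groups = [[1.0, 2.0, 3.0, 4.0], [10.0]]
for theta in (0.0, 1.0, 1e9):
    print(theta, estimates_for_theta(groups, theta))
```

With unbalanced groups like these, the mean of all observations and the mean of the group means differ, so the effect of $\theta$ on $\hat\mu$ is visible directly.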
Substituting these into likelihood, we can express the log likelihood $l$ in terms of $\theta$ only:

$$-2l = \sum_i \log(\theta n_i+1) + N\left(1+\log(2\pi\hat\sigma^2)\right).$$

lmer iterates to find the value of $\theta$ which minimises this. In the output, $-2l$ and $l$ are shown in the fields "deviance" and "logLik" (if REML=F) respectively.
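The one-dimensional search over $\theta$ can be sketched as follows. This is only an illustration in Python: it assumes the variance-ratio parametrisation $\operatorname{Var}(U_i) = \theta\sigma^2$, uses a plain golden-section search as a stand-in (I am not claiming this is the optimiser lmer uses), and assumes the profiled deviance has a single minimum on the bracket $[0, 100]$. The names `profiled_deviance` and `golden_section_min` are hypothetical.

```python
import math

def profiled_deviance(groups, theta):
    """-2 * log likelihood with mu and sigma2 replaced by their ML estimates."""
    stats = [(len(g), sum(g) / len(g)) for g in groups]  # (n_i, ybar_i)
    n_total = sum(n for n, _ in stats)
    weights = [n / (theta * n + 1) for n, _ in stats]
    mu_hat = sum(w * m for w, (_, m) in zip(weights, stats)) / sum(weights)
    ssw = sum((y - m) ** 2 for g, (_, m) in zip(groups, stats) for y in g)
    between = sum(w * (m - mu_hat) ** 2 for w, (_, m) in zip(weights, stats))
    sigma2_hat = (ssw + between) / n_total
    log_det = sum(math.log(theta * n + 1) for n, _ in stats)
    return log_det + n_total * (1 + math.log(2 * math.pi * sigma2_hat))

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on [lo, hi] (stand-in optimiser)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Made-up balanced data: three groups of two observations each.
groups = [[1.0, 3.0], [4.0, 6.0], [9.0, 11.0]]
theta_hat = golden_section_min(lambda t: profiled_deviance(groups, t), 0.0, 100.0)
deviance = profiled_deviance(groups, theta_hat)
print("theta_hat:", theta_hat, "deviance:", deviance, "logLik:", -deviance / 2)
```

The point of profiling is that the search is over a single number, $\theta$, however many observations there are; $\hat\mu$ and $\hat\sigma^2$ fall out in closed form at each step.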
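Since the REML criterion doesn't depend on $\mu$, we use the same estimate for $\mu$ as above. We estimate $\sigma^2$ to maximise the REML criterion:

$$\hat\sigma^2 = \frac{1}{N-1}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2 + \sum_i \frac{n_i}{\theta n_i+1}(\bar y_i-\hat\mu)^2\right).$$

The restricted log likelihood $l_R$ is given by

$$-2l_R = \sum_i \log(\theta n_i+1) + \log\left(\sum_i \frac{n_i}{\theta n_i+1}\right) + (N-1)\left(1+\log(2\pi\hat\sigma^2)\right).$$

In the output of lmer, $-2l_R$ and $l_R$ are shown in the fields "REMLdev" and "logLik" (if REML=T) respectively.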
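As a rough check of the REML version, here is a Python sketch that evaluates the profiled REML deviance on a grid of $\theta$ values. Again this assumes $\operatorname{Var}(U_i) = \theta\sigma^2$ and $\operatorname{Var}(W_{ij}) = \sigma^2$; the grid scan is a crude stand-in for a proper optimiser, and `reml_fit` is a hypothetical name.

```python
import math

def reml_fit(groups, theta):
    """Return (REML deviance, sigma2_hat) for a fixed variance ratio theta."""
    stats = [(len(g), sum(g) / len(g)) for g in groups]  # (n_i, ybar_i)
    n_total = sum(n for n, _ in stats)
    weights = [n / (theta * n + 1) for n, _ in stats]
    mu_hat = sum(w * m for w, (_, m) in zip(weights, stats)) / sum(weights)
    ssw = sum((y - m) ** 2 for g, (_, m) in zip(groups, stats) for y in g)
    between = sum(w * (m - mu_hat) ** 2 for w, (_, m) in zip(weights, stats))
    # REML divides by N - 1 rather than N (one fixed effect, mu).
    sigma2_hat = (ssw + between) / (n_total - 1)
    deviance = (sum(math.log(theta * n + 1) for n, _ in stats)
                + math.log(sum(weights))
                + (n_total - 1) * (1 + math.log(2 * math.pi * sigma2_hat)))
    return deviance, sigma2_hat

# Made-up balanced data: three groups of two observations each.
groups = [[1.0, 3.0], [4.0, 6.0], [9.0, 11.0]]
grid = [i / 1000 for i in range(20001)]  # theta from 0 to 20 in steps of 0.001
theta_reml = min(grid, key=lambda t: reml_fit(groups, t)[0])
reml_dev, sigma2_reml = reml_fit(groups, theta_reml)
print("theta:", theta_reml, "REMLdev:", reml_dev, "sigma2:", sigma2_reml)
```

For balanced data like this, the REML estimates should agree with the classical one-way ANOVA estimates (within-group mean square for $\sigma^2$, and the scaled difference of mean squares for the group variance), which gives an independent way to sanity-check the formulae.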