问题
I'm trying to understand the function lmer. I've found plenty of information about how to use the command, but not much about what it's actually doing (save for some cryptic comments here: http://www.bioconductor.org/help/course-materials/2008/PHSIntro/lme4Intro-handout-6.pdf). I'm playing with the following simple example:
library(data.table)
library(lme4)
options(digits=15)
n<-1000
m<-100
data<-data.table(id=sample(1:m,n,replace=T),key="id")
b<-rnorm(m)
data$y<-rand[data$id]+rnorm(n)*0.1
fitted<-lmer(b~(1|id),data=data,verbose=T)
fitted
I understand that lmer is fitting a model of the form Y_{ij} = beta + B_i + epsilon_{ij}, where epsilon_{ij} and B_i are independent normals with variances sigma^2 and tau^2 respectively. If theta = tau/sigma is fixed, I computed the estimate for beta with the correct mean and minimum variance to be
c = sum_{i,j} alpha_i y_{ij}
where
alpha_i = lambda/(1 + theta^2 n_i)
lambda = 1/[\sum_i n_i/(1+theta^2 n_i)]
n_i = number of observations from group i
I also computed the following unbiased estimate for sigma^2:
s^2 = \sum_{i,j} alpha_i (y_{ij} - c)^2 / (1 + theta^2 - lambda)
These estimates seem to agree with what lmer produces. However, I can't figure out how log likelihood is defined in this context. I calculated the probability density to be
pd(Y_{ij}=y_{ij}) = \prod_{i,j}[f_sigma(y_{ij}-ybar_i)]
* prod_i[f_{sqrt(sigma^2/n_i+tau^2)}(ybar_i-beta) sigma sqrt(2 pi/n_i)]
where
ybar_i = \sum_j y_{ij}/n_i (the mean of observations in group i)
f_sigma(x) = 1/(sqrt{2 pi}sigma) exp(-x^2/(2 sigma)) (normal density with sd sigma)
But log of the above is not what lmer produces. How is log likelihood computed in this case (and for bonus marks, why)?
Edit: Changed notation for consistency, striked out incorrect formula for standard deviation estimate.
回答1:
The links in the comments contained the answer. Below I've put what the formulae simplify to in this simple example, since the results are somewhat intuitive.
lmer fits a model of the form


where

The likelihood is obtained by integrating this with respect to

where

However, by default (REML=T) lmer maximises not the likelihood but the "REML criterion", obtained by additionally integrating this with respect to

where
Maximising likelihood (REML=F)
If



Note
Substituting these into likelihood, we can express the log likelihood

lmer iterates to find the value of
Maximising restricted likelihood (REML=T)
Since the REML criterion doesn't depend on


The restricted log likelihood

In the output of lmer,

来源:https://stackoverflow.com/questions/20980116/how-does-lmer-from-the-r-package-lme4-compute-log-likelihood