predict and model.matrix give different predicted means within levels of a factor variable

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-12-13 01:46:57

Question


This question arose as a result of another question posted here: non-conformable arguments error from lmer when trying to extract information from the model matrix

When trying to obtain predicted means from an lmer model containing a factor variable, the output varies depending on how the factor variable is specified.

I have a variable agegroup, which can be specified using the groups "Children <15 years", "Adults 15-49 years", "Elderly 50+ years" or "0-15y", "15-49y", "50+y". My choice matters because for the former, the alphabetical ordering of the labels differs from the numeric ordering of the levels. To illustrate this, I have again used the sleep data.

library(lme4)
sleep <- as.data.frame(sleepstudy)   #import the sleep data

I have to create a variable for age.

set.seed(13)  #set a seed for creating a new variable, age
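# Note: length(sleep) is the number of columns (3), so only 3 values are drawn and
# recycled down the 180 rows; nrow(sleep) would draw one value per row instead.
sleep$age <- sample(1:3, length(sleep), replace=TRUE) #create a new variable, age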
sleep$agegroup1 <- factor(sleep$age, levels = c(1,2,3), 
        labels = c("Children <15 years", "Adults 15-49 years", "Elderly 50+ years"))
table(sleep$agegroup1)  #should have 3 age groups

Run the model:

m1 <- lmer(Reaction ~ Days + agegroup1 + Days:agegroup1 + (Days | Subject), sleep) 
summary(m1)

# New data frame for predicted means
d <- seq(0,9,1)  # make a vector of days = 0 to 9
newdat1 <- data.frame(Days=d,      
                          agegroup1=factor(rep(levels(sleep$agegroup1),length(d))))
newdat1 <- newdat1[order(newdat1$Days,newdat1$agegroup1),]   #order by Days 
mm <- model.matrix(formula(m1,fixed.only=TRUE)[-2], newdat1)  #create the matrix

Now, I try to output the predicted means using the model matrix and also the predict function:

newdat1$mm <- mm%*%fixef(m1)    
newdat1$predict <- predict(m1, newdata=newdat1, re.form=NA)
head(newdat1)

Here, the predicted means from the model matrix and the predict function are different; the Adults and Children age groups are inverted.

   Days          agegroup1       mm  predict
11    0 Adults 15-49 years 252.2658 252.8241
1     0 Children <15 years 252.8241 252.2658
21    0  Elderly 50+ years 249.1254 249.1254
2     1 Adults 15-49 years 262.3326 263.2674
22    1 Children <15 years 263.2674 262.3326
12    1  Elderly 50+ years 260.0171 260.0171

If I run this script again using factor labels for which the alphabetical ordering is the same as the numeric ordering of the levels, I get different results:

#set new labels for agegroup
sleep$agegroup2 <- factor(sleep$age, levels = c(1,2,3), 
                        labels = c("0-15y", "15-49y", "50+y"))
m2 <- lmer(Reaction ~ Days + agegroup2 + Days:agegroup2 + (Days | Subject), sleep) 
summary(m2)

# New data frame for predicted means
d <- seq(0,9,1)  # make a vector of days = 0 to 9
newdat2 <- data.frame(Days=d,
                    agegroup2=factor(rep(levels(sleep$agegroup2),length(d))))
newdat2 <- newdat2[order(newdat2$Days,newdat2$agegroup2),]   #order by Days
mm <- model.matrix(formula(m2,fixed.only=TRUE)[-2], newdat2)
newdat2$mm <- mm%*%fixef(m2)   
newdat2$predict <- predict(m2, newdata=newdat2, re.form=NA)
head(newdat2)

Here, the predicted means from the model matrix and the predict function are the same.

   Days agegroup2       mm  predict
1     0     0-15y 252.2658 252.2658
11    0    15-49y 252.8241 252.8241
21    0      50+y 249.1254 249.1254
22    1     0-15y 262.3326 262.3326
2     1    15-49y 263.2674 263.2674
12    1      50+y 260.0171 260.0171

Predict appears to ignore the labels and focus on the levels, while directly accessing the model-matrix correctly focusses on the labels. My question, then, is whether it is always necessary to ensure that factor levels and labels have the same order when trying to use the model matrix? Or is there some other way to overcome this problem?


Answer 1:


The order of columns of the model matrix and of the fixed effects from the model must match in order to correctly do the matrix multiplication to calculate the predicted values "by hand". This means, yes, the order of the levels of the factor in the new dataset must be the same as in the original dataset to use model.matrix and fixef as you did.
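A quick way to catch this kind of mismatch before multiplying is to compare the column names of the model matrix with the names of the coefficient vector. The sketch below is only an illustration under stated assumptions: it uses toy data and base R's lm() with coef() as a stand-in for lmer() with fixef(), and the names dat, fit, newdat, mm_bad and mm_ok are made up for the example.

```r
# Toy data; lm() stands in for lmer() here (an assumption, to keep this self-contained)
set.seed(1)
dat <- data.frame(x = rep(0:9, 3),
                  g = factor(rep(c("c", "a", "b"), each = 10),
                             levels = c("c", "a", "b")))
dat$y <- 1 + 0.5 * dat$x + as.integer(dat$g) + rnorm(30, sd = 0.1)
fit <- lm(y ~ x * g, data = dat)

# New data whose factor is built without 'levels', so the levels sort as a, b, c
newdat <- data.frame(x = 0, g = factor(c("c", "a", "b")))
mm_bad <- model.matrix(~ x * g, newdat)

# Guard: column names must equal the coefficient names, in the same order
identical(colnames(mm_bad), names(coef(fit)))   # FALSE

# Same new data with the fitted level order restored
newdat$g <- factor(newdat$g, levels = levels(dat$g))
mm_ok <- model.matrix(~ x * g, newdat)
identical(colnames(mm_ok), names(coef(fit)))    # TRUE

# With matching columns, the hand calculation agrees with predict()
all.equal(as.numeric(mm_ok %*% coef(fit)), as.numeric(predict(fit, newdat)))
```

For the lmer models in the question, the analogous check would presumably be identical(colnames(mm), names(fixef(m1))).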

You can achieve this by setting the order of the factor levels in your new dataset. This is easiest to do by simply using the levels of the factor from the original dataset. For example, in newdat1 you can do:

factor(rep(levels(sleep$agegroup1), length(d)), levels = levels(sleep$agegroup1))
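The reason the level order changes in the first place is that factor() sorts the levels alphabetically when no levels argument is given, so "Adults 15-49 years" becomes the first (reference) level in the new data, whereas the fitted model used "Children <15 years". predict() still gives the right answer, apparently because it matches the new data to the fitted model's levels by label. A minimal base-R sketch of the difference follows; the names orig, rebuilt, fixed, mm_rebuilt and mm_fixed are illustrative and not from the original post.

```r
# Levels in the order the model saw them (numeric order of the age codes)
orig <- factor(c(1, 2, 3), levels = c(1, 2, 3),
               labels = c("Children <15 years", "Adults 15-49 years",
                          "Elderly 50+ years"))

# Rebuilding the factor from its labels alone sorts the levels alphabetically
rebuilt <- factor(levels(orig))

# Passing the original levels explicitly keeps the original order
fixed <- factor(levels(orig), levels = levels(orig))

levels(rebuilt)  # "Adults 15-49 years" "Children <15 years" "Elderly 50+ years"
levels(fixed)    # "Children <15 years" "Adults 15-49 years" "Elderly 50+ years"

# The dummy columns follow the level order, so the reference level differs
mm_rebuilt <- model.matrix(~ f, data.frame(f = rebuilt))
mm_fixed   <- model.matrix(~ f, data.frame(f = fixed))
colnames(mm_rebuilt)  # "(Intercept)" "fChildren <15 years" "fElderly 50+ years"
colnames(mm_fixed)    # "(Intercept)" "fAdults 15-49 years" "fElderly 50+ years"
```

This is why the second set of labels ("0-15y", "15-49y", "50+y") happened to work: their alphabetical order coincides with the order of the levels.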



Source: https://stackoverflow.com/questions/34346755/predict-and-model-matrix-give-different-predicted-means-within-levels-of-a-facto
