R missing levels in a model.matrix

Posted by 拥有回忆 on 2019-12-02 00:58:56

Question


I am trying to convert a data frame with categorical variables to a model.matrix but am losing levels of variables.

Here's my code:

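# Example data: 200 rows, a binary response y, and three 4-level factors
df1 <- data.frame(id = 1:200, y = rbinom(200, 1, .5), var1 = factor(rep(c('abc','def','ghi','jkl'),50)))
df1$var2 <- factor(rep(c('ab c','ghi','jkl','def'),50))
df1$var3 <- factor(rep(c('abc','ghi','nop','xyz'),50))

# Strip the whitespace from var2 ('ab c' becomes 'abc') and convert back to a factor
df1$var2 <- as.character(df1$var2)
df1$var2 <- gsub('\\s','',df1$var2)
df1$var2 <- factor(df1$var2)
sapply(df1, levels)

# Build the model matrix without an intercept
mm1 <- model.matrix(~ 0+.,df1)
head(mm1)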

Any suggestions? Is this a matrix non-invertibility issue?


Answer 1:


The model matrix is perfectly correct. For each factor, the model matrix contains one column fewer than the factor has levels: the omitted first level is already represented by the (Intercept) column. You do not see that column because you specified 0 + in your model formula, which removes the intercept; in that case the first factor (var1) keeps all of its levels instead. Try this:

mm2 <- model.matrix(~., df1)
head(mm2)

You will now see the (Intercept) column, which encodes the "default" information, and the first level of var1 (var1abc) no longer appears among the column names. The (Intercept) represents an observation at the "reference level", that is, the combination of the first level of each categorical variable. Any deviation from this reference level is encoded in the var*??? columns. Since your model assumes no interactions between the variables, you get (4 - 1) * 3 = 9 var*??? columns plus the (Intercept) column (whose role is taken by var1abc in your initial model matrix).
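To make the column bookkeeping concrete, here is a small sketch that rebuilds the data frame from the question and compares the column names of the two matrices. The set.seed() call and writing var2 directly without the stray space are my own simplifications, not part of the original code.

```r
# Rebuild the data frame from the question.
# Assumption: the seed is added only so that y is reproducible.
set.seed(1)
df1 <- data.frame(
  id   = 1:200,
  y    = rbinom(200, 1, 0.5),
  var1 = factor(rep(c("abc", "def", "ghi", "jkl"), 50)),
  var2 = factor(rep(c("abc", "ghi", "jkl", "def"), 50)),  # 'ab c' already cleaned
  var3 = factor(rep(c("abc", "ghi", "nop", "xyz"), 50))
)

mm1 <- model.matrix(~ 0 + ., df1)  # no intercept
mm2 <- model.matrix(~ ., df1)      # with intercept

# Columns that appear in one matrix but not in the other
only_in_mm1 <- setdiff(colnames(mm1), colnames(mm2))
only_in_mm2 <- setdiff(colnames(mm2), colnames(mm1))
print(only_in_mm1)  # "var1abc"
print(only_in_mm2)  # "(Intercept)"

# Both matrices have the same number of columns
print(c(ncol(mm1), ncol(mm2)))
```

The two matrices differ in a single column: dropping the intercept lets var1 keep its first level, while var2 and var3 still lose theirs.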

Unfortunately I lack the precise terms to describe this. Anyone help me out?
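For what it is worth, the usual name for this coding is "treatment contrasts" (often called dummy coding), and the omitted first level is the reference or baseline level. If you really do want one indicator column per level for every factor, one common approach is to pass full identity contrasts through the contrasts.arg argument of model.matrix. The sketch below assumes the same data frame as in the question; the names is_fac and mm_full and the set.seed() call are my own additions. Note that the resulting columns are linearly dependent (the indicators of each factor sum to one), so such a matrix is generally not suitable for an ordinary unpenalised regression.

```r
# Rebuild the data frame from the question.
# Assumption: the seed is added only so that y is reproducible.
set.seed(1)
df1 <- data.frame(
  id   = 1:200,
  y    = rbinom(200, 1, 0.5),
  var1 = factor(rep(c("abc", "def", "ghi", "jkl"), 50)),
  var2 = factor(rep(c("abc", "ghi", "jkl", "def"), 50)),
  var3 = factor(rep(c("abc", "ghi", "nop", "xyz"), 50))
)

# Default coding for a factor: treatment contrasts, first level as reference
print(contrasts(df1$var1))

# Ask for full indicator (identity) contrasts for every factor column
is_fac <- sapply(df1, is.factor)
mm_full <- model.matrix(
  ~ 0 + ., data = df1,
  contrasts.arg = lapply(df1[is_fac], contrasts, contrasts = FALSE)
)

# Every level of every factor now has its own column
print(colnames(mm_full))
```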



Source: https://stackoverflow.com/questions/17281827/r-missing-levels-in-a-model-matrix
