How to properly set contrasts in R

Submitted by 浪子不回头ぞ on 2019-12-06 04:01:21

Question


I have been asked to see if there is a linear trend in 3 groups of data (5 points each) by using ANOVA and linear contrasts. The 3 groups represent data collected in 2010, 2011 and 2012. I want to use R for this procedure and I have tried both of the following:

contrasts(data$groups, how.many=1) <- contr.poly(3)
contrasts(data$groups)  <- contr.poly(3)

Both ways seem to work fine but give slightly different answers in terms of their p-values. I have no idea which is correct, and it is surprisingly hard to find help for this on the web. I would like to understand the reasoning behind the different answers; I'm not sure whether it has something to do with how the sums of squares are partitioned.


Answer 1:


Both approaches differ with respect to whether a quadratic polynomial is used.

For illustration purposes, have a look at this example, where both x and y are factors with three levels.

x <- y <- gl(3, 2)
# [1] 1 1 2 2 3 3
# Levels: 1 2 3

The first approach creates a contrast matrix for a quadratic polynomial, i.e., with a linear (.L) and a quadratic (.Q) trend. The argument 3 tells contr.poly to build contrasts for 3 levels, i.e., polynomials up to degree 3 - 1 = 2.

contrasts(x) <- contr.poly(3)
# [1] 1 1 2 2 3 3
# attr(,"contrasts")
#              .L         .Q
# 1 -7.071068e-01  0.4082483
# 2 -7.850462e-17 -0.8164966
# 3  7.071068e-01  0.4082483
# Levels: 1 2 3

The second approach, in contrast, results in a polynomial of first order (i.e., a linear trend only). This is due to the argument how.many = 1: only 1 contrast is created.

contrasts(y, how.many = 1) <- contr.poly(3)
# [1] 1 1 2 2 3 3
# attr(,"contrasts")
#              .L
# 1 -7.071068e-01
# 2 -7.850462e-17
# 3  7.071068e-01
# Levels: 1 2 3

If you're interested in the linear trend only, the second option seems more appropriate for you.
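To connect this back to the original question, here is a minimal sketch of how the single linear contrast feeds into the model. The data are made up for illustration (the names year and y are our own, standing in for the 3 groups of 5 points):

set.seed(42)
year <- gl(3, 5, labels = c("2010", "2011", "2012"))  # 3 groups, 5 points each
y    <- rnorm(15, mean = as.numeric(year))            # synthetic data with a rough trend

contrasts(year, how.many = 1) <- contr.poly(3)        # keep only the linear contrast
fit <- lm(y ~ year)
summary(fit)   # the coefficient labelled "year.L" tests the linear trend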




Answer 2:


Changing the contrasts you ask for changes the degrees of freedom of the model. If one model requests linear and quadratic contrasts, and a second specifies only, say, the linear contrast, then the second model has an extra residual degree of freedom: this increases the power to test the linear hypothesis (at the cost of preventing the model from fitting the quadratic trend).

Using the full (nlevels - 1) set of contrasts creates an orthogonal set of contrasts that explores the full set of (independent) response configurations. Cutting back to just one prevents the model from fitting one of those configurations (in this case the quadratic component, which our data in fact possess).
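As a quick sketch of what "orthogonal" means here, you can check that the columns of the full polynomial contrast matrix are mutually orthogonal (in fact orthonormal):

cm <- contr.poly(3)        # full set: linear and quadratic columns
round(t(cm) %*% cm, 10)    # identity matrix, so the contrasts are orthonormal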

To see how this works, use the built-in dataset mtcars and test the (confounded) relationship between the number of gears and fuel economy (mpg). We'll hypothesize that more gears are better (at least up to some point).

df = mtcars # copy the dataset
df$gear = as.ordered(df$gear) # make an ordered factor

Ordered factors default to polynomial contrasts, but we'll set them here to be explicit:

contrasts(df$gear) <- contr.poly(nlevels(df$gear))

Then we can model the relationship.

m1 = lm(mpg ~ gear, data = df)
summary.lm(m1)
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  20.6733     0.9284  22.267  < 2e-16 ***
# gear.L        3.7288     1.7191   2.169  0.03842 *  
# gear.Q       -4.7275     1.4888  -3.175  0.00353 ** 
# 
# Multiple R-squared:  0.4292,  Adjusted R-squared:  0.3898 
# F-statistic:  10.9 on 2 and 29 DF,  p-value: 0.0002948

Note we have F(2,29) = 10.9 for the overall model and p=.038 for our linear effect with an estimated extra 3.7 mpg/gear.

Now let's only request the linear contrast, and run the "same" analysis.

contrasts(df$gear, how.many = 1) <- contr.poly(nlevels(df$gear))
m1 = lm(mpg ~ gear, data = df)
summary.lm(m1)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   21.317      1.034  20.612   <2e-16 ***
# gear.L         5.548      1.850   2.999   0.0054 ** 
# Multiple R-squared:  0.2307,  Adjusted R-squared:  0.205 
# F-statistic: 8.995 on 1 and 30 DF,  p-value: 0.005401

The linear effect of gear is now bigger (5.5 mpg per gear) and p is well below .05. A win? Except the overall model fit is now significantly worse: the variance accounted for is just 23% (it was 43%)! Why becomes clear if we plot the relationship:

plot(mpg ~ gear, data = df) # view the relationship

So, if you're interested in the linear trend but also expect (or are unsure about) additional levels of complexity, you should also test these higher polynomials: the quadratic or, in general, trends up to nlevels - 1. One way to do that is to compare the two models directly, as in the sketch below.
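Here is a hedged sketch of that comparison, fitting both versions and testing them against each other with an F-test (the model names m_full and m_lin are our own):

contrasts(df$gear) <- contr.poly(nlevels(df$gear))                 # full set again
m_full <- lm(mpg ~ gear, data = df)
contrasts(df$gear, how.many = 1) <- contr.poly(nlevels(df$gear))   # linear only
m_lin  <- lm(mpg ~ gear, data = df)
anova(m_lin, m_full)   # tests whether adding the quadratic term improves the fit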

Note too that in this example the physical mechanism is confounded: we've forgotten that the number of gears is confounded with automatic vs. manual transmission, and also with weight and with sedan vs. sports car.

If someone wants to test the hypothesis that 4 gears is better than 3, they could answer this question :-)



Source: https://stackoverflow.com/questions/21770610/how-to-properly-set-contrasts-in-r
