Screening (multi)collinearity in a regression model

南方客 2020-12-12 09:49

I hope that this one is not going to be an "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in the regress

5 answers
  •  一生所求
    2020-12-12 10:21

    The kappa() function can help. Here is a simulated example:

    > set.seed(42)
    > x1 <- rnorm(100)
    > x2 <- rnorm(100)
    > x3 <- x1 + 2*x2 + rnorm(100)*0.0001    # so x3 approx a linear comb. of x1+x2
    > mm12 <- model.matrix(~ x1 + x2)        # normal model, two indep. regressors
    > mm123 <- model.matrix(~ x1 + x2 + x3)  # bad model with near collinearity
    > kappa(mm12)                            # a 'low' kappa is good
    [1] 1.166029
    > kappa(mm123)                           # a 'high' kappa indicates trouble
    [1] 121530.7
    
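For context, help(kappa) describes the result as (an estimate of) the 2-norm condition number, that is, the ratio of the largest to the smallest singular value of the model matrix. The following is a minimal sketch, assuming the same simulated data as in the transcript, that computes that ratio directly with svd() and compares it with kappa(..., exact = TRUE). The helper name cond2 is hypothetical and not part of the original answer.

```r
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + 2*x2 + rnorm(100)*0.0001      # near-linear combination of x1 and x2
mm12  <- model.matrix(~ x1 + x2)
mm123 <- model.matrix(~ x1 + x2 + x3)

# Hypothetical helper: 2-norm condition number from the singular values
cond2 <- function(m) {
  d <- svd(m)$d                          # singular values, largest first
  d[1] / d[length(d)]
}

k_default <- kappa(mm123)                # fast estimate (exact = FALSE)
k_exact   <- kappa(mm123, exact = TRUE)  # computed from the singular values
k_svd     <- cond2(mm123)                # the same ratio, by hand

c(default = k_default, exact = k_exact, svd = k_svd)
cond2(mm12)                              # small for the well-behaved model
```

The default estimate and the exact value need not agree closely; for screening purposes it is the order of magnitude that matters.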

    We can go further by making the third regressor more and more collinear:

    > x4 <- x1 + 2*x2 + rnorm(100)*0.000001  # even more collinear
    > mm124 <- model.matrix(~ x1 + x2 + x4)
    > kappa(mm124)
    [1] 13955982
    > x5 <- x1 + 2*x2                        # now x5 is linear comb of x1,x2
    > mm125 <- model.matrix(~ x1 + x2 + x5)
    > kappa(mm125)
    [1] 1.067568e+16
    

    By default kappa() returns a fast estimate rather than the exact condition number; see help(kappa) for details.
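When a regressor is an exact linear combination of the others, as with x5, the condition number is effectively infinite and a rank check gives a more direct yes/no answer. The sketch below, which assumes the same simulated x1, x2 and x5 and adds a hypothetical response y that is not in the original answer, compares the numerical rank from qr() with the number of columns and shows how lm() reports the aliased term.

```r
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
x5 <- x1 + 2*x2                          # exact linear combination of x1 and x2
mm12  <- model.matrix(~ x1 + x2)
mm125 <- model.matrix(~ x1 + x2 + x5)

# Numerical rank from a pivoted QR decomposition vs. number of columns
rank12  <- qr(mm12)$rank
rank125 <- qr(mm125)$rank
c(rank = rank12,  columns = ncol(mm12))   # full column rank
c(rank = rank125, columns = ncol(mm125))  # rank deficient

y <- rnorm(100)                          # hypothetical response, for illustration only
fit <- lm(y ~ x1 + x2 + x5)
coef(fit)                                # the aliased x5 coefficient is NA
```

A rank check like this only flags exact (or numerically exact) dependence; near collinearity such as x3 or x4 is still better judged by the size of kappa().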
