Why does `fastLm()` return results when I run a regression with one observation?

荒凉一梦 提交于 2019-12-02 03:30:01

Because fastLm doesn't worry about rank-deficiency; this is part of the price you pay for speed.

From ?fastLm:

... The reason that Armadillo can do something like lm.fit faster than the functions in the stats package is because Armadillo uses the Lapack version of the QR decomposition while the stats package uses a modified Linpack version. Hence Armadillo uses level-3 BLAS code whereas the stats package uses level-1 BLAS. However, Armadillo will either fail or, worse, produce completely incorrect answers on rank-deficient model matrices whereas the functions from the stats package will handle them properly due to the modified Linpack code.

Looking at the code here, the guts of the code are

 arma::colvec coef = arma::solve(X, y);

This does a QR decomposition. We can match the lmFast results with qr() from base R (here I am not using only base R constructs rather than relying on data.table):

set.seed(1)
dd <- data.frame(y = rnorm(5), 
      x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)

X <- model.matrix(~1+x1+x2, data=subset(dd,my.key==1))
qr(X,dd$y)
## $qr
##   (Intercept)         x1       x2
## 1           1 -0.8204684 1.511781

You can look at the code of lm.fit() to see what R does about rank deficiency when fitting linear models; the underlying BLAS algorithm it calls does QR with pivoting ...

If you want to flag these situations, I think that Matrix::rankMatrix() will do the trick:

library(Matrix)
rankMatrix(X) < ncol(X)  ## TRUE
X1 <- model.matrix(~1+x1+x2, data=dd)
rankMatrix(X1) < ncol(X1)  ## FALSE
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!