Toy R function for solving ordinary least squares by singular value decomposition


Question


I'm trying to write a function for multiple regression analysis (y = Xb + e) using the singular value decomposition of a matrix. The inputs must be y and X, and the outputs the vector of regression coefficients b, the residual vector e, and the variance accounted for, R2. Below is what I have so far, and I'm totally stuck. The `labels` part of the weight calculation also gives me an error. What is this `labels` part? Can anybody give me some tips to help me proceed?

Test <- function(X, y) {
  x <- t(A) %*% A
  duv <- svd(x)
  x.inv <- duv$v %*% diag(1/duv$d) %*% t(duv$u)
  x.pseudo.inv <- x.inv %*% t(A)
  w <- x.pseudo.inv %*% labels
  return(b, e, R2)
  }

Answer 1:


You are off track here: the singular value decomposition should be applied to the model matrix X, not to the normal matrix X'X. The correct procedure is sketched below.
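
In brief, this is the algebra behind the function that follows: take the thin SVD X = U D V', where U'U = I, V is orthogonal, and D is diagonal holding the singular values. Substituting into the normal equations X'X b = X'y gives V D^2 V' b = V D U'y, hence

b = V D^{-1} U'y

The residuals are then e = y - Xb, and R2 = 1 - RSS / TSS as usual.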

So when writing an R function, we should do

svdOLS <- function (X, y) {
  SVD <- svd(X)
  V <- SVD$v
  U <- SVD$u
  D <- SVD$d
  ## regression coefficients `b`
  ## use `crossprod` for `U'y`
  ## use recycling rule for row rescaling of `U'y` by `D` inverse
  ## use `as.numeric` to return vector instead of matrix
  b <- as.numeric(V %*% (crossprod(U, y) / D))
  ## residuals
  r <- as.numeric(y - X %*% b)
  ## R-squared
  RSS <- crossprod(r)[1]
  TSS <- crossprod(y - mean(y))[1]
  R2 <- 1 - RSS / TSS
  ## multiple return via a list
  list(coefficients = b, residuals = r, R2 = R2)
}

Let's test it:

## toy data
set.seed(0)
x1 <- rnorm(50); x2 <- rnorm(50); x3 <- rnorm(50); y <- rnorm(50)
X <- model.matrix(~ x1 + x2 + x3)

## fitting linear regression: y ~ x1 + x2 + x3
svdfit <- svdOLS(X, y)
svdfit

#$coefficients
#[1]  0.14203754 -0.05699557 -0.01256007  0.09776255
#
#$residuals
# [1]  1.327108410 -1.400843739 -0.071885339  2.285661880  0.882041795
# [6] -0.535230752 -0.927750996  0.262674650 -0.133878558 -0.559783412
#[11]  0.264204296 -0.581468657  2.436913000  1.517601798  0.774515419
#[16]  0.447774149 -0.578988327  0.664690723 -0.511052627 -1.233302697
#[21]  1.740216739 -1.065592673 -0.332307898 -0.634125164 -0.975142054
#[26]  0.344995480 -1.748393187 -0.414763742 -0.680473508 -1.547232557
#[31] -0.383685601 -0.541602452 -0.827267878  0.894525453  0.359062906
#[36] -0.078656943  0.203938750 -0.813745178 -0.171993018  1.041370294
#[41] -0.114742717  0.034045040  1.888673004 -0.797999080  0.859074345
#[46]  1.664278354 -1.189408794  0.003618466 -0.527764821 -0.517902581
#
#$R2
#[1] 0.008276773

To check correctness, we can compare with .lm.fit:

qrfit <- .lm.fit(X, y)

which gives exactly the same coefficients and residuals:

all.equal(svdfit$coefficients, qrfit$coefficients)
# [1] TRUE

all.equal(svdfit$residuals, qrfit$residuals)
# [1] TRUE
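
.lm.fit does not report R-squared, so that part can be checked against lm() instead; a quick sketch, refitting the same model from x1, x2 and x3:

## fit the same model with lm() and compare the R-squared values
lmfit <- lm(y ~ x1 + x2 + x3)
all.equal(svdfit$R2, summary(lmfit)$r.squared)
## should be TRUE up to numerical tolerance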


Source: https://stackoverflow.com/questions/40250072/toy-r-function-for-solving-ordinary-least-squares-by-singular-value-decompositio
