Weighted Pearson's Correlation?

寵の児 提交于 2019-11-28 08:29:05

You can go back to the definition of the correlation.

f <- function( x, y, w = rep(1,length(x))) {
  stopifnot( length(x) == dim(y)[2] )
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x*w)
  y <- y - apply( t(y) * w, 2, sum )
  # Compute the variance
  vx <- sum( w * x * x )
  vy <- rowSums( w * y * y ) # Incorrect: see Heather's remark, in the other answer
  # Compute the covariance
  vxy <- colSums( t(y) * x * w )
  # Compute the correlation
  vxy / sqrt(vx * vy)
}
f(x,y)[1]
cor(x,y[1,]) # Identical
f(x, y, xy.wt)

Unfortunately the accepted answer is wrong when y is a matrix of more than one row. The error is in the line

vy <- rowSums( w * y * y )

We want to multiply the columns of y by w, but this will multiply the rows by the elements of w, recycled as necessary. Thus

> f(x, y[1, , drop = FALSE], xy.wt)
[1] 0.103021

is correct, because in this case the multiplication is performed element-wise, which is equivalent to column-wise multiplication here, but

> f(x, y, xy.wt)[1]
[1] 0.05463575

gives a wrong answer due to the row-wise multiplication.

We can correct the function as follows

f2 <- function( x, y, w = rep(1,length(x))) {
  stopifnot(length(x) == dim(y)[2] )
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x * w)
  ty <- t(y - colSums(t(y) * w))
  # Compute the variance
  vx <- sum(w * x * x)
  vy <- colSums(w * ty * ty)
  # Compute the covariance
  vxy <- colSums(ty * x * w)
  # Compute the correlation
  vxy / sqrt(vx * vy)
}

and check the results against those produced by corr from the boot package:

> res1 <- f2(x, y, xy.wt)
> res2 <- sapply(1:nrow(y), 
+                function(i, x, y, w) corr(cbind(x, y[i,]), w = w),
+                x = x, y = y, w = xy.wt)
> all.equal(res1, res2)
[1] TRUE

which in itself gives another way that this problem could be solved.

Here is a generalization to compute the weighted Pearson correlation between two matrices (instead of a vector and a matrix, as in the original question):

matrix.corr <- function (a, b, w = rep(1, nrow(a))/nrow(a)) 
{
    # normalize weights
    w <- w / sum(w)

    # center matrices
    a <- sweep(a, 2, colSums(a * w))
    b <- sweep(b, 2, colSums(b * w))

    # compute weighted correlation
    t(w*a) %*% b / sqrt( colSums(w * a**2) %*% t(colSums(w * b**2)) )
}

Using the above example and the correlation function from Heather, we can verify it:

> sum(matrix.corr(as.matrix(x, nrow=34),t(y),xy.wt) - f2(x,y,xy.wt))
[1] 1.537507e-15

In terms of calling syntax, this resembles the unweighted cor:

> a <- matrix( c(1,2,3,1,3,2), nrow=3)
> b <- matrix( c(2,3,1,1,7,3,5,2,8,1,10,12), nrow=3)
> matrix.corr(a,b)
     [,1]      [,2] [,3]      [,4]
[1,] -0.5 0.3273268  0.5 0.9386522
[2,]  0.5 0.9819805 -0.5 0.7679882
> cor(a, b)
     [,1]      [,2] [,3]      [,4]
[1,] -0.5 0.3273268  0.5 0.9386522
[2,]  0.5 0.9819805 -0.5 0.7679882
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!