R cor(): method = "pearson" returns NA, but method = "spearman" returns a value. Why?

萌比男神i 2021-01-16 05:54

I am using R to run correlations on a very large data matrix with approximate dimension 10,000 x 15,000 (events x samples). This data set contains floating point values rang

1 answer
  •  离开以前
    2021-01-16 06:01

    The Pearson correlation coefficient relies on estimating means and (co)variances. A single infinite value makes the mean infinite, and the deviations from that mean then involve Inf - Inf, which is NaN, so the whole computation breaks. Spearman and Kendall correlation coefficients are rank-based: an infinite value simply takes the highest or lowest rank, so the coefficient can still be computed (but beware of tied values in your samples!).
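    To see the difference on a small scale, here is a minimal sketch. The vectors `x` and `y` are made-up toy values, not taken from the question's data.

```r
# Toy data (hypothetical values): x contains one infinite entry.
x <- c(1.2, 3.4, Inf, 0.5, 2.8)
y <- c(2.0, 3.9, 7.1, 3.0, 1.1)

mean(x)      # Inf: one infinite value makes the mean infinite
x - mean(x)  # -Inf for the finite entries, NaN for the infinite one

# Pearson works on the raw values, so the NaN propagates into the result.
pearson <- cor(x, y, method = "pearson")

# Spearman ranks the values first; Inf just gets the largest rank.
rank(x)
spearman <- cor(x, y, method = "spearman")

is.na(pearson)       # TRUE
is.finite(spearman)  # TRUE
```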

    Try:

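    > # TRUE wherever either vector holds Inf or -Inf
    > lix <- is.infinite(vector1) | is.infinite(vector2)
    > # correlate only the pairs in which both values are finite
    > cor(vector1[!lix], vector2[!lix], method = "pearson", use = "pairwise.complete.obs")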

    This drops every pair in which either value is infinite. To do this more generally, a function like this is helpful:

    > inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }
    > cor(inf2NA(vector1), inf2NA(vector2), ...)
    

    which converts infinite values to NA, so that the use argument of cor() can then handle those NA cases as you see fit.
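    Since the question is about a whole matrix rather than two vectors, the same helper can be applied to the matrix directly. The following is only a sketch on a hypothetical 10 x 4 toy matrix `m` with two infinite entries planted in it; it compares three settings of `use`.

```r
inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }

# Hypothetical toy matrix standing in for the real data.
set.seed(1)
m <- matrix(rnorm(40), nrow = 10, ncol = 4)
m[3, 2] <- Inf
m[7, 4] <- -Inf

# is.infinite() works element-wise, so the helper keeps the matrix shape.
clean <- inf2NA(m)

# Default (use = "everything"): any column pair touching an NA gives NA.
r_default <- cor(clean)

# "complete.obs": rows 3 and 7 are dropped for every column pair.
r_complete <- cor(clean, use = "complete.obs")

# "pairwise.complete.obs": rows are dropped per column pair only.
r_pairwise <- cor(clean, use = "pairwise.complete.obs")
```

    With "pairwise.complete.obs" each entry of the result may be based on a different set of rows, which keeps more data but means the entries are not all computed from the same observations.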
