R cor(): method = "pearson" returns NA, but method = "spearman" returns a value. Why?

萌比男神i 2021-01-16 05:54

I am using R to run correlations on a very large data matrix with approximate dimensions 10,000 x 15,000 (events x samples). This data set contains floating point values rang…

1 Answer

离开以前 (original poster)

2021-01-16 06:01
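The Pearson correlation coefficient relies on estimating means and (co)variances. A single infinite value makes the mean infinite, and centering that value then computes Inf - Inf, which is NaN, so the whole coefficient comes out as NaN (which `is.na()` reports as missing). Spearman and Kendall correlation coefficients are rank-based: Inf and -Inf simply sort to the ends and receive the highest and lowest ranks, so the computation goes through (but beware of tied values in your samples, for example several infinite values in the same vector).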
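To see the difference concretely, here is a minimal sketch on made-up toy vectors (the names `x`, `y`, `p` and `s` and the data are illustrative assumptions, not the asker's data). It checks that the Pearson result is not finite while the Spearman result is, and shows how `rank()` treats infinite values, including ties:

```r
x <- c(1, 2, 3, Inf)   # hypothetical data with one infinite value
y <- c(2, 4, 5, 9)

# Pearson: mean(x) is Inf, and centering Inf on Inf gives NaN
p <- cor(x, y, method = "pearson")
print(is.finite(p))          # FALSE

# Spearman: Inf just receives the largest rank
print(rank(x))               # 1 2 3 4
s <- cor(x, y, method = "spearman")
print(s)                     # 1

# Ties: several infinite values share an averaged rank
print(rank(c(Inf, 1, Inf)))  # 2.5 1.0 2.5
```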

Try:
```r
# Drop every pair in which either value is infinite
lix <- is.infinite(vector1) | is.infinite(vector2)
cor(vector1[!lix], vector2[!lix], method = "pearson", use = "pairwise.complete.obs")
```
This removes every pair in which either value is infinite. To do this more generally, a helper function like this is useful:
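```r
# Replace +/-Inf with NA so that `use` can handle them like any other missing value
inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }
cor(inf2NA(vector1), inf2NA(vector2), ...)  # `...`: your other arguments, e.g. use = "pairwise.complete.obs"
```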
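This converts infinite values to NA, and the `use` argument then handles those cases as you see fit (for example, `"complete.obs"` drops every observation that has an NA anywhere, while `"pairwise.complete.obs"` drops them separately for each pair of variables).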
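Since the question concerns a large matrix rather than two vectors, the same helper can be applied to the whole matrix before a single `cor()` call. The following is a small sketch on a hypothetical 4 x 3 matrix `m` (the data and column names are made up for illustration); `inf2NA` is redefined so the block runs on its own:

```r
# Helper from the answer: replace +/-Inf with NA (works on matrices too)
inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }

# Hypothetical 4 x 3 matrix, filled column by column
m <- matrix(c(1, 2, 3, 4,     # column a
              2, 4, Inf, 8,   # column b has one Inf
              1, 3, 2, 5),    # column c
            nrow = 4)
colnames(m) <- c("a", "b", "c")

# Worth checking on real data: how many infinite cells are there?
print(sum(is.infinite(m)))    # 1

# Default use = "everything": every pair involving column b is not finite
print(cor(m)["a", "b"])

# After inf2NA(), pairwise deletion drops row 3 only for pairs involving b
r <- cor(inf2NA(m), use = "pairwise.complete.obs")
print(r["a", "b"])            # 1, since b = 2 * a on rows 1, 2 and 4
```

`cor()` correlates the columns of a matrix, so on an events x samples matrix this gives sample-by-sample correlations (use `t(m)` for event-by-event ones). R's documentation for `cor` also notes that pairwise deletion can produce a correlation matrix that is not positive semi-definite.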