Distributed cross correlation matrix computation

甜味超标 2021-01-12 02:37

How can I calculate the Pearson cross-correlation matrix of a large (>10 TB) data set, possibly in a distributed manner? Any suggestion for an efficient distributed algorithm will be appreciated.

2 Answers

没有蜡笔的小新 (original poster)

2021-01-12 03:29

To start with, have a look at this to see if things are going right. You may then refer to any of these implementations: MPI/OpenMP: Agomezl or Meismyles; MapReduce: Vangjee or Seawolf42. It'd also be interesting to read this before you proceed. On a different note, James's thesis provides some pointers if you're interested in computing correlations that are robust to outliers.
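
For illustration only, here is a minimal sketch of the map/reduce idea behind such implementations; it is not taken from any of the projects named above. It assumes the data arrives as row chunks (rows are observations, columns are variables) and that the number of columns is small enough for a columns-by-columns matrix to fit in memory. The function names (`partial_stats`, `merge_stats`, `correlation_from_stats`, `chunked_correlation`) are hypothetical. Each worker computes additive statistics for its chunk, the reducer sums them, and the correlation matrix is derived once at the end.

```python
import numpy as np
from functools import reduce

def partial_stats(chunk):
    """Map step: row count, column sums, and Gram matrix X^T X for one chunk."""
    chunk = np.asarray(chunk, dtype=np.float64)
    return chunk.shape[0], chunk.sum(axis=0), chunk.T @ chunk

def merge_stats(a, b):
    """Reduce step: all three statistics are additive across chunks."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def correlation_from_stats(stats):
    """Finalize: turn the merged statistics into a Pearson correlation matrix."""
    n, col_sum, gram = stats
    mean = col_sum / n
    # Population covariance; the 1/n vs 1/(n-1) factor cancels in the correlation.
    cov = gram / n - np.outer(mean, mean)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def chunked_correlation(chunks):
    """Run map and reduce locally; a cluster framework would distribute the map calls."""
    return correlation_from_stats(reduce(merge_stats, map(partial_stats, chunks)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 5))
    # Pretend each split lives on a different worker.
    result = chunked_correlation(np.array_split(data, 7))
    print(np.allclose(result, np.corrcoef(data, rowvar=False)))
```

One caveat about this sketch: forming the covariance as `gram / n - outer(mean, mean)` can lose precision when column means are large relative to the standard deviations, so centering the data first (or merging per-chunk centered statistics) may be preferable on real data. Constant columns would also produce a division by zero here.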
