Distributed cross correlation matrix computation

甜味超标 2021-01-12 02:37

How can I calculate the Pearson cross-correlation matrix of a large (>10 TB) data set, possibly in a distributed manner? Any suggestions for an efficient distributed algorithm would be appreciated.

2 Answers
  •  没有蜡笔的小新
    2021-01-12 03:29

    To start with, have a look at this to see if things are going right. You may then refer to any of these implementations: MPI/OpenMP: Agomezl or Meismyles; MapReduce: Vangjee or Seawolf42. It'd also be interesting to read this before you proceed. On a different note, James's thesis provides some pointers if you're interested in computing correlations that are robust to outliers.
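
For illustration, here is a minimal single-machine sketch of the map/reduce idea behind distributed Pearson correlation. It is not taken from any of the implementations named above. Each chunk of rows is reduced to three mergeable statistics (row count, per-column sums, and the matrix of cross-products), the partial results are added together, and the correlation matrix is derived at the end. The function names `partial_stats`, `merge_stats` and `finalize` are hypothetical, and the tiny in-memory `chunks` list stands in for data partitions that would live on different workers.

```python
import math
from functools import reduce


def partial_stats(rows):
    """Map step: summarize one chunk (each row is one observation)."""
    rows = list(rows)
    d = len(rows[0])
    s = [0.0] * d                       # per-column sums
    p = [[0.0] * d for _ in range(d)]   # cross-products, i.e. X^T X
    for row in rows:
        for i in range(d):
            s[i] += row[i]
            for j in range(d):
                p[i][j] += row[i] * row[j]
    return len(rows), s, p


def merge_stats(a, b):
    """Reduce step: statistics from two chunks combine by plain addition."""
    n_a, s_a, p_a = a
    n_b, s_b, p_b = b
    s = [x + y for x, y in zip(s_a, s_b)]
    p = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(p_a, p_b)]
    return n_a + n_b, s, p


def finalize(stats):
    """Turn merged statistics into a Pearson correlation matrix.

    Assumes no column is constant; a zero-variance column would cause
    a division by zero here.
    """
    n, s, p = stats
    d = len(s)
    # Scaled covariance: sum(x_i * x_j) - sum(x_i) * sum(x_j) / n
    cov = [[p[i][j] - s[i] * s[j] / n for j in range(d)] for i in range(d)]
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
        for i in range(d)
    ]


# Two chunks of a 4-row, 3-column data set (illustrative values only).
chunks = [
    [[1.0, 2.0, 1.0], [2.0, 4.0, 3.0]],
    [[3.0, 6.0, 2.0], [4.0, 8.0, 5.0]],
]
corr = finalize(reduce(merge_stats, map(partial_stats, chunks)))
```

Because the merge is associative and commutative, the chunks can be processed in any order and on any number of workers. One caveat: accumulating raw sums and cross-products can lose precision when column means are large relative to their spread, so at the 10 TB scale it may be worth centering the data first or merging per-chunk means and centered cross-products instead.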
