R: Filter correlation matrix on values > and < [closed]

若如初见. Submitted on 2019-12-08 10:25:14

Question


I am programming in R and have a huge correlation matrix. I would like to filter it so that only the rows and columns containing values > 0.7 or < -0.7 remain. I have already tried subset and filter, but they don't give me what I want. An additional problem is that there are so many row/column names that I do not want to refer to them by name. Can anybody please help?

For example,

  1    2  3   4  
1 1    0  0.7 0.6  
2 0    1  0.6 0.6  
3 0.1  0  1   0.8  
4 -0.2 0  0.7 0.9  

should return

  1    3    4   
1 1    0.7  0.6  
3 0.1  1    0.8  
4 -0.2 0.7  0.9

Answer 1:


Zero out the diagonal and use apply(..., 1, any) to find the rows (and therefore, owing to symmetry, also the columns) that contain at least one entry whose absolute value is >= threshold.

For testing, we use cor(cc), where cc is the matrix in the question, together with threshold = 0.6, because cc in the question is not itself a correlation matrix.

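# the matrix from the question, filled column by column
cc <- matrix(c(1, 0, 0.1, -0.2, 0, 1, 0, 0, 0.7, 0.6, 1, 0.7, 0.6, 0.6, 0.8, 0.9), 4)
cc <- cor(cc)  # turn it into a genuine correlation matrix

threshold <- 0.6
cc0 <- cc
diag(cc0) <- 0  # ignore the 1s on the diagonal
ok <- apply(abs(cc0) >= threshold, 1, any)  # TRUE for each row with a large entry
cc[ok, ok]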

giving:

           [,1]       [,2]
[1,]  1.0000000 -0.6375997
[2,] -0.6375997  1.0000000
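
To connect this back to the question, here is a minimal sketch that wraps the same steps in a function and applies it to the question's own 4 x 4 example with threshold 0.7. The name filter_cor and the variables q and res are hypothetical and not part of the original answer. Because that example matrix is not symmetric, the sketch assumes that testing the rows alone (as the answer does) is what is wanted.

```r
# Hypothetical helper (not from the original answer): keep the rows/columns
# that have at least one off-diagonal entry with |value| >= threshold.
filter_cor <- function(m, threshold) {
  m0 <- m
  diag(m0) <- 0                               # ignore the diagonal
  ok <- apply(abs(m0) >= threshold, 1, any)   # row-wise test, as in the answer
  m[ok, ok, drop = FALSE]                     # always return a matrix
}

# The example from the question, entered row by row with its labels.
q <- matrix(c( 1,   0, 0.7, 0.6,
               0,   1, 0.6, 0.6,
               0.1, 0, 1,   0.8,
              -0.2, 0, 0.7, 0.9),
            nrow = 4, byrow = TRUE,
            dimnames = list(as.character(1:4), as.character(1:4)))

res <- filter_cor(q, 0.7)
res
```

Under these assumptions res keeps rows and columns 1, 3 and 4, which is the result asked for in the question. The row and column labels are carried along automatically, so they never have to be typed out.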

Alternatively, the last two lines of code could be replaced with the following, which uses which(..., arr.ind = TRUE) to get the row and column indices of the entries whose absolute value is >= threshold:

ix <- sort(unique(c(which(abs(cc0) >= threshold, arr.ind = TRUE))))
cc[ix, ix]
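
As a quick sanity check, the following self-contained sketch rebuilds the test matrix and compares the logical-index selection with the integer-index selection. It is only an illustration on this one test case, not a general proof that the two always agree.

```r
# Rebuild the test correlation matrix used in the answer.
cc <- cor(matrix(c(1, 0, 0.1, -0.2, 0, 1, 0, 0,
                   0.7, 0.6, 1, 0.7, 0.6, 0.6, 0.8, 0.9), 4))
threshold <- 0.6

cc0 <- cc
diag(cc0) <- 0

# Approach 1: logical vector marking the rows to keep.
ok <- apply(abs(cc0) >= threshold, 1, any)

# Approach 2: integer indices gathered from the (row, col) coordinates.
ix <- sort(unique(c(which(abs(cc0) >= threshold, arr.ind = TRUE))))

which(ok)                           # positions selected by approach 1
ix                                  # positions selected by approach 2
identical(cc[ok, ok], cc[ix, ix])   # do both give the same sub-matrix?
```

Both approaches select the same positions here. The which() form is handy when the index positions themselves are needed later, for example to look up the corresponding row names.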


Source: https://stackoverflow.com/questions/47592683/r-filter-correlation-matrix-on-values-and
