Question
I am programming in R and have a huge correlation matrix. I would like to filter it so that I keep only the rows and columns that contain values > 0.7 or < -0.7. I already tried subset and filter but couldn't get what I want. An additional problem is that there are so many row/column names that I don't want to handle them by name. Can anybody help?
For example, this matrix
     1    2    3    4
1    1    0  0.7  0.6
2    0    1  0.6  0.6
3  0.1    0    1  0.8
4 -0.2    0  0.7  0.9
should return
     1    3    4
1    1  0.7  0.6
3  0.1    1  0.8
4 -0.2  0.7  0.9
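For anyone who wants to try this, the example can be entered as a reproducible R object. This is a sketch: the object name q is arbitrary, and the row/column names "1" to "4" simply mirror the printed labels. It also shows that the expected output only follows from inclusive bounds (>= 0.7 / <= -0.7): row 1 is kept even though its largest off-diagonal entry is exactly 0.7.

```r
# The example matrix from the question, filled column by column,
# with row/column names matching the printed labels (assumed, for illustration).
q <- matrix(c(1, 0, 0.1, -0.2, 0, 1, 0, 0, 0.7, 0.6, 1, 0.7, 0.6, 0.6, 0.8, 0.9),
            nrow = 4, dimnames = list(as.character(1:4), as.character(1:4)))

# The desired result: rows and columns "1", "3" and "4"
keep <- c("1", "3", "4")
print(q[keep, keep])

# Off-diagonal entries of row "1": none is strictly above 0.7,
# but one equals 0.7, so the bounds must be inclusive to keep this row.
row1 <- q["1", colnames(q) != "1"]
print(any(abs(row1) > 0.7))   # FALSE
print(any(abs(row1) >= 0.7))  # TRUE
```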
Answer 1:
Zero out the diagonal (it is always 1 and would otherwise select every row) and use apply(..., 1, any) to find the rows, and therefore, by symmetry, also the columns, that contain an off-diagonal entry whose absolute value is >= threshold.
For testing, we used cor(cc) instead of the matrix cc from the question, because cc is not a correlation matrix, and set threshold = 0.6.
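# The matrix from the question (filled column by column)
cc <- matrix(c(1, 0, 0.1, -0.2, 0, 1, 0, 0, 0.7, 0.6, 1, 0.7, 0.6, 0.6, 0.8, 0.9), 4)
cc <- cor(cc)          # turn it into a genuine correlation matrix for testing
threshold <- 0.6
cc0 <- cc
diag(cc0) <- 0         # ignore the self-correlations of 1 on the diagonal
# TRUE for each row with at least one |r| >= threshold
ok <- apply(abs(cc0) >= threshold, 1, any)
cc[ok, ok]             # same selection for rows and columns (matrix is symmetric)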
giving:
           [,1]       [,2]
[1,]  1.0000000 -0.6375997
[2,] -0.6375997  1.0000000
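Alternatively, the last two lines of code could be replaced with the following, which uses which(..., arr.ind = TRUE) to get the row and column coordinates of all entries whose absolute value is >= threshold and then pools them into one set of indices:
# row/column indices of every hit, flattened into one sorted set of variable indices
ix <- sort(unique(c(which(abs(cc0) >= threshold, arr.ind = TRUE))))
cc[ix, ix]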
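Either variant can be wrapped in a small reusable function, which also addresses the concern about the many row/column names: logical indexing carries the dimnames along, so the kept variables come out labelled without touching the names by hand. This is a sketch only; the function name filter_cor, the column names a to d, and the use of rowSums(...) > 0 in place of the equivalent apply(..., 1, any) are illustrative choices, not part of the original answer. Applying it to the question's own matrix, which is not symmetric, reuses the row selection for the columns; that is an assumption which happens to reproduce the expected output there.

```r
# Hypothetical helper (not from the answer): keep every variable that has at
# least one off-diagonal entry with absolute value >= threshold.
filter_cor <- function(cm, threshold) {
  stopifnot(is.matrix(cm), nrow(cm) == ncol(cm))
  hit <- abs(cm) >= threshold   # logical matrix of "strong" entries
  diag(hit) <- FALSE            # ignore the diagonal (self-correlations)
  keep <- rowSums(hit) > 0      # same result as apply(hit, 1, any)
  cm[keep, keep, drop = FALSE]  # stay a matrix even if only one variable is kept
}

# Test matrix from the answer, with made-up column names a to d
m <- matrix(c(1, 0, 0.1, -0.2, 0, 1, 0, 0, 0.7, 0.6, 1, 0.7, 0.6, 0.6, 0.8, 0.9),
            nrow = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
res <- filter_cor(cor(m), 0.6)
print(res)  # only "a" and "d" remain, with their names

# Applied directly to the question's (non-symmetric) example with 0.7,
# it returns rows and columns 1, 3 and 4, as in the expected output.
q <- m
dimnames(q) <- list(as.character(1:4), as.character(1:4))
print(filter_cor(q, 0.7))
```

The drop = FALSE matters mostly for non-symmetric input: in a genuine correlation matrix any hit at (i, j) is mirrored at (j, i), so whenever anything survives, at least two variables do.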
Source: https://stackoverflow.com/questions/47592683/r-filter-correlation-matrix-on-values-and