Transform Correlation Matrix into dataframe with records for each row-column pair

倖福魔咒の submitted on 2020-01-01 04:54:08

Question


I have a large matrix of correlations (1093 x 1093). I'm trying to turn my matrix into a dataframe that has a record for every row-column pair, so it would have 1093^2 = 1,194,649 records.

Here's a snippet of my matrix:

            60516        45264        02117
60516  1.00000000 -0.370793012 -0.082897941
45264 -0.37079301  1.000000000  0.005145601
02117 -0.08289794  0.005145601  1.000000000
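To try the answers on actual data, the snippet can be rebuilt as a small R matrix. This is a sketch: the name m and storing the ids as character dimnames are assumptions (strings keep the leading zero in 02117); the values are copied from the printed snippet, which rounds the lower triangle slightly differently from the upper one.

```r
# Rebuild the 3 x 3 snippet from the question (name `m` is an assumption).
ids <- c("60516", "45264", "02117")  # character ids, so "02117" keeps its leading zero
m <- matrix(c( 1.00000000, -0.370793012, -0.082897941,
              -0.37079301,  1.000000000,  0.005145601,
              -0.08289794,  0.005145601,  1.000000000),
            nrow = 3, byrow = TRUE,           # values typed row by row, as printed
            dimnames = list(ids, ids))        # same labels on rows and columns
print(m)
```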

The goal from here would be to have a dataframe that looks like this:

row column correlation
60516 60516 1.000000000
60516 45264 -0.370793012

... and so on.

Does anyone have any tips? Let me know if I can clarify anything.

Thanks, Ben


Answer 1:


For matrix m, you could do:

data.frame(row=rownames(m)[row(m)], col=colnames(m)[col(m)], corr=c(m))

#     row   col         corr
# 1 60516 60516  1.000000000
# 2 45264 60516 -0.370793010
# 3 02117 60516 -0.082897940
# 4 60516 45264 -0.370793012
# 5 45264 45264  1.000000000
# 6 02117 45264  0.005145601
# 7 60516 02117 -0.082897941
# 8 45264 02117  0.005145601
# 9 02117 02117  1.000000000
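A base-R alternative for the same full-grid layout (swapped in for comparison, not part of this answer) goes through as.table(): its as.data.frame() method emits one row per cell with the first dimension varying fastest, which is the same order as c(m). A minimal sketch, assuming m carries the ids as dimnames; m is rebuilt from the question's snippet so the block runs on its own:

```r
ids <- c("60516", "45264", "02117")
m <- matrix(c( 1.00000000, -0.370793012, -0.082897941,
              -0.37079301,  1.000000000,  0.005145601,
              -0.08289794,  0.005145601,  1.000000000),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# One row per cell; the value column is renamed via responseName.
long <- as.data.frame(as.table(m), responseName = "corr",
                      stringsAsFactors = FALSE)  # keep ids as character, not factors
names(long)[1:2] <- c("row", "col")              # default names are Var1 / Var2
print(long)
```

With stringsAsFactors = FALSE, ids such as 02117 stay plain character strings.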

If your matrix is symmetric and you are not interested in the diagonal, you can simplify it to:

data.frame(row=rownames(m)[row(m)[upper.tri(m)]], 
           col=colnames(m)[col(m)[upper.tri(m)]], 
           corr=m[upper.tri(m)])

#     row   col         corr
# 1 60516 45264 -0.370793012
# 2 60516 02117 -0.082897941
# 3 45264 02117  0.005145601



Answer 2:


The following should work. Given a correlation matrix Acor, you can create the data.frame as follows (Row and Column are numeric indices, listed column by column):

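UpperT <- Acor[upper.tri(Acor, diag = TRUE)]      # upper triangle incl. diagonal, read column by column
n <- dim(Acor)[[1]]                               # number of variables
Row <- unlist(lapply(seq_len(n), FUN = seq_len))  # 1, 1:2, 1:3, ..., 1:n
Column <- rep(seq_len(n), seq_len(n))             # 1, 2, 2, 3, 3, 3, ...
Df <- data.frame(UpperT, Row, Column)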
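Row and Column are generated arithmetically rather than read off the mask, so it is worth confirming that they line up with the cells upper.tri(Acor, diag = TRUE) actually selects. A quick cross-check, reusing the set.seed(24) example from this answer; the comparison against which(..., arr.ind = TRUE) is an addition, not part of the original answer:

```r
# Example correlation matrix from the answer.
set.seed(24)
A <- matrix(rnorm(25, 5, 2), ncol = 5)
Acor <- cor(A)

# The answer's construction.
UpperT <- Acor[upper.tri(Acor, diag = TRUE)]
n <- dim(Acor)[[1]]
Row <- unlist(lapply(seq_len(n), FUN = seq_len))
Column <- rep(seq_len(n), seq_len(n))
Df <- data.frame(UpperT, Row, Column)

# which(arr.ind = TRUE) walks the same mask in the same column-major order.
idx <- which(upper.tri(Acor, diag = TRUE), arr.ind = TRUE)
stopifnot(all(Df$Row == idx[, 1]), all(Df$Column == idx[, 2]))
# Each value sits at the (Row, Column) position it claims.
stopifnot(all(Df$UpperT == Acor[cbind(Df$Row, Df$Column)]))
```

If Acor has dimnames, rownames(Acor)[Df$Row] and colnames(Acor)[Df$Column] turn the indices into labels like those in the question.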

For example, with this correlation matrix:

set.seed(24)
A <- matrix(rnorm(25, 5, 2), ncol = 5)
Acor <- cor(A)

> Acor
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  1.0000000  0.3398424  0.8876580  0.2582569 -0.5699901
[2,]  0.3398424  1.0000000  0.5897580 -0.7416699  0.2502752
[3,]  0.8876580  0.5897580  1.0000000 -0.1631381 -0.2101108
[4,]  0.2582569 -0.7416699 -0.1631381  1.0000000 -0.8067492
[5,] -0.5699901  0.2502752 -0.2101108 -0.8067492  1.0000000

You get:

> Df
       UpperT Row Column
1   1.0000000   1      1
2   0.3398424   1      2
3   1.0000000   2      2
4   0.8876580   1      3
5   0.5897580   2      3
6   1.0000000   3      3
7   0.2582569   1      4
8  -0.7416699   2      4
9  -0.1631381   3      4
10  1.0000000   4      4
11 -0.5699901   1      5
12  0.2502752   2      5
13 -0.2101108   3      5
14 -0.8067492   4      5
15  1.0000000   5      5
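The row count follows directly from which cells are kept. For an n x n matrix (the arithmetic is added here for illustration; n = 1093 comes from the question, and n = 5 gives the 15 rows shown above):

```latex
\begin{aligned}
\text{full grid (Answer 1, first form):}\quad & n^2 = 1093^2 = 1{,}194{,}649\\
\text{upper triangle with diagonal (Answer 2):}\quad & \tfrac{n(n+1)}{2} = \tfrac{1093 \cdot 1094}{2} = 597{,}871\\
\text{upper triangle without diagonal (Answer 1, second form):}\quad & \tfrac{n(n-1)}{2} = \tfrac{1093 \cdot 1092}{2} = 596{,}778
\end{aligned}
```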


Source: https://stackoverflow.com/questions/28035001/transform-correlation-matrix-into-dataframe-with-records-for-each-row-column-pai
