Compute covariance matrix from list of occurrences [duplicate]

主宰稳场 submitted on 2019-12-25 17:14:11

Question


I have the following data frame:

# my_data
id  cg
1   a
2   b
3   a
3   b
4   b
4   c
5   b
5   c
5   d
6   d

I would like to compute the covariance of the values of cg. I believe I can obtain it by using cov() on the following matrix, where each cell counts the number of ids in which two values of cg co-occur (the diagonal counts the ids that contain each value).
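To make the definition of each cell concrete, here is a deliberately naive sketch that counts, for a pair of cg values, the ids containing both. The helper name co_count is hypothetical and only for illustration; it loops over all pairs, so with more than 700 values it would evaluate roughly 490,000 pairs and is not meant as the fast solution.

```r
my_data <- data.frame(
  id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
  cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d")
)

# Hypothetical helper: number of ids in which both u and v appear
co_count <- function(data, u, v) {
  ids_u <- unique(data$id[data$cg == u])
  ids_v <- unique(data$id[data$cg == v])
  length(intersect(ids_u, ids_v))
}

co_count(my_data, "b", "c")  # 2 (ids 4 and 5)
co_count(my_data, "b", "b")  # 4 (diagonal: ids containing "b")

# Full matrix by brute force over all pairs of values
levs <- sort(unique(my_data$cg))
co_matrix <- sapply(levs, function(v) sapply(levs, function(u) co_count(my_data, u, v)))
co_matrix
```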

# my_matrix
cg  a  b  c  d
a   2  1  0  0
b   1  4  2  1
c   0  2  2  1
d   0  1  1  2

What is the quickest way to go from my_data to my_matrix? Please be aware that cg contains more than 700 unique values.

If there is a better way to generate the covariance matrix, I am also interested in that.
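A minimal sketch of one alternative, assuming each id is one observation and each value of cg is a 0/1 indicator variable: cov() applied to the id-by-cg incidence matrix gives the sample covariance of those indicators directly. It is tied to the co-occurrence matrix by cov(X) = (X'X - n m m') / (n - 1), where X'X is crossprod(X) and m holds the column means. The names X, cov_cg and cov_manual are illustrative only.

```r
my_data <- data.frame(
  id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
  cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d")
)

# id x cg incidence matrix: one row per id (observation),
# one column per cg value (indicator variable)
X <- unclass(table(my_data))

# Sample covariance of the cg indicator columns across ids
cov_cg <- cov(X)

# Same result expressed through the co-occurrence matrix crossprod(X)
n <- nrow(X)
m <- colMeans(X)
cov_manual <- (crossprod(X) - n * tcrossprod(m)) / (n - 1)

isTRUE(all.equal(unname(cov_cg), unname(cov_manual)))  # TRUE
```

Note that this treats ids as the observations; cov() applied to the co-occurrence matrix itself would instead treat the rows of that matrix as observations, which is a different quantity.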

Here is the code to generate my_data:

my_data <- structure(list(id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
                          cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d")),
                     .Names = c("id", "cg"),
                     class = "data.frame", row.names = c(NA, -10L))

Answer 1:


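We can use crossprod() on the id-by-cg table built by table(). crossprod(X) computes t(X) %*% X, so each cell counts the ids in which two values of cg occur together: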

crossprod(table(my_data))
#    cg
#cg  a b c d
#  a 2 1 0 0
#  b 1 4 2 1
#  c 0 2 2 1
#  d 0 1 1 2
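With more than 700 values of cg and possibly many ids, the dense table is likely to be mostly zeros. A sketch of the same computation on a sparse incidence matrix, assuming the Matrix package (a recommended package shipped with R, not used in the original answer) is available. The names pairs, incidence and co_occurrence are illustrative. If an (id, cg) pair could appear more than once, both table() and sparseMatrix() would count it repeatedly, so the sketch removes duplicate rows first.

```r
library(Matrix)

my_data <- data.frame(
  id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
  cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d")
)

# Drop repeated (id, cg) pairs so every entry of the incidence matrix is 0/1
pairs <- unique(my_data)
ids <- factor(pairs$id)
cgs <- factor(pairs$cg)

# Sparse id x cg incidence matrix: a 1 wherever an id contains a value
incidence <- sparseMatrix(
  i = as.integer(ids),
  j = as.integer(cgs),
  x = 1,
  dims = c(nlevels(ids), nlevels(cgs)),
  dimnames = list(levels(ids), levels(cgs))
)

# t(incidence) %*% incidence without building the dense table
co_occurrence <- crossprod(incidence)
as.matrix(co_occurrence)
```

crossprod() on a sparse matrix returns a symmetric sparse result, so only one triangle of the 700 x 700 co-occurrence matrix is actually stored.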


Source: https://stackoverflow.com/questions/43631179/compute-covariance-matrix-from-list-of-occurrences
