Correlation between groups in R data.table

南笙 2021-01-02 03:59

Is there a way of elegantly calculating the correlations between values if those values are stored by group in a single column of a data.table (other than converting the dat

3 answers

一向 (original poster)

2021-01-02 04:31

I don't know a way to get it in matrix form straight away, but I find this solution useful:

dt[, {x = value; dt[, cor(x, value), by = group]}, by=group]

   group group        V1
1:     a     a 1.0000000
2:     a     b 0.1556371
3:     b     a 0.1556371
4:     b     b 1.0000000

This is convenient because you started with a molten (long-format) dataset and you end up with a molten representation of the correlations.
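The snippet above assumes that dt already exists. Here is a minimal sketch of a reproducible setup, assuming a long table with columns id, group and value, and the same number of rows per group in matching order. The data and the names g2 and r are made up for illustration. The last line cross-checks one entry against a direct cor() call.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only for reproducibility
# Hypothetical long ("molten") data: two groups, five observations each
dt <- data.table(id    = rep(1:5, times = 2),
                 group = rep(c("a", "b"), each = 5),
                 value = rnorm(10))

# Outer call: x holds the values of the current group.
# Inner call: correlate x with the values of every group in turn.
# Naming the inner grouping column g2 avoids a duplicated "group" column.
res <- dt[, {x = value
             dt[, list(r = cor(x, value)), by = .(g2 = group)]},
          by = group]

# Cross-check one off-diagonal entry against a direct computation
direct <- cor(dt[group == "a", value], dt[group == "b", value])
isTRUE(all.equal(res[group == "a" & g2 == "b", r], direct))
```

Note that cor() needs both vectors to have the same length, so this only works when every group has the same number of observations.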

Using this form you can also choose to calculate only certain pairs. In particular, the correlation matrix is symmetric, so it is a waste of time calculating both off-diagonal triangles. For example:

dt[, {x = value; g = group; dt[group <= g, list(cor(x, value)), by = group]}, by=group]
   group group        V1
1:     a     a 1.0000000
2:     b     a 0.1556371
3:     b     b 1.0000000
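To see how much work the filter saves, here is a rough sketch with three hypothetical groups (the data and the names g2 and r are assumptions for illustration). With n groups the full version computes n^2 correlations, while the `group <= g` version computes n(n+1)/2.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only for reproducibility
# Hypothetical data: three groups with six observations each
dt <- data.table(group = rep(c("a", "b", "c"), each = 6),
                 value = rnorm(18))

# All ordered pairs, including both mirrored off-diagonal entries
full <- dt[, {x = value
              dt[, list(r = cor(x, value)), by = .(g2 = group)]},
           by = group]

# Only pairs where the inner group sorts at or before the outer group
tri <- dt[, {x = value; g = group
             dt[group <= g, list(r = cor(x, value)), by = .(g2 = group)]},
          by = group]

c(full = nrow(full), triangle = nrow(tri))  # row counts of the two results
```

I kept the non-strict `<=` here on purpose: with a strict `<` the first group would match no rows at all, and I have not checked how cor() behaves in that empty case.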

Alternatively, this form works just as well for the cross-correlation between two sets of groups (i.e. the off-diagonal block):

library(data.table)
set.seed(1)             # reproducibility
dt1 <- data.table(id=1:4, group=rep(letters[1:2], c(4,4)), value=rnorm(8))
dt2 <- data.table(id=1:4, group=rep(letters[3:4], c(4,4)), value=rnorm(8))
setkey(dt1, group)
setkey(dt2, group)

dt1[, {x = value; g = group; dt2[, list(cor(x, value)), by = group]}, by=group]

   group group          V1
1:     a     c -0.39499814
2:     a     d  0.74234458
3:     b     c  0.96088312
4:     b     d  0.08016723
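As a sanity check, the same off-diagonal block can be computed with base R by putting each group in its own matrix column and calling cor() on the two matrices. This is only a sketch of that comparison: the names m1, m2, block, nested, g1, g2 and r are mine, and it assumes every group has the same number of rows.

```r
library(data.table)

set.seed(1)  # reproducibility
dt1 <- data.table(id = rep(1:4, 2), group = rep(letters[1:2], each = 4), value = rnorm(8))
dt2 <- data.table(id = rep(1:4, 2), group = rep(letters[3:4], each = 4), value = rnorm(8))

# Base R route: one matrix column per group, then cor() on the two matrices
m1 <- do.call(cbind, split(dt1$value, dt1$group))  # columns "a" and "b"
m2 <- do.call(cbind, split(dt2$value, dt2$group))  # columns "c" and "d"
block <- cor(m1, m2)                               # rows a, b; columns c, d

# Nested data.table route, with distinct names for the two grouping columns
nested <- dt1[, {x = value
                 dt2[, list(r = cor(x, value)), by = .(g2 = group)]},
              by = .(g1 = group)]

# Look up each (g1, g2) pair in the matrix and compare with the long result
isTRUE(all.equal(nested$r, unname(block[cbind(nested$g1, nested$g2)])))
```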

If you ultimately want these in matrix form, you can use dcast or dcast.data.table. Note, however, that the examples above produce two columns with the same name; to fix this, it is worth renaming them inside the j expression. For the original problem:

dcast.data.table(dt[, {x = value; g1=group; dt[, list(g1, g2=group, c =cor(x, value)), by = group]}, by=group], g1~g2, value.var = "c")

   g1         a         b
1:  a 1.0000000 0.1556371
2:  b 0.1556371 1.0000000
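The dcast result is still a data.table with g1 as an ordinary column. If a real base R matrix is needed, one possible last step is sketched below. It assumes a data.table version whose as.matrix method accepts the rownames argument, and the data and the names long, wide, m and r are hypothetical.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only for reproducibility
# Hypothetical data: two groups with five observations each
dt <- data.table(group = rep(c("a", "b"), each = 5),
                 value = rnorm(10))

# Long result with distinct column names g1, g2 and r
long <- dt[, {x = value
              dt[, list(r = cor(x, value)), by = .(g2 = group)]},
           by = .(g1 = group)]

# Wide data.table: one row per g1, one column per g2
wide <- dcast(long, g1 ~ g2, value.var = "r")

# Numeric matrix with the g1 labels moved into the row names
m <- as.matrix(wide, rownames = "g1")
isSymmetric(m)
```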
