Correlation between groups in R data.table

南笙 2021-01-02 03:59

Is there a way of elegantly calculating the correlations between values if those values are stored by group in a single column of a data.table (other than converting the dat

3 answers
  •  一向 (original poster)
    2021-01-02 04:13

    I've since found an even simpler alternative. You were actually pretty close with your dt[, cor(value, value), by="group"] approach. What you need is to first do a Cartesian self-join on the dates, and then group by both group columns, i.e.

    dt[dt, allow.cartesian=TRUE][, cor(value, value.1), by=list(group, group.1)]
    

    The advantage is that the join aligns the series on the key (rather than assuming they are the same length). You can then cast the result into matrix form, or leave it in long form to plot as a heatmap in ggplot, etc.
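
    To make the point about unequal lengths concrete, here is a minimal sketch on made-up data (the table and its values are assumptions, not the data from the question). Group a has ids 1 to 6 and group b only ids 2 to 5, so the a/b correlation is computed on the four shared ids. Recent data.table releases name the clashing columns from the joined table i.group and i.value rather than group.1 and value.1, so the sketch renames them if they are present.

```r
library(data.table)

# Hypothetical data: group "a" covers ids 1..6, group "b" only ids 2..5.
dt <- data.table(
  id    = c(1:6, 2:5),
  group = rep(c("a", "b"), times = c(6, 4)),
  value = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 3.0, 1.5, 4.0, 2.5)
)
setkey(dt, id)

# Self-join on the key: each row is paired with every row sharing its id.
joined <- dt[dt, allow.cartesian = TRUE]

# Newer data.table versions prefix the clashing columns with "i.";
# rename them to the names used in this answer if they are present.
setnames(joined, c("i.group", "i.value"), c("group.1", "value.1"),
         skip_absent = TRUE)

# One correlation per ordered pair of groups, using only the shared ids.
cors <- joined[, list(Cor = cor(value, value.1)), by = list(group, group.1)]
print(cors)
```

    A pair of groups with fewer than two shared ids would come out as NA rather than a correlation, so it may be worth checking the number of matched rows per pair as well.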

    Full Example

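    setkey(dt, id)  # key on the column that aligns the series
    # Self-join on id, then correlate every pair of groups
    c <- dt[dt, allow.cartesian=TRUE][, list(Cor = cor(value, value.1)), by = list(group, group.1)]
    c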
    
       group group.1       Cor
    1:     a       a 1.0000000
    2:     b       a 0.1556371
    3:     a       b 0.1556371
    4:     b       b 1.0000000
    
    dcast(c, group~group.1, value.var = "Cor")
    
      group         a         b
    1     a 1.0000000 0.1556371
    2     b 0.1556371 1.0000000
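
    If an actual numeric matrix is wanted (for example to pass to a plotting or clustering function), the cast result can be converted with as.matrix(). A small sketch, rebuilding the long-format result from the values printed above; the object names cors, wide and m are illustrative only.

```r
library(data.table)

# Long-format correlations, copied from the output shown in this answer.
cors <- data.table(
  group   = c("a", "b", "a", "b"),
  group.1 = c("a", "a", "b", "b"),
  Cor     = c(1.0000000, 0.1556371, 0.1556371, 1.0000000)
)

# Wide form: one row per group, one column per group.1.
wide <- dcast(cors, group ~ group.1, value.var = "Cor")

# as.matrix() on a data.table can take a column to use as row names,
# which leaves a purely numeric matrix with matching dimnames.
m <- as.matrix(wide, rownames = "group")
print(m)
```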
    
