data.table join and j-expression unexpected behavior

无人及你 2020-12-17 03:41

In R 2.15.0 and data.table 1.8.9:

d = data.table(a = 1:5, value = 2:6, key = "a")

d[J(3), value]
#   a value
#   3     4

d[J(3)         


        
4 Answers
  •  佛祖请我去吃肉
    2020-12-17 04:26

    I agree with Arun's answer. Here's another wording: after a join, you will often use the join column as a reference or as input to a further transformation. So it is kept by default, and you have the option to discard it with the (more roundabout) double [ syntax. From a design perspective, it is easier to keep frequently relevant information and discard it when desired than to discard it early and risk losing data that is difficult to reconstruct.
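
A minimal sketch of that double [ idiom, assuming the table d from the question: the first [ performs the keyed join and keeps the join column a, and the second [ selects only value. The name only_value is just for illustration.

```r
library(data.table)

# Same table as in the question, keyed on a
d <- data.table(a = 1:5, value = 2:6, key = "a")

# First [ : keyed join, returns the matching row with both columns
joined <- d[J(3)]

# Second [ : keep only the value column, dropping the join column
only_value <- d[J(3)][, value]
print(only_value)
```

Chaining like this costs one extra step, but the join column stays available up to the point where you explicitly drop it.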

    Another reason to keep the join column is that you can aggregate at the same time as you join (the "by without by" feature). For example, the results here are much clearer with the join column included:

    d <- data.table(a=rep.int(1:3,2),value=2:7,other=100:105,key="a")
    d[J(1:3),mean(value)]
    #   a  V1
    #1: 1 3.5
    #2: 2 4.5
    #3: 3 5.5
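
One caveat, as far as I know: the output above reflects data.table 1.8.9. From 1.9.4 onward, this implicit per-group evaluation has to be requested with by = .EACHI, and without it j is evaluated once over all matched rows. A sketch of the same aggregation under that assumption (the names per_group and overall are only illustrative):

```r
library(data.table)

d <- data.table(a = rep.int(1:3, 2), value = 2:7, other = 100:105, key = "a")

# by = .EACHI evaluates j once for each row of the join table i,
# so the join column a is kept next to each group's mean
per_group <- d[J(1:3), mean(value), by = .EACHI]
print(per_group)

# Without by = .EACHI, j runs once over all matched rows
overall <- d[J(1:3), mean(value)]
print(overall)
```

The per-group result still carries the join column, which is the point made above: without a, the three means would be hard to attribute.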
    
