Question
Why can't I use the by argument when computing a new variable from two data.tables in a join?
Example datasets:
library(data.table)
set.seed(1)
# Example datasets.
dt1 <- data.table(id=1:10,
                  var=rnorm(10))
dt2 <- data.table(id=c(2, 4, 5, 6, 8),
                  color=sample(1:2, 5, replace=TRUE),
                  group=sample(c("a", "b"), 5, replace=TRUE))
# Join on ID.
dt1[dt2, on="id"]
#    id        var color group
# 1:  2  0.1836433     2     a
# 2:  4  1.5952808     1     a
# 3:  5  0.3295078     2     a
# 4:  6 -0.8204684     1     b
# 5:  8  0.7383247     1     a
It seems group is available as a variable after the join. Now compute a new variable from dt1 and dt2 variables, grouped with by:
dt1[dt2, mean(var*color), on="id", by="group"]
# Error in eval(expr, envir, enclos) : object 'group' not found
This fails because group is not found, even though var and color are both visible and come from different datasets. Without by, the same computation works:
dt1[dt2, mean(var*color), on="id"]
# [1] 0.5078879
Why is color from dt2 available for computing a new variable, but group, also from dt2, is not? I've also tried a modified example where group is in dt1, but then color is not found.
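For reference, here is a minimal sketch of one possible workaround. It is not taken from the original post, and it does not explain why by cannot see group inside the join call. It only assumes that the joined result is an ordinary data.table containing var, color, and group, as the join output above suggests, so the grouping can be done in a second, chained call. The names joined, res, and newVar are illustrative.

```r
library(data.table)
set.seed(1)

# Same example datasets as in the question.
dt1 <- data.table(id = 1:10,
                  var = rnorm(10))
dt2 <- data.table(id = c(2, 4, 5, 6, 8),
                  color = sample(1:2, 5, replace = TRUE),
                  group = sample(c("a", "b"), 5, replace = TRUE))

# Step 1: materialise the join so that var, color and group
# all live in a single table.
joined <- dt1[dt2, on = "id"]

# Step 2: group the joined table in a separate call.
res <- joined[, .(newVar = mean(var * color)), by = group]

# The same two steps written as one chained expression.
res_chained <- dt1[dt2, on = "id"][, .(newVar = mean(var * color)), by = group]
```

The chained form builds the full joined table before grouping, which is unlikely to matter for small inputs like these but may be worth keeping in mind for large ones.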
Source: https://stackoverflow.com/questions/38824705/using-on-and-by-to-compute-a-new-variable-from-two-data-tables