Using `on` and `by` to compute a new variable from two data.tables

血红的双手。 Submitted on 2020-01-11 12:40:24

Question


Why can't I use `by` when computing a new variable from two data.tables in a join?

Example datasets:

library(data.table)
set.seed(1)

# Example datasets.
dt1 <- data.table(id=1:10,
                  var=rnorm(10))

dt2 <- data.table(id=c(2, 4, 5, 6, 8),
                  color=sample(1:2, 5, replace=TRUE),
                  group=sample(c("a", "b"), 5, replace=TRUE))

# Join on ID.
dt1[dt2, on="id"]

#    id        var color group
# 1:  2  0.1836433     2     a
# 2:  4  1.5952808     1     a
# 3:  5  0.3295078     2     a
# 4:  6 -0.8204684     1     b
# 5:  8  0.7383247     1     a

It seems `group` is available as a column after the join. Now I try to compute a new variable from columns of both dt1 and dt2, grouped with `by`:

dt1[dt2, mean(var*color), on="id", by="group"]
# Error in eval(expr, envir, enclos) : object 'group' not found

This fails because `group` is not found, even though `var` and `color` are both visible and come from different tables. Without `by`, the same expression works:

dt1[dt2, mean(var*color), on="id"]
# [1] 0.5078879
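
To make explicit which table each column in `j` comes from, data.table accepts an `i.` prefix for columns of the table passed as `i`. The sketch below rebuilds the example tables and checks that the prefixed and unprefixed forms agree. As an assumption made only for reproducibility, `color` and `group` are hard-coded to the values in the printed join rather than drawn with `sample()`, whose results depend on the R version.

```r
library(data.table)
set.seed(1)

dt1 <- data.table(id = 1:10, var = rnorm(10))

# Assumption: color and group are hard-coded to match the printed join,
# instead of being drawn with sample().
dt2 <- data.table(id = c(2, 4, 5, 6, 8),
                  color = c(2L, 1L, 2L, 1L, 1L),
                  group = c("a", "a", "a", "b", "a"))

# Unprefixed: color is resolved from dt2 because dt1 has no such column.
plain <- dt1[dt2, mean(var * color), on = "id"]

# Prefixed: i.color names the column of the table given as i (dt2).
prefixed <- dt1[dt2, mean(var * i.color), on = "id"]

all.equal(plain, prefixed)
```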

Why is color from dt2 available for computing a new variable, but group, also from dt2, is not? I've tried with a modified example where group is in dt1, but then color is not found.
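
One way to get the grouped result, sketched here on the assumption that materializing the join first is acceptable, is to chain a second `[` call so that `by` runs on the already-joined table, where `group` is an ordinary column. The names `joined`, `res` and `newVar` are illustrative, and `color` and `group` are again hard-coded to the values shown in the printed join.

```r
library(data.table)
set.seed(1)

dt1 <- data.table(id = 1:10, var = rnorm(10))

# Assumption: color and group are hard-coded to match the printed join,
# instead of being drawn with sample().
dt2 <- data.table(id = c(2, 4, 5, 6, 8),
                  color = c(2L, 1L, 2L, 1L, 1L),
                  group = c("a", "a", "a", "b", "a"))

# Step 1: perform the join; the result holds var, color and group together.
joined <- dt1[dt2, on = "id"]

# Step 2: group the joined table; by now sees group as a regular column.
res <- joined[, .(newVar = mean(var * color)), by = group]
res
```

The two steps can also be written as a single chained expression, `dt1[dt2, on = "id"][, mean(var * color), by = group]`, at the cost of building the intermediate joined table.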

Source: https://stackoverflow.com/questions/38824705/using-on-and-by-to-compute-a-new-variable-from-two-data-tables
