Question
I'm trying to understand the behaviour of eval in a data.table as a "frame".
With the following data.table:
library(data.table)
set.seed(1)
foo = data.table(var1=sample(1:3,1000,replace=TRUE), var2=rnorm(1000), var3=sample(letters[1:5],1000,replace=TRUE))
I'm trying to replicate this instruction
foo[var1==1 , sum(var2) , by=var3]
using a wrapper function around eval:
eval1 = function(s) eval( parse(text=s) ,envir=sys.parent() )
As you can see, tests 1 and 3 work, but I don't understand what the "correct" envir to set in eval is for test 2:
var_i="var1"
var_j="var2"
var_by="var3"
# test 1 works
foo[eval1(var_i)==1 , sum(var2) , by=var3 ]
# test 2 doesn't work
foo[var1==1 , sum(eval1(var_j)) , by=var3]
# test 3 works
foo[var1==1 , sum(var2) , by=eval1(var_by)]
Answer 1:
The j-expression looks up its variables in the environment of .SD, which stands for Subset of Data. .SD is itself a data.table that holds the columns for that group.
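To make the role of .SD concrete, here is a minimal sketch on a hypothetical toy table (dt, with columns g, x and y, is an illustration and not the foo from the question). It only inspects what .SD is inside each group:

```r
library(data.table)

# Hypothetical toy table, used only for illustration
dt <- data.table(g = c("a", "a", "b"), x = 1:3, y = c(10, 20, 30))

# For each group, report whether .SD is a data.table and which columns it holds.
# The grouping column g is not part of .SD.
sd_info <- dt[, .(is_dt = is.data.table(.SD),
                  cols  = paste(names(.SD), collapse = ",")),
              by = g]
print(sd_info)
```

Each group should report a data.table holding x and y, which is where a column name referenced in j has to be found.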
When you do:
foo[var1 == 1, sum(eval(parse(text=var_j))), by=var3]
directly, the j-expression gets internally optimised/replaced to sum(var2). But sum(eval1(var_j)) doesn't get optimised, and stays as it is.
Then, when it gets evaluated for each group, it has to find var2, which doesn't exist in the parent.frame() from which the function is called, but in .SD. As an example, let's do this:
eval1 <- function(s) eval(parse(text=s), envir=parent.frame())
foo[var1 == 1, { var2 = 1L; eval1(var_j) }, by=var3]
# var3 V1
# 1: e 1
# 2: c 1
# 3: a 1
# 4: b 1
# 5: d 1
It finds var2 in its parent frame, because we defined it there ourselves. That is, we have to point eval to the right environment to evaluate in, by adding an extra argument to eval1 and passing .SD as its value.
eval1 <- function(s, env) eval(parse(text=s), envir = env, enclos = parent.frame())
foo[var1 == 1, sum(eval1(var_j, .SD)), by=var3]
# var3 V1
# 1: e 11.178035
# 2: c -12.236446
# 3: a -8.984715
# 4: b -2.739386
# 5: d -1.159506
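As a sanity check, the sketch below rebuilds foo and compares the eval1(var_j, .SD) version against the plain sum(var2) call. It also tries get(var_j), which is not part of the answer above and is shown only as a commonly used alternative. The res_* names are made up for this illustration, and exact numbers may differ from the output above depending on the R version's random number generation.

```r
library(data.table)

set.seed(1)
foo <- data.table(var1 = sample(1:3, 1000, replace = TRUE),
                  var2 = rnorm(1000),
                  var3 = sample(letters[1:5], 1000, replace = TRUE))
var_j <- "var2"

# Evaluate the parsed text inside env (here .SD), falling back to the caller's frame
eval1 <- function(s, env) eval(parse(text = s), envir = env, enclos = parent.frame())

res_direct <- foo[var1 == 1, sum(var2), by = var3]                # reference result
res_eval1  <- foo[var1 == 1, sum(eval1(var_j, .SD)), by = var3]   # approach from the answer
res_get    <- foo[var1 == 1, sum(get(var_j)), by = var3]          # alternative (assumption, not from the answer)

# Both comparisons should print TRUE
print(all.equal(res_direct$V1, res_eval1$V1))
print(all.equal(res_direct$V1, res_get$V1))
```

Passing .SD explicitly keeps eval1 generic, at the cost of having to mention .SD at every call site.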
Source: https://stackoverflow.com/questions/26883859/using-eval-in-data-table