Using eval in data.table

Question

I'm trying to understand the behaviour of eval in a data.table as a "frame".

With the following data.table:

library(data.table)
set.seed(1)
foo = data.table(var1=sample(1:3,1000,replace=TRUE), var2=rnorm(1000), var3=sample(letters[1:5],1000,replace=TRUE))

I'm trying to replicate this call

foo[var1==1 , sum(var2) , by=var3]

using a wrapper function around eval:

eval1 = function(s) eval( parse(text=s) ,envir=sys.parent() )

As you can see, tests 1 and 3 work, but I don't understand which environment is the "correct" envir to pass to eval for test 2:

var_i="var1"
var_j="var2"
var_by="var3"

# test 1 works
foo[eval1(var_i)==1 , sum(var2) , by=var3 ]

# test 2 doesn't work
foo[var1==1 , sum(eval1(var_j)) , by=var3]

# test 3 works
foo[var1==1 , sum(var2) , by=eval1(var_by)]

Answer 1:

The j-expression looks up its variables in the environment of .SD, which stands for Subset of Data. .SD is itself a data.table that holds the columns for the current group.
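The lookup rule can be illustrated without data.table. The following is a minimal base-R sketch, assuming a plain data.frame `grp` as a stand-in for one group's .SD (data.table builds the real .SD internally; `grp`, `eval_caller` and `f` are hypothetical names). A helper that evaluates in its caller's frame cannot see a column, whereas `eval()` with the data frame as `envir` finds it.

```r
# Hypothetical stand-in for one group's .SD
grp <- data.frame(var2 = c(1.5, 2.5, 3))

# Hypothetical helper that, like eval1, evaluates in its caller's frame
eval_caller <- function(s) eval(parse(text = s), envir = parent.frame())

f <- function(s) {
  # var2 is only a column of grp, not a variable in this function's frame,
  # so the lookup fails; return the error message instead of stopping
  tryCatch(eval_caller(s), error = function(e) conditionMessage(e))
}

from_caller <- f("var2")                                    # "object 'var2' not found"
from_data   <- eval(parse(text = "sum(var2)"), envir = grp) # column found: 7
```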

When you do:

foo[var1 == 1, sum(eval(parse(text=var_j))), by=var3]

directly, the j-expression is internally optimised, i.e. replaced by sum(var2). But sum(eval1(var_j)) is not optimised and stays as it is.

Then, when it is evaluated for each group, eval1 has to find var2, which exists in .SD but not in the parent.frame() from which the function is called. As an example, try this:

eval1 <- function(s) eval(parse(text=s), envir=parent.frame())
foo[var1 == 1, { var2 = 1L; eval1(var_j) }, by=var3]
#    var3 V1
# 1:    e  1
# 2:    c  1
# 3:    a  1
# 4:    b  1
# 5:    d  1

It finds var2 in its parent frame, i.e. the local var2 = 1L assigned inside j. So we have to point eval at the right environment to evaluate in, by passing an additional argument whose value is .SD:

eval1 <- function(s, env) eval(parse(text=s), envir = env, enclos = parent.frame())
foo[var1 == 1, sum(eval1(var_j, .SD)), by=var3]
#    var3         V1
# 1:    e  11.178035
# 2:    c -12.236446
# 3:    a  -8.984715
# 4:    b  -2.739386
# 5:    d  -1.159506
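The per-group effect of passing .SD can be mimicked in base R. This is only a rough analogue under stated assumptions, not how data.table works internally: `split()` stands in for `by=`, each subset `g` plays the role of .SD, and the small data frame `d` is made-up data rather than `foo`.

```r
# Two-argument helper as in the answer: search `env` first (a data frame
# playing the role of .SD), then fall back to the caller's frame
eval1 <- function(s, env) eval(parse(text = s), envir = env, enclos = parent.frame())

# Hypothetical small table standing in for `foo`
d <- data.frame(
  var1 = c(1, 1, 2, 1, 1),
  var2 = c(10, 20, 30, 40, 50),
  var3 = c("a", "b", "a", "a", "b")
)
var_j <- "var2"

# Rough analogue of foo[var1 == 1, sum(eval1(var_j, .SD)), by = var3]:
# filter rows, split into groups, evaluate the string inside each subset
kept <- d[d$var1 == 1, ]
res <- sapply(split(kept, kept$var3), function(g) sum(eval1(var_j, g)))
res
#  a  b
# 50 70
```

Because `eval1` receives the subset explicitly, the column name in `var_j` is resolved inside each group rather than in whichever frame happens to call the helper.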

Source: https://stackoverflow.com/questions/26883859/using-eval-in-data-table

Tags

data.table