data.table is a fantastic R package and I am using it in a library I am developing. So far all is going very well, except for one complication. It seems to be m
Maybe you know about this solution already?
DT[[colname]]
This is inspired by @eddi's solution in the comments below, using the OP's example:
set.seed(1)
x = data.table(a = 1:10, b=rnorm(10))
colstr="b"
col <- eval(parse(text=paste("quote(",colstr,")",sep="")))
x[eval(col)<0]
x[eval(col)<0,c(colstr):=-100]
eval is definitely not a recommended approach to subset a data.table using dynamically saved variables. The following example will help:
# Toy data.table example
DT = data.table(a = c(1,2,3), b = c(4,5,6))
# Saved variable
mVar <- "a"
# Subset
DT[DT[[mVar]] < 2]
eval is very sensitive to complex character expressions and generally not recommended for production code.
Say you have the column name in variable x, you could do
colname = as.name(x)
you can then use colname in the subset function
If you are going to be doing complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in current version of data.table is that the environment of eval is not always correctly processed - eval and quote in data.table (Note: There has been an update to that answer based on an update to the package.) - and the current fix for that is to add .SD to eval. As far as I can tell from a few tests that I've run this doesn't affect speed (the way e.g. having .SD[1] in j would).
Interestingly this issue only plagues the j and you'll be fine using eval normally in i (where .SD is not available anyway).
The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:
x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)
x[eval(valcol) < 5,
capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]
Note how I was ok not adding .SD in one eval(distcol), but won't be if I take it out of the other eval.
Another option is to use get:
diststr = "dist"
valstr = "val"
x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]