I am writing a custom aggregation function with data.table (v 1.9.6) and struggle to pass function arguments to it. there have been similar questions on this but none deals
Here's an option using mget, as commented:
fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
by = by_var_list, .SDcols = var_list]
setorderv(temp, order_var_list)
cols1 <- paste0(var_name_list, "_del")
cols2 <- paste0(cols1, "_rel")
temp[, (cols1) := lapply(mget(var_name_list), function(x) {
x - shift(x, n = 1, type = "lag")
})]
temp[, (cols2) := lapply(mget(var_name_list), function(x) {
xshift <- shift(x, n = 1, type = "lag")
(x - xshift) / xshift
})]
temp[]
}
fn_agg(dt,
var_list = c("x", "y"),
var_name_list = c("x_sum", "y_sum"),
by_var_list = c("a", "b"),
order_var_list = c("a", "b"))
# a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e 254 358 NA NA NA NA
#2: b f 246 116 -8 -242 -0.031496063 -0.6759777
#3: c g 272 242 26 126 0.105691057 1.0862069
#4: d h 273 194 1 -48 0.003676471 -0.1983471
Instead of mget, you could also make use of data.table's .SDcols argument as in
temp[, (cols1) := lapply(.SD, function(x) {
x - shift(x, n = 1, type = "lag")
}), .SDcols = var_name_list]
Also, there are probably ways to improve the function by avoiding duplicated computation of shift(x, n = 1, type = "lag") but I only wanted to demonstrate a way to use data.table in functions.
Looks like a question to me :)
I prefer computing on the language over get/mget.
fn_agg = function(dt, var_list, var_name_list, by_var_list, order_var_list) {
j_call = as.call(c(
as.name("."),
sapply(setNames(var_list, var_name_list), function(var) as.call(list(as.name("sum"), as.name(var), na.rm=TRUE)), simplify=FALSE)
))
order_call = as.call(c(
as.name("order"),
lapply(order_var_list, as.name)
))
j2_call = as.call(c(
as.name(":="),
c(
sapply(setNames(var_name_list, paste0(var_name_list,"_del")), function(var) {
substitute(.var - shift(x = .var, n = 1, type = "lag"), list(.var=as.name(var)))
}, simplify=FALSE),
sapply(setNames(var_name_list, paste0(var_name_list,"_del_rel")), function(var) {
substitute((.var - shift(x = .var, n = 1, type = "lag")) / (shift(x = .var, n = 1, type = "lag")), list(.var=as.name(var)))
}, simplify=FALSE)
)
))
dt[eval(order_call), eval(j_call), by=by_var_list
][, eval(j2_call)
][]
}
ans = fn_agg(dt, var_list=c("x","y"), var_name_list=c("x_sum","y_sum"), by_var_list=c("a","b"), order_var_list=c("a","b"))
all.equal(temp2, ans)
#[1] TRUE
Some extra notes:
_del in step2 and _del_rel in step3.order variables is always the same as by variables you can put them into keyby argument.