How to use non-standard evaluation NSE to evaluate arguments on data.table?

问题

Say I have the following

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))

car_list = list(cars1, cars2)
class(car_list) <- "dd"

`[.dd` <- function(x,...) {
  code = rlang::enquos(...)
  cars1 = x[[1]]
  rlang::eval_tidy(quo(cars1[!!!code]))
}

car_list[,.N, by = speed]

so I wished to perform arbitrary operations on cars1 and cars2 by defining the [.dd function so that whatever I put into ... get executed by cars1 and cars2 using the [ data.table syntax e.g.

car_list[,.N, by = speed] should perform the following

cars1[,.N, by = speed]
cars2[,.N, by = speed]

also I want

car_list[,speed*2]

to do

cars1[,speed*2]
cars2[,speed*2]

Basically, ... in [.dd has to accept arbitrary code.

somehow I need to capture the ... so I tried to do code = rlang::enquos(...) and then rlang::eval_tidy(quo(cars1[!!!code])) doesn't work and gives error

Error in [.data.table(cars1, ~, ~.N, by = ~speed) : argument "i" is missing, with no default

回答1:

First base R option is substitute(...()) followed by do.call:

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))
cars2[, speed := sort(speed, decreasing = TRUE)]

car_list = list(cars1, cars2)
class(car_list) <- "dd"

`[.dd` <- function(x,...) {
  a <- substitute(...()) #this is an alist
  expr <- quote(x[[i]])
  expr <- c(expr, a)
  res <- list()
  for (i in seq_along(x)) {
    res[[i]] <- do.call(data.table:::`[.data.table`, expr)
  }
  res
}

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE

Second base R option is match.call, modify the call and then evaluate (you find this approach in lm):

`[.dd` <- function(x,...) {
  thecall <- match.call()
  thecall[[1]] <- quote(`[`)
  thecall[[2]] <- quote(x[[i]])
  res <- list()
  for (i in seq_along(x)) {
    res[[i]] <- eval(thecall)
  }
  res
}

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE

I haven't tested if these approaches will make a deep copy if you use :=.

回答2:

While not under rlang type of mantra, this approach seems to work pretty well: lapply(dt_list, '[', ...) The code would be more readable to me as it is explicit about what method is being used. If I saw car_list[, .N, by = speed] I would expect the default data.table methods.

Making it as a function allows you to have the best of both worlds:

class(car_list) <- "dd"

`[.dd` <- function(x,...) {
 lapply(x, '[', ...)
}

car_list[, .N, speed]
car_list[, speed * 2]
car_list[, .(.N, max(dist)), speed]
car_list[, `:=` (more_speed = speed+5)]

Here are some examples of the approach:

car_list[, .N, speed]
# lapply(car_list, '[', j = .N, by = speed)
# or
# lapply(car_list, '[', , .N, speed)
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
car_list[, speed * 2]
# lapply(car_list, '[', j = speed*2)
# or
# lapply(car_list, '[', , speed*2)
[[1]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

[[2]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

car_list[, .(.N, max(dist)), speed]
# lapply(car_list, '[', j = list(.N, max(dist)), by = speed)
# or 
# lapply(car_list, '[', ,.(.N, max(dist)), speed)

[[1]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

[[2]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

This works with the := operator:

car_list[, `:=` (more_speed = speed+5)]
# or
# lapply(car_list, '[', , `:=` (more_speed = speed+5))

car_list
[[1]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13
...

[[2]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13

回答3:

The suggestion in my comment wasn't complete. You can indeed use rlang to support tidy evaluation, but since data.table itself doesn't support it directly, you're better off using expressions instead of quosures, and you need to build the complete final expression before calling eval_tidy:

`[.dd` <- function(x, ...) {
  code <- rlang::enexprs(...)
  lapply(x, function(dt) {
    ex <- rlang::expr(dt[!!!code])
    rlang::eval_tidy(ex)
  })
}

car_list[, .N, by = speed]
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1

[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1

来源：https://stackoverflow.com/questions/57122960/how-to-use-non-standard-evaluation-nse-to-evaluate-arguments-on-data-table

标签

data.table

nse