Keyed lookup on data.table without 'with'

北城余情 提交于 2019-12-28 03:10:14

问题


I have a data.table structure like so (except mine is really huge):

dt <- data.table(x=1:5, y=3:7, key='x')

I want to look up rows in that structure by another variable whose name is x (notice - the same as the name of the key of dt):

x <- 3:4
dt2 <- dt[ J(x) ]

This doesn't work, because the lookup sees the column name first, and the local variable is obscured:

dt2
#    x y
# 1: 1 3
# 2: 2 4
# 3: 3 5
# 4: 4 6
# 5: 5 7

I thought about the with argument for [.data.table, but that only applies to the j argument, not the i argument.

Is there something similar for the i argument?

If not, such a thing would be handy whenever I'm using a local variable and I don't know the complete list of column names in dt, to avoid conflicts.


回答1:


There is an item in the NEWS for 1.8.2 that suggests a ..() syntax will be added at some point, allowing this

New DT[.(...)] syntax (in the style of package plyr) is identical to DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so that .() and ..() are analogous to the file system's ./ and ../; i.e., .() evaluates within the frame of DT and ..() in the parent scope.

In the mean time, you can get from the appropriate environment

dt[J(get('x', envir = parent.frame(3)))]
##    x y
## 1: 3 5
## 2: 4 6

or you could eval the whole call to list(x) or J(x)

dt[eval(list(x))]
dt[eval(J(x))]
dt[eval(.(x))]



回答2:


New answer, now that I think I understand what was requested:

> X <- data.table(x=x)
> merge(dt, X)
   x y
1: 3 6
2: 4 7



回答3:


Setting a key is not required and it's faster:

dt[eval(dt[, x %in% ..x])]

   x y
1: 3 5
2: 4 6

Benchmark with the previously posted answers

microbenchmark(dt[eval(dt[, x %in% ..x])],
               dt[J(get('x', parent.frame(3)))],
               dt[eval(list(x))],
               dt[eval(J(x))],
               dt[eval(.(x))],
               merge(dt, data.table(x)),
               times = 100L)

Unit: microseconds
                                  expr    min      lq     mean  median      uq    max neval
      dt[eval(dt[, x %in% ..x])]  486.1  500.60  518.529  503.70  512.65 1238.0   100
dt[J(get("x", parent.frame(3)))]  837.3  853.25  891.424  860.00  868.30 1675.3   100
               dt[eval(list(x))]  831.8  842.70  929.521  851.95  859.85 3878.3   100
                  dt[eval(J(x))]  833.8  845.50  948.535  856.00  870.00 4599.2   100
                  dt[eval(.(x))]  828.6  846.40  871.054  851.75  859.35 1985.6   100
        merge(dt, data.table(x)) 1766.0 1804.70 1907.617 1819.95 1870.95 3123.1   100



回答4:


Adding some benchmarking results, by request.

dt is a 53080731 x 5 data.table object, keyed by a numeric column with around 100 unique values, fairly evenly distributed. x is a vector containing 5 of those values.

library(microbenchmark)
> mb <- microbenchmark(
+     dt[eval(J(x))],
+     merge(dt, data.table(x)),
+     times=10
+ )
> mb
Unit: milliseconds
                     expr      min       lq    median       uq      max neval
           dt[eval(J(x))]  127.324  127.549  133.5305  154.410  159.433    10
 merge(dt, data.table(x)) 5028.349 5083.792 5129.6590 5170.451 5250.255    10

@Tyler, if you can assist me with how to use qdap::lookup() for this case with multiple columns, I can add that too.



来源:https://stackoverflow.com/questions/15102068/keyed-lookup-on-data-table-without-with

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!