How do I select rows by two criteria in data.table in R

大城市里の小女人 提交于 2019-11-30 04:33:51

Here is a way that only crossed my mind after I asked the question and it works but I do not know how it does in benchmarks. I am not currently at a computer with an installed R. I guess I should use a cloud instance. Anyway, I like the syntax

DT[c("a","b")]

Using the %in% operator seems to give a factor of 2 performance bump. Consider:

library(data.table)
library(rbenchmark)
DT <- data.table(x=sample(letters, 1e6, TRUE), y=rnorm(1e6), v=runif(1e6))
setkey(DT,x)               # set a 1-column key
DT["b"]
f1 <- function() DT[x %in% letters[1:2]]
f2 <- function() DT[x=="a"| x == "b"]

> benchmark(f1(),f2())
  test replications elapsed relative user.self sys.self user.child sys.child
1 f1()          100    8.40 1.000000      7.58     0.81         NA        NA
2 f2()          100   17.11 2.036905     15.54     1.56         NA        NA

> all.equal(f1(), f2())
[1] TRUE

EDIT: Adding Farrel's option

Note, this is on a different computer, but the relative bumps are the same.

f3 <- function() DT[c("a", "b")]

  test replications elapsed  relative user.self sys.self user.child sys.child
1 f1()          100  11.281  7.121843     9.745    1.323          0         0
2 f2()          100  23.106 14.587121    20.824    2.224          0         0
3 f3()          100   1.584  1.000000     1.042    0.541          0         0
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!