Using grep to subset rows from a data.table, comparing row content

陌路散爱 submitted on 2019-11-28 01:38:31
Nick Kennedy

If you're happy to use the stringi package, you can take advantage of the fact that its functions are vectorised over both the pattern and the string:

DT[stri_detect_fixed(num, y), x := num]
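
To make the element-by-element matching concrete, here is a minimal sketch. It assumes the four-row sample data shown in the printed output of the other answer, with both columns stored as character; the helper variable `hit` is only for illustration.

```r
library(data.table)
library(stringi)

# Sample data reconstructed from the printed output in the other answer
# (assumption: both columns are character)
DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                 y   = c("2003", "2003", "2003", "2004"))

# stri_detect_fixed() compares num[i] with y[i], one pair per row
hit <- stri_detect_fixed(DT$num, DT$y)
print(hit)

# Only matching rows are assigned; the remaining rows of x stay NA
DT[stri_detect_fixed(num, y), x := num]
print(DT)
```

Because the comparison is done row by row in a single call, no grouping is needed.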

Depending on the data, it may be faster than the method posted by Veerendra Gadekar.

library(data.table)
library(stringi)
library(microbenchmark)

# 1000 random strings, each ending in a year from 2001 to 2010
DT <- data.table(num=paste0(sample(1000), sample(2001:2010, 1000, TRUE)),
                 y=as.character(sample(2001:2010, 1000, TRUE)))
microbenchmark(
    vg = DT[, x := grep(y, num, value=TRUE, fixed=TRUE), by = .(num, y)],
    nk = DT[stri_detect_fixed(num, y), x := num]
)

#Unit: microseconds
# expr      min       lq     mean   median       uq      max neval
#   vg 6027.674 6176.397 6513.860 6278.689 6370.789 9590.398   100
#   nk  975.260 1007.591 1116.594 1047.334 1110.734 3833.051   100

Veerendra Gadekar

You could group by num and y, so that grep is called once per distinct pair:

DT[, x := grep(y, num, value = TRUE, fixed = TRUE), by = .(num, y)]

#> DT
#        num    y        x
#1: 20031111 2003 20031111
#2:  1112003 2003  1112003
#3:    23423 2003       NA
#4:  2222004 2004  2222004
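
For comparison, the same row-wise matching can be sketched in base R without grouping. This is an alternative illustration rather than the method used in the answer above; it assumes the same four-row sample data with character columns, and `hit` is a hypothetical helper variable.

```r
library(data.table)

# Same sample data as in the printed output (assumption: character columns)
DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                 y   = c("2003", "2003", "2003", "2004"))

# grepl() takes a single pattern per call, so mapply() pairs y[i] with num[i]
hit <- mapply(grepl, DT$y, DT$num,
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Keep num where its own y occurs in it, otherwise NA
DT[, x := ifelse(hit, num, NA_character_)]
print(DT)
```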