Using grep to subset rows from a data.table, comparing row content

渐次进展 2020-12-06 14:48
DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"), y = c("2003", "2003", "2003", "2004"))

> DT
        num    y
1: 20031111 2003
2:  1112003 2003
3:    23423 2003
4:  2222004 2004

2 Answers
  •  时光取名叫无心
    2020-12-06 15:49

    If you're happy to use the stringi package, here is an approach that relies on the stringi functions being vectorised over both the pattern and the string, so each num is tested against the y in the same row:

    library(stringi)  # provides stri_detect_fixed()
    DT[stri_detect_fixed(num, y), x := num]
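
    As a rough sketch of what this does on the question's sample data (the intermediate variable `hit` is only for illustration and is not part of the original answer): `stri_detect_fixed()` compares each `num` with the `y` in the same row, and `x` is filled only where that comparison succeeds.

    ```r
    library(data.table)
    library(stringi)

    # Sample data from the question
    DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                     y   = c("2003", "2003", "2003", "2004"))

    # Element i of num is searched for element i of y (literal match, no regex)
    hit <- stri_detect_fixed(DT$num, DT$y)
    hit
    # TRUE TRUE FALSE TRUE

    # Copy num into x for matching rows only; non-matching rows stay NA
    DT[hit, x := num]
    ```

    To keep only the matching rows instead of adding a column, the same call can be used directly as the row filter: `DT[stri_detect_fixed(num, y)]`.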
    

    Depending on the data, it may be faster than the method posted by Veerenda Gadekar.

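    library(data.table)
    library(stringi)
    library(microbenchmark)

    # 1000 random rows: num is a random number with a year appended, y is a random year
    DT <- data.table(num = paste0(sample(1000), sample(2001:2010, 1000, TRUE)),
                     y = as.character(sample(2001:2010, 1000, TRUE)))
    microbenchmark(
        vg = DT[, x := grep(y, num, value = TRUE, fixed = TRUE), by = .(num, y)],  # grep per group
        nk = DT[stri_detect_fixed(num, y), x := num]                               # one vectorised call
    )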
    
    #Unit: microseconds
    # expr      min       lq     mean   median       uq      max neval
    #   vg 6027.674 6176.397 6513.860 6278.689 6370.789 9590.398   100
    #   nk  975.260 1007.591 1116.594 1047.334 1110.734 3833.051   100
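
    For context on why the grep() version needs `by`: base R's grep() and grepl() use only the first element of a pattern vector, so a row-by-row comparison has to be spelled out explicitly. A minimal base R sketch on the question's sample data follows; using `mapply()` here is an assumption for illustration and is not taken from either answer.

    ```r
    library(data.table)

    # Sample data from the question
    DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                     y   = c("2003", "2003", "2003", "2004"))

    # Calls grepl(pattern = y[i], x = num[i], fixed = TRUE) once per row
    hit <- mapply(grepl, DT$y, DT$num,
                  MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

    # Rows whose num contains the y from the same row
    DT[hit]
    ```

    This makes one grepl() call per row, whereas the stringi version handles all rows in a single vectorised call.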
    
