Using grep to subset rows from a data.table, comparing row content

渐次进展 2020-12-06 14:48
DT <- data.table(num=c("20031111","1112003","23423","2222004"),y=c("2003","2003","2003","2004"))

> DT
        num    y
1: 20031111 2003
2:  1112003 2003
3:    23423 2003
4:  2222004 2004

2 Answers
  • 2020-12-06 15:42

    You could do this, running grep() once per (num, y) pair:

    DT[, x := grep(y, num, value = TRUE, fixed = TRUE), by = .(num, y)]
    
    #> DT
    #        num    y        x
    #1: 20031111 2003 20031111
    #2:  1112003 2003  1112003
    #3:    23423 2003       NA
    #4:  2222004 2004  2222004
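
    The by = .(num, y) grouping is presumably there because grep() does not vectorise over its pattern argument: given several patterns it uses only the first. If the goal is to subset rows rather than add a column, a minimal base-R sketch of the same row-wise test might look like this. The names keep and matched are illustrative and do not come from the answer.

    ```r
    library(data.table)

    DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                     y   = c("2003", "2003", "2003", "2004"))

    # Row-wise test: does num[i] contain y[i] as a literal substring?
    # mapply() calls grepl(y[i], num[i], fixed = TRUE) once per row.
    keep <- DT[, mapply(grepl, y, num,
                        MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)]

    # Keep only the matching rows instead of adding a column.
    matched <- DT[keep]
    ```

    Here keep is an ordinary logical vector, so it can also be reused with := if a new column is wanted after all.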
    
  • 2020-12-06 15:49

    If you're happy to use the stringi package, this approach takes advantage of the fact that stringi functions vectorise over both the pattern and the string:

    DT[stri_detect_fixed(num, y), x := num]
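
    To make the vectorisation concrete, here is a small sketch on the question's data. It assumes data.table and stringi are installed; the names hit and matched_stri are illustrative only. Because stri_detect_fixed() pairs element i of the string with element i of the pattern, its result can be used directly in i to subset rows:

    ```r
    library(data.table)
    library(stringi)

    DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                     y   = c("2003", "2003", "2003", "2004"))

    # One logical value per row: TRUE where num contains y as a fixed string.
    hit <- stri_detect_fixed(DT$num, DT$y)

    # Used in i, the same call subsets the rows without any grouping.
    matched_stri <- DT[stri_detect_fixed(num, y)]
    ```

    No by is needed here, which is likely where most of the speed difference comes from.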
    

    Depending on the data, the stringi approach may be faster than the grep-by-group method posted by Veerenda Gadekar. A benchmark on 1,000 random rows:

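    library(data.table)
    library(stringi)
    library(microbenchmark)

    # 1,000 rows: a random number with a random year appended, plus a random year to search for
    DT <- data.table(num=paste0(sample(1000), sample(2001:2010, 1000, TRUE)),
                     y=as.character(sample(2001:2010, 1000, TRUE)))
    microbenchmark(
        vg = DT[, x := grep(y, num, value=TRUE, fixed=TRUE), by = .(num, y)],
        nk = DT[stri_detect_fixed(num, y), x := num]
    )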
    
    #Unit: microseconds
    # expr      min       lq     mean   median       uq      max neval
    #   vg 6027.674 6176.397 6513.860 6278.689 6370.789 9590.398   100
    #   nk  975.260 1007.591 1116.594 1047.334 1110.734 3833.051   100
    