DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"), y = c("2003", "2003", "2003", "2004"))
> DT
num y
1: 20031111 2003
2: 111200
You could do this, grouping by num and y so that grep() receives a single pattern per group:
DT[, x := grep(y, num, value = TRUE, fixed = TRUE), by = .(num, y)]
#> DT
#        num    y        x
#1: 20031111 2003 20031111
#2:  1112003 2003  1112003
#3:    23423 2003       NA
#4:  2222004 2004  2222004
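The grouping is needed because grep() only uses the first element of a multi-element pattern. As a rough sketch of the same row-wise idea without grouping, you could pair each y with its num via mapply(). This is an alternative I'm assuming is acceptable here, not the method above, and the `hit` variable is just an illustrative name:

```r
library(data.table)

DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                 y   = c("2003", "2003", "2003", "2004"))

# grepl() takes one pattern at a time, so mapply() calls it once per row,
# pairing y[i] (the pattern) with num[i] (the string to search)
hit <- mapply(grepl, DT$y, DT$num,
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Assign num only where the year was found; other rows stay NA
DT[hit, x := num]
```

This still makes one grepl() call per row, so it is mainly useful for seeing what the grouped version does.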
If you're happy using the stringi package, this approach takes advantage of the fact that
the stringi functions vectorise over both pattern and string:
library(stringi)
DT[stri_detect_fixed(num, y), x := num]
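To make the vectorisation visible, here is a small self-contained sketch on the example data from the question. It splits the one-liner into two steps; the intermediate `hit` variable is only an illustrative name:

```r
library(data.table)
library(stringi)

DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                 y   = c("2003", "2003", "2003", "2004"))

# stri_detect_fixed() compares num[i] against y[i] element by element,
# returning one logical per row
hit <- stri_detect_fixed(DT$num, DT$y)

# Only the matching rows are assigned, so x is NA where the year is absent
DT[hit, x := num]
```

Because the match is computed in a single vectorised call, no grouping is needed.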
Depending on the data, it may be faster than the method posted by Veerenda Gadekar.
library(microbenchmark)
# 1000 rows: a random number with a random year appended, plus a random year to search for
DT <- data.table(num = paste0(sample(1000), sample(2001:2010, 1000, TRUE)),
                 y = as.character(sample(2001:2010, 1000, TRUE)))
microbenchmark(
  vg = DT[, x := grep(y, num, value = TRUE, fixed = TRUE), by = .(num, y)],
  nk = DT[stri_detect_fixed(num, y), x := num]
)
#Unit: microseconds
# expr      min       lq     mean   median       uq      max neval
#   vg 6027.674 6176.397 6513.860 6278.689 6370.789 9590.398   100
#   nk  975.260 1007.591 1116.594 1047.334 1110.734 3833.051   100