Remove strings found in vector 1, from vector 2

Submitted by 匆匆过客 on 2019-11-27 16:26:47

Try this,

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b")
# [1] "(\\.aaa|\\.aarp|\\.abb|\\.abbott|\\.abogado)\\b"
gsub(paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b"), "", sample2)
# [1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry" 

Explanation:

  • sub("\\.", "\\\\.", sample1) escapes the dot in each element, since a dot is a special character in regex (it matches any character). sub replaces only the first match, which is enough here because each element contains exactly one dot.

  • paste(sub("\\.", "\\\\.", sample1), collapse="|") combines all the elements with | as the delimiter.

  • paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b") wraps the alternatives in a capturing group followed by a word boundary. The \\b is essential here: it forces an exact word match, so a shorter element such as .abb cannot match the beginning of a longer word such as .abbott.
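
The three steps can be folded into a small helper. This is only a sketch: the name remove_suffixes is hypothetical (it is not part of the original answer), and it uses gsub instead of sub for the escaping so that elements containing more than one dot would also be handled.

```r
# Hypothetical helper: remove every element of `suffixes` from each
# element of `x`, matching whole words only.
remove_suffixes <- function(suffixes, x) {
  # Escape every dot so it matches a literal "." rather than any character
  escaped <- gsub("\\.", "\\\\.", suffixes)
  # Join the alternatives with "|" and require a word boundary after them
  pattern <- paste0("(", paste(escaped, collapse = "|"), ")\\b")
  gsub(pattern, "", x)
}

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
result <- remove_suffixes(sample1, sample2)
```

Here result should equal the output shown earlier, and a call such as remove_suffixes(".abb", "x.abbott") should leave its input unchanged because of the word boundary.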

Alternatively, paste the elements of 'sample1' together with | as the separator, use the result as the pattern argument in gsub, and replace every match with ''.

gsub(paste(sample1, collapse='|'), '', sample2)
#[1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"  
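
One caveat with this shorter pattern: the dots are not escaped, so each one matches any character. The sample data happens not to expose this, but other inputs can. The following sketch uses a hypothetical input vector, tricky, to compare the unescaped pattern with an escaped, word-bounded one.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
tricky  <- c("xaaa.com", "try1.aarp")  # hypothetical input, not from the question

# Unescaped: "." matches the "x" in "xaaa", so "xaaa" is removed
loose <- gsub(paste(sample1, collapse = "|"), "", tricky)
# [1] ".com" "try1"

# Escaped dots plus a word boundary: only literal ".aaa" etc. are removed
strict_pattern <- paste0("(", paste(gsub("\\.", "\\\\.", sample1), collapse = "|"), ")\\b")
strict <- gsub(strict_pattern, "", tricky)
# [1] "xaaa.com" "try1"
```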

Or use mgsub from the qdap package:

library(qdap)
mgsub(sample1, '', sample2)
#[1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"     
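
If installing qdap is not an option, a similar result can be approximated in base R. The sketch below is an assumption about one way to do it, not qdap's implementation: it removes each element as a fixed string, longest first, so that .abbott is tried before .abb.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

# Longest elements first, so a short element never clips a longer one
ordered <- sample1[order(nchar(sample1), decreasing = TRUE)]

# Fold over the elements, removing each one as a literal (fixed) string
cleaned <- Reduce(function(acc, s) gsub(s, "", acc, fixed = TRUE), ordered, sample2)
```

Because fixed = TRUE treats each element literally, no escaping is needed; there is no word-boundary check, however, so an element is removed wherever it occurs in a string.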