Remove strings found in vector 1 from vector 2

Submitted by 旧巷老猫 on 2019-11-26 18:38:42

Question


I have these two vectors:

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

I am trying to remove every sample1 string wherever it appears in sample2. The closest I have got is iterating with sapply, which gave me this:

sapply(sample1, function(i) gsub(i, "", sample2))

     .aaa                     .aarp                    .abb                 .abbott                  .abogado          
[1,] "try1.aarp"              "try1"                   "try1.aarp"          "try1.aarp"              "try1.aarp"       
[2,] "www.tryagain"           "www.tryagain.aaa"       "www.tryagain.aaa"   "www.tryagain.aaa"       "www.tryagain.aaa"
[3,] "255.255.255.255"        "255.255.255.255"        "255.255.255.255"    "255.255.255.255"        "255.255.255.255" 
[4,] "onemoretry.abb.abogado" "onemoretry.abb.abogado" "onemoretry.abogado" "onemoretry.abb.abogado" "onemoretry.abb"  

The expected output should be:

[1] "www.tryagain"    "try1"            "onemoretry"      "255.255.255.255"

Thanks for your time.
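The sapply attempt returns a matrix because each pattern is applied to the original sample2 independently, giving one column per pattern. A minimal base-R sketch that instead feeds each result into the next removal uses Reduce; it assumes literal (non-regex) matching via fixed = TRUE, so the dots need no escaping.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

# Apply each pattern to the output of the previous step,
# rather than to the original sample2 as sapply does.
# fixed = TRUE treats every pattern as a literal string.
result <- Reduce(function(x, pattern) gsub(pattern, "", x, fixed = TRUE),
                 sample1, init = sample2)
result
# [1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"
```

Because the patterns run in the order given, a shorter pattern that is a prefix of a longer one (.abb before .abbott) could strip part of the longer suffix; for this data it makes no difference.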


Answer 1:


Try this:

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b")
# [1] "(\\.aaa|\\.aarp|\\.abb|\\.abbott|\\.abogado)\\b"
gsub(paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b"), "", sample2)
# [1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry" 

Explanation:

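  • sub("\\.", "\\\\.", sample1) escapes the dot in each element, because . is a regex metacharacter that matches any character. sub replaces only the first match; that is enough here since every element contains a single dot, but gsub would be needed to escape every dot.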

  • paste(sub("\\.", "\\\\.", sample1), collapse="|") joins the escaped elements with | (regex alternation) as the delimiter.

  • paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b") wraps the alternation in a capturing group followed by a word boundary, \\b. The \\b is essential: a pattern such as \\.abb is removed only when it ends at a word boundary, so whole labels are matched rather than the start of a longer one.
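The same idea can be generalized to patterns containing other regex metacharacters. A minimal sketch, assuming perl = TRUE so that each element can be quoted literally with \\Q...\\E (this assumes no element itself contains \\E); the helper name remove_labels is hypothetical. The last call shows the word boundary at work: .abb is not removed from the middle of .abbey.

```r
# Hypothetical helper: remove any of `patterns` (treated literally)
# when the match is followed by a word boundary.
remove_labels <- function(x, patterns) {
  # \Q...\E makes PCRE treat each pattern as literal text
  quoted <- paste0("\\Q", patterns, "\\E")
  regex <- paste0("(", paste(quoted, collapse = "|"), ")\\b")
  gsub(regex, "", x, perl = TRUE)
}

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

remove_labels(sample2, sample1)
# [1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"
remove_labels("mail.abbey.com", sample1)
# [1] "mail.abbey.com"
```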




Answer 2:


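We can paste the sample1 elements together with | and use the result as the pattern argument to gsub, replacing each match with ''. The dots are left unescaped here, so each . matches any character; this happens to give the right result for this data.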

gsub(paste(sample1, collapse='|'), '', sample2)
#[1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"  
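Because this pattern is built from the raw sample1 elements, each . acts as a wildcard. A small check on a made-up input, try1-aarp (hypothetical, not from the question), shows the difference once the dots are escaped as in Answer 1:

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")

# Unescaped: "." matches any character, so "-aarp" is removed as well
unescaped <- gsub(paste(sample1, collapse = "|"), "", "try1-aarp")

# Escaped: each "." is prefixed with a backslash, so it matches only a literal dot
escaped_pattern <- paste(gsub("\\.", "\\\\.", sample1), collapse = "|")
escaped <- gsub(escaped_pattern, "", "try1-aarp")

unescaped
# [1] "try1"
escaped
# [1] "try1-aarp"
```

fixed = TRUE is not an option for this single-pattern approach, because it would also treat | as a literal character.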

Or use mgsub from the qdap package:

library(qdap)
mgsub(sample1, '', sample2)
#[1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"     
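If installing qdap is not an option, a dependency-free stand-in can be sketched in base R. It assumes literal matching and applies longer patterns first, so that .abbott is removed before .abb can cut into it. It is only meant to loosely mirror mgsub; check ?qdap::mgsub for that function's exact defaults. The name mgsub_base is hypothetical.

```r
# Hypothetical base-R stand-in for qdap::mgsub(pattern, "", x)
mgsub_base <- function(patterns, replacement, x) {
  # Longest patterns first, so ".abbott" is tried before ".abb"
  patterns <- patterns[order(nchar(patterns), decreasing = TRUE)]
  for (p in patterns) {
    x <- gsub(p, replacement, x, fixed = TRUE)  # literal match, no regex
  }
  x
}

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

mgsub_base(sample1, "", sample2)
# [1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"
mgsub_base(sample1, "", "x.abbott")
# [1] "x"
```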


Source: https://stackoverflow.com/questions/34872957/remove-strings-found-in-vector-1-from-vector-2
