I have these two vectors:
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
I am trying to remove the sample1 strings that are found in sample2. The closest I got is by iterating with sapply, which gave me this:
sapply(sample1, function(i)gsub(i, "", sample2))
.aaa .aarp .abb .abbott .abogado
[1,] "try1.aarp" "try1" "try1.aarp" "try1.aarp" "try1.aarp"
[2,] "www.tryagain" "www.tryagain.aaa" "www.tryagain.aaa" "www.tryagain.aaa" "www.tryagain.aaa"
[3,] "255.255.255.255" "255.255.255.255" "255.255.255.255" "255.255.255.255" "255.255.255.255"
[4,] "onemoretry.abb.abogado" "onemoretry.abb.abogado" "onemoretry.abogado" "onemoretry.abb.abogado" "onemoretry.abb"
Of course the expected output should be
[1] "www.tryagain" "try1" "onemoretry" "255.255.255.255"
Thanks for your time.
Try this,
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b")
# [1] "(\\.aaa|\\.aarp|\\.abb|\\.abbott|\\.abogado)\\b"
gsub(paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b"), "", sample2)
# [1] "try1" "www.tryagain" "255.255.255.255" "onemoretry"
Explanation:
sub("\\.", "\\\\.", sample1)
escapes the dot in each element, since a dot is a special character in regex. sub replaces only the first match, which is enough here because each element contains a single dot.
paste(sub("\\.", "\\\\.", sample1), collapse="|")
combines all the elements, with | as the delimiter.
paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b")
creates a regex with all the elements inside a capturing group, followed by a word boundary. The \\b is essential here: it forces a whole-word match, so a shorter entry such as .abb cannot match as the start of a longer one such as .abbott.
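To make the role of \\b concrete, here is a minimal sketch that builds the pattern with and without the word boundary. The input "shop.abbott" is made up for illustration and is not from the question. The call without the boundary uses perl = TRUE only because PCRE tries alternatives left to right, which makes the result predictable.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")

# Escape the dots so they match a literal "." (same step as in the answer).
escaped <- sub("\\.", "\\\\.", sample1)
alternation <- paste(escaped, collapse = "|")

with_boundary    <- paste0("(", alternation, ")\\b")
without_boundary <- paste0("(", alternation, ")")

# Hypothetical input, not from the question.
x <- "shop.abbott"

# With \\b, ".abb" cannot match: it is followed by "o", so there is no
# word boundary after it, and the engine has to match ".abbott" instead.
res_with <- gsub(with_boundary, "", x)

# Without \\b under PCRE, the first alternative that fits (".abb") wins
# and "ott" is left behind.
res_without <- gsub(without_boundary, "", x, perl = TRUE)

print(res_with)     # "shop"
print(res_without)  # "shopott"
```

The second call is there only for contrast; the answer's own pattern is the one with \\b.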
We can paste the 'sample1' elements together, use that as the pattern argument in gsub, and replace it with ''.
gsub(paste(sample1, collapse='|'), '', sample2)
#[1] "try1" "www.tryagain" "255.255.255.255" "onemoretry"
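One caveat worth checking before reusing this pattern: the dots are not escaped, so each "." matches any character. That makes no difference for the sample data, but it can for other inputs. A small sketch, where "drabb.com" is a made-up input and the escaping step is borrowed from the other answer:

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")

# Pattern as built in the gsub call above: the dots are NOT escaped.
unescaped <- paste(sample1, collapse = "|")

# Same pattern with the dots escaped.
escaped <- paste(sub("\\.", "\\\\.", sample1), collapse = "|")

# Hypothetical input, not from the question.
x <- "drabb.com"

res_unescaped <- gsub(unescaped, "", x)  # ".abb" matches "rabb"
res_escaped   <- gsub(escaped, "", x)    # no literal ".abb" in x, nothing removed

print(res_unescaped)  # "d.com"
print(res_escaped)    # "drabb.com"
```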
Or use mgsub from the qdap package:
library(qdap)
mgsub(sample1, '', sample2)
#[1] "try1" "www.tryagain" "255.255.255.255" "onemoretry"
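If installing qdap is not an option, a similar effect can be sketched in base R by applying one literal replacement after another. This is only a rough approximation under my own assumptions, not qdap's implementation. The helper name strip_fixed is hypothetical, and sorting the patterns longest-first is a choice made here so that ".abbott" is handled before its prefix ".abb".

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

# Longest patterns first, so ".abbott" is removed before its prefix ".abb".
patterns <- sample1[order(nchar(sample1), decreasing = TRUE)]

# strip_fixed is a hypothetical helper: it folds over the patterns and removes
# each one as a literal string (fixed = TRUE, so "." is not a wildcard).
strip_fixed <- function(x, patterns) {
  Reduce(function(acc, p) gsub(p, "", acc, fixed = TRUE), patterns, x)
}

result <- strip_fixed(sample2, patterns)
print(result)
# [1] "try1" "www.tryagain" "255.255.255.255" "onemoretry"
```

Unlike the \\b pattern in the first answer, this removes the strings wherever they occur, not only where they end at a word boundary.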
Source: https://stackoverflow.com/questions/34872957/remove-strings-found-in-vector-1-from-vector-2