Question
How can I detect the presence of more than two consecutive identical characters in a word and remove that word?
I seem to be able to do it like this:
# example data
mystring <- c(1, 2, 3, "toot", "tooooot")
# clunky regex
gsub("^[[:alpha:]]$", "", gsub(".*(.)\\1+\\1", "", mystring))
[1] "1" "2" "3" "toot" ""
But I'm sure there is a more efficient way. How can I do it with just one gsub?
Answer 1:
You can use grepl instead.
mystring <- c(1, 2, 3, "toot", "tooooot", "good", "apple", "banana")
mystring[!grepl("(.)\\1{2,}", mystring)]
## [1] "1" "2" "3" "toot" "good" "apple" "banana"
Explanation: \\1 is a backreference to the first capture group, here (.), so it matches the same character that the group captured. {2,} requires the preceding token, in this case the backreference, to match at least two times. Since we want any character repeated three or more times, (.) matches the first occurrence and \\1 must then match two or more times.
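To make the explanation concrete, here is a minimal sketch that probes the same pattern against a few words. The test words and the variable names (words, has_triple, result) are made up for illustration and are not part of the original answer.

```r
# Hypothetical test inputs, chosen to sit on either side of the threshold.
words <- c("toot", "tooot", "tooooot", "aab", "111")

# (.) captures one character; \\1{2,} requires that same character
# at least two more times, i.e. a run of three or more.
has_triple <- grepl("(.)\\1{2,}", words)

# Pair each word with its match result for easier reading.
result <- setNames(has_triple, words)
print(result)
```

Note that . matches any character, so a run of digits such as "111" is flagged as well. If only letters should count, ([[:alpha:]]) could be used in place of (.).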
Answer 2:
Combine the expressions like so:
gsub("^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$", "", mystring)
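As a rough check of this approach, the sketch below runs the combined pattern on the example data from the question. The variable names pattern and cleaned are illustrative, and perl = TRUE is an assumption added here so the backreference is handled by PCRE; the call in the answer uses R's default regex engine.

```r
# Example data from the question; c() coerces the numbers to character.
mystring <- c(1, 2, 3, "toot", "tooooot")

# Letters, then one letter repeated three times in a row, then letters,
# anchored so the whole element has to be a single alphabetic word.
pattern <- "^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$"

# perl = TRUE is an assumption of this sketch, not part of the answer.
cleaned <- gsub(pattern, "", mystring, perl = TRUE)
print(cleaned)
## [1] "1" "2" "3" "toot" ""
```

Because the pattern is anchored with ^ and $, it only blanks elements that consist of exactly one alphabetic word; an element containing spaces or digits is left unchanged.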
Answer 3:
Another possibility:
mystring[grepl("(.{1})\\1{2,}", mystring, perl = TRUE)] <- ""
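The grepl-based answers differ in what happens to a matching element: subsetting with !grepl(...) drops it, while assigning "" keeps the vector length. A small sketch of the difference, using the question's data; the names hit, dropped and blanked are hypothetical, and replace() is used here only so the original vector is not modified.

```r
# Example data from the question.
mystring <- c(1, 2, 3, "toot", "tooooot")

# TRUE where some character occurs three or more times in a row.
hit <- grepl("(.)\\1{2,}", mystring)

# Subsetting removes the matching elements entirely.
dropped <- mystring[!hit]

# replace() returns a copy with matching elements set to "".
blanked <- replace(mystring, hit, "")

print(length(dropped))
print(length(blanked))
```

Blanking reproduces the output shown in the question, which is the safer choice when the vector has to stay aligned with other data of the same length.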
Source: https://stackoverflow.com/questions/16294253/regex-to-replace-words-with-more-than-two-consecutive-characters