regex to replace words with more than two consecutive characters

Question

How can I detect whether a word contains the same character more than twice in a row, and remove that word?

I seem to be able to do it like this:

# example data
mystring <- c(1, 2, 3, "toot", "tooooot")
# clunky regex
gsub("^[[:alpha:]]$", "", gsub(".*(.)\\1+\\1", "", mystring))
## [1] "1"    "2"    "3"    "toot" ""

But I'm sure there is a more efficient way. How can I do it with just one gsub?

Answer 1:

You can use grepl instead.

mystring <- c(1, 2, 3, "toot", "tooooot", "good", "apple", "banana")
mystring[!grepl("(.)\\1{2,}", mystring)]
## [1] "1"      "2"      "3"      "toot"   "good"   "apple"  "banana"

**Explanation**
\\1 is a backreference to the first capture group, here (.), so it matches whatever character that group captured. {2,} requires the preceding token, in this case the backreference, to match at least 2 times. Since we want any character repeated 3 or more times in a row, (.) matches the first occurrence and \\1 must then match 2 or more times.
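To make the counting concrete, here is a small sketch that runs the same pattern over a few test words. The vector `words` is made up for illustration and is not part of the original answer. Note that `.` matches any character, so runs of digits are flagged as well.

```r
# Hypothetical test words on either side of the three-in-a-row threshold
words <- c("to", "too", "tooo", "toooo", "aXXXb", "112233", "1112")

# (.) captures one character; \\1{2,} needs that same character 2+ more times
has_triple <- grepl("(.)\\1{2,}", words)

print(data.frame(word = words, has_triple = has_triple))
# "to", "too" and "112233" give FALSE (no character appears 3 times in a row);
# "tooo", "toooo", "aXXXb" and "1112" give TRUE
```

If only letters should count, replacing `(.)` with `([[:alpha:]])` restricts the match to alphabetic runs.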

Answer 2:

Combine the expressions like so:

gsub("^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$", "", mystring)
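As a quick check, the sketch below applies this single `gsub` to the example data from the question. The extra element `"tooooot1"` is a hypothetical addition to show one consequence of the anchored `[[:alpha:]]*` pattern: it only blanks elements that consist entirely of letters.

```r
# Example data from the question, plus one hypothetical mixed element
mystring <- c(1, 2, 3, "toot", "tooooot", "tooooot1")

# One pass: blank any all-letter element containing a letter 3 times in a row
result <- gsub("^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$", "", mystring)

print(result)
# "tooooot" becomes ""; "tooooot1" is left alone because the trailing
# digit prevents [[:alpha:]]*$ from reaching the end of the string
```

Whether mixed elements such as `"tooooot1"` should also be removed depends on the data; the question only shows purely numeric and purely alphabetic elements.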

Answer 3:

Another possibility:

mystring[grepl("(.{1})\\1{2,}", mystring, perl = TRUE)] <- ""
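The `grepl`-based answers differ in what they do with a matching element: subsetting with `!` drops it, while assigning `""` keeps the vector the same length, which is what the question's expected output shows. A minimal sketch of both, using the question's data; the names `hit`, `dropped` and `blanked` are chosen here for illustration.

```r
mystring <- c(1, 2, 3, "toot", "tooooot")

# TRUE where some character occurs three or more times in a row
hit <- grepl("(.)\\1{2,}", mystring, perl = TRUE)

# Dropping the matches shortens the vector
dropped <- mystring[!hit]

# Blanking the matches preserves length and positions
blanked <- replace(mystring, hit, "")

print(dropped)
print(blanked)
```

`replace()` returns a modified copy, so `mystring` itself is left untouched, unlike the in-place assignment.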

Source: https://stackoverflow.com/questions/16294253/regex-to-replace-words-with-more-than-two-consecutive-characters

Tags

regex

string

character
