regex to replace words with more than two consecutive characters

Posted by 只愿长相守 on 2019-12-19 12:46:05

Question


How can I detect the presence of more than two consecutive identical characters in a word and remove that word?

I seem to be able to do it like this:

# example data
mystring <- c(1, 2, 3, "toot", "tooooot")
# clunky regex
gsub("^[[:alpha:]]$", "", gsub(".*(.)\\1+\\1", "", mystring)) 
[1] "1"    "2"    "3"    "toot" "" 

But I'm sure there is a more efficient way. How can I do it with just one gsub?


Answer 1:


You can use grepl instead, and subset the vector to drop the matching words.

mystring <- c(1, 2, 3, "toot", "tooooot", "good", "apple", "banana")
mystring[!grepl("(.)\\1{2,}", mystring)]
## [1] "1"      "2"      "3"      "toot"   "good"   "apple"  "banana"

**Explanation**
\\1 matches whatever the first group captured (in this case (.)). {2,} specifies that the preceding token must be matched at least 2 times. Since we want to match any character repeated 3 times or more, (.) is the first occurrence and \\1 then needs to be matched 2 or more times.
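To make the role of the quantifier concrete, here is a minimal sketch comparing {2,} with {1,} on a few sample words. The vector `words` and the names `three_plus` and `two_plus` are made up for illustration and are not part of the answer.

```r
# Hypothetical sample words, chosen to have runs of length 2, 3 and 5
words <- c("toot", "tooot", "tooooot", "good")

# (.) captures one character; \\1{2,} requires at least two more copies,
# so the pattern needs three or more identical characters in a row.
three_plus <- grepl("(.)\\1{2,}", words)

# With {1,} a doubled character is already enough to match.
two_plus <- grepl("(.)\\1{1,}", words)

print(three_plus)
print(two_plus)
```

Only the words with a run of three or more are flagged by the first pattern, while the second also flags "toot" and "good" because of their doubled "o".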




Answer 2:


Combine the expressions like so:

gsub("^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$", "", mystring)
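As a rough check, the combined expression can be run against the example data from the question. The name `result` is only for illustration. Note that, because the pattern is built from [[:alpha:]], it should only blank elements that consist entirely of letters.

```r
# Example data from the question; the numbers are coerced to character
mystring <- c(1, 2, 3, "toot", "tooooot")

# One gsub call: the whole element (anchored by ^ and $) is replaced with ""
# when it is all letters and contains one letter three times in a row.
result <- gsub("^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$", "", mystring)

print(result)
```

The vector keeps its original length, with the offending word replaced by an empty string, which is the shape of output the question asks for.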



Answer 3:


Another possibility:

mystring[grepl("(.{1})\\1{2,}", mystring, perl = TRUE)] <- ""
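The two grepl-based answers differ in what they return: subsetting with a negated mask drops the matching words, while assigning "" keeps the vector length. A small sketch of the difference follows, using the question's data. The names `has_run`, `dropped` and `blanked` are hypothetical, and `replace()` is used here in place of the in-place assignment so that the original vector stays untouched.

```r
mystring <- c(1, 2, 3, "toot", "tooooot")

# Logical mask: TRUE where some character occurs three or more times in a row
has_run <- grepl("(.)\\1{2,}", mystring, perl = TRUE)

# Subsetting removes the matching elements entirely
dropped <- mystring[!has_run]

# replace() returns a copy with the matching elements set to ""
blanked <- replace(mystring, has_run, "")

print(dropped)
print(blanked)
```

Which form is preferable depends on whether the positions of the remaining elements need to line up with other data.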


Source: https://stackoverflow.com/questions/16294253/regex-to-replace-words-with-more-than-two-consecutive-characters
