Regex; eliminate all punctuation except

谁说我不能喝, posted on 2019-11-26 23:28:54

Question


I have the following regex, which splits on any whitespace or punctuation. How can I exclude one or more punctuation characters from [:punct:]? Say I'd like to exclude apostrophes and commas. I know I could explicitly list the punctuation marks I want instead of using [[:punct:]], but I'm hoping for an exclusion method.

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)

 [1] "I"       "'"       "m"       "not"     "that"    "good"    "at"      "regex"   "yet"    
[10] ","       ""        "but"     "am"      "getting" "better"  "!"

Answer 1:


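It's not clear to me what you want the result to be, but you might be able to use a negated character class, like this answer. Inside [^...], the negated POSIX class [:^punct:] excludes every non-punctuation character, and listing , and ' next to it excludes those two as well, so the class matches any punctuation character other than , and '.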

R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
 [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
 [8] "but"     "am"      "getting" "better"  "!"    
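
To see what the negated class selects on its own, here is a minimal sketch that extracts the matching characters instead of splitting on them. The variable names `kept_punct` and `hits` and the second test string are illustrative assumptions, not part of the original answer.

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Negated class: excludes the comma, the apostrophe, and (via [:^punct:])
# every character that is not punctuation.
kept_punct <- "[^,'[:^punct:]]"

# Extract every character the class matches in X.
hits <- regmatches(X, gregexpr(kept_punct, X, perl = TRUE))[[1]]
print(hits)

# A second, made-up string with more punctuation marks.
more_hits <- regmatches("a.b,c'd;e", gregexpr(kept_punct, "a.b,c'd;e", perl = TRUE))[[1]]
print(more_hits)
```

In X only the exclamation mark is matched, because the apostrophe and the comma are excluded by the class.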



Answer 2:


You may impose a restriction to a PCRE subpattern directly with a (?![',]) negative lookahead that fails the match if the next char to the right is ' or ,:

[[:space:]]|(?=(?![',])[[:punct:]])
               ^^^^^^^^ 

See the regex demo.

Details

  • [[:space:]] - any whitespace
  • | - or
  • (?=(?![',])[[:punct:]]) - a positive lookahead that requires the character immediately to the right of the current position to be a punctuation character, while the nested negative lookahead (?![',]) fails the match if that character is ' or , (effectively, it requires any punctuation symbol other than ' and ,).
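
The restriction described above can be checked one character at a time. This is a small sketch, assuming a hand-picked vector `chars` of sample characters; the anchors ^ and $ are added here only so that each one-character string is tested as a whole.

```r
# Sample characters to classify (chosen for illustration).
chars <- c("'", ",", "!", ".", "a", " ")

# Negative lookahead first rejects ' and , then [[:punct:]] must match.
is_split_punct <- grepl("^(?![',])[[:punct:]]$", chars, perl = TRUE)

print(data.frame(char = chars, matches = is_split_punct))
```

Only "!" and "." should come out as TRUE: the apostrophe and comma are rejected by the negative lookahead, and the letter and the space are not punctuation.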

The same pattern applied with strsplit in R:

X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)
[[1]]
 [1] "I'm"     "not"     "that"    "good"    "at"      "regex"   "yet,"   
 [8] "but"     "am"      "getting" "better"  "!"
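
If the set of characters to keep attached varies, the pattern can be assembled from a vector. The sketch below is one possible way to do that, not something from the original answers: `split_except` and its `keep` argument are hypothetical names, and it relies on PCRE's \Q...\E quoting so that each kept character is treated literally (it assumes no element of `keep` contains the sequence \E).

```r
# Hypothetical helper: split on whitespace and before punctuation,
# except for the characters listed in `keep`.
split_except <- function(x, keep = c("'", ",")) {
  stopifnot(is.character(keep), length(keep) > 0)
  # \Q...\E makes PCRE read each kept character literally.
  quoted <- paste0("\\Q", keep, "\\E", collapse = "|")
  # Same shape as the pattern above: negative lookahead inside the positive one.
  pattern <- paste0("[[:space:]]|(?=(?!", quoted, ")[[:punct:]])")
  strsplit(x, pattern, perl = TRUE)
}

X <- "I'm not that good at regex yet, but am getting better!"
result <- split_except(X)[[1]]
print(result)
```

With the default `keep`, the generated pattern matches at the same positions as [[:space:]]|(?=(?![',])[[:punct:]]), so the result is the same as the output shown above.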


Source: https://stackoverflow.com/questions/13372438/regex-eliminate-all-punctuation-except
