I have the following regex that splits on any space or punctuation. How can I exclude one or more punctuation characters from [[:punct:]]? Say I'd like to exclude apostrophes and commas. I know I could list every punctuation mark explicitly in a bracket expression instead of using [[:punct:]], but I'm hoping for an exclusion method.
X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)
[1] "I" "'" "m" "not" "that" "good" "at" "regex" "yet"
[10] "," "" "but" "am" "getting" "better" "!"
Joshua Ulrich
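It's not clear to me what you want the result to be, but you might be able to use a negated character class, as in this answer. The class [^,'[:^punct:]] matches any character that is not a comma, not an apostrophe, and not a non-punctuation character, i.e. any punctuation character other than , and '.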
R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
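To see which characters the negated class accepts, it can be probed one character at a time. This is only a small sketch; the sample vector chars is made up for illustration and is not part of the answer above.

```r
# Probe a few single characters against the negated class from the answer.
# `chars` is an illustrative sample, not from the original post.
chars <- c("'", ",", "!", ".", "a", " ")
hits <- grepl("[^,'[:^punct:]]", chars, perl = TRUE)

# Only "!" and "." are TRUE: they are punctuation and not excluded.
# "'" and "," are excluded explicitly; "a" and " " are not punctuation.
print(setNames(hits, chars))
```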
You can restrict a PCRE subpattern directly with a negative lookahead, (?![',]), which fails the match if the next char to the right is ' or ,:
[[:space:]]|(?=(?![',])[[:punct:]])
               ^^^^^^^^
Details
[[:space:]] - any whitespace
| - or
(?=(?![',])[[:punct:]]) - a positive lookahead that requires that the character immediately to the right of the current position is neither ' nor , and that it is a punctuation char (effectively, requiring any punctuation symbol other than ' and ,).
In R:
X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)
[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
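If the set of excluded characters changes often, the same pattern could be built from a vector. The following is a minimal sketch, not part of the original answers: split_except and its keep argument are hypothetical names, and it assumes keep holds at least one entry, each a single punctuation character.

```r
# Hypothetical helper (not from the answers above): split on whitespace and
# before punctuation, except for the characters listed in `keep`.
# Assumes `keep` is non-empty and each entry is one punctuation character.
split_except <- function(x, keep = c("'", ",")) {
  # Backslash-escape each kept character so it is literal inside [...]
  # (PCRE treats a backslash before a non-alphanumeric character as literal).
  excluded <- paste0("\\", keep, collapse = "")
  pattern <- paste0("[[:space:]]|(?=(?![", excluded, "])[[:punct:]])")
  strsplit(x, pattern, perl = TRUE)
}

X <- "I'm not that good at regex yet, but am getting better!"
tokens <- split_except(X)[[1]]
print(tokens)
```

With the default keep, this builds the same split as the pattern above, so the apostrophe and comma stay attached to their words while "!" is split off.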
Source: https://stackoverflow.com/questions/13372438/regex-eliminate-all-punctuation-except