Question
I can't figure out how to match strings that contain only "%", "§", and "#", in any order and repeated any number of times:
str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")
This pattern seems to get me close to the solution:
grepl("(?=[§#]*%)(?=[§%]*#)(?=[%#]*§)", str, perl = T)
[1] FALSE TRUE FALSE TRUE FALSE TRUE
Only the last match, !#%§, is wrong, because that string contains a character outside the set. I see why grepl matches it: the last three characters are exactly the character set, and nothing in the pattern ties the match to the start of the string. So the remaining question is how to restrict matches to strings made up of the character set alone. I've tried the anchors ^ and $, only to get no matches at all:
grepl("^(?=[§#]*%)(?=[§%]*#)(?=[%#]*§)$", str, perl = T)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
What's the solution here?
Answer 1:
You may use:
^(?=.*%)(?=.*#)(?=.*§)[%#§]+$
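The trick is to make sure that every character in the string is an allowed character. We do so by adding ^[%#§]+$ to the lookaheads: the lookaheads only check that each character is present, while [%#§]+ consumes the whole string between the anchors.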
Breakdown:
^ - Beginning of the string.
(?=.*%) - A positive lookahead to ensure that the '%' character exists.
(?=.*#) - A positive lookahead to ensure that the '#' character exists.
(?=.*§) - A positive lookahead to ensure that the '§' character exists.
[%#§]+ - Match one or more characters from the character class.
$ - End of string.
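A minimal R sketch of how this pattern might be applied to the vector from the question. It also reruns the question's anchored attempt to show why it fails. The variable names anchored_only, pattern, and matched are illustrative, and § is written as the escape \u00a7 only so that the script does not depend on the encoding of the source file.

```r
# "\u00a7" is the § character, escaped so the script does not depend on the
# encoding of the source file.
str <- c("%#", "#%%\u00a7\u00a7#", "\u00a7%5x#yz", "%#\u00a7", "ab\u00a7", "!#%\u00a7")

# The question's anchored attempt: lookaheads consume nothing, so ^ and $
# would have to sit at the same position. That is only possible for an empty
# string, and an empty string cannot satisfy the lookaheads.
anchored_only <- grepl("^(?=[\u00a7#]*%)(?=[\u00a7%]*#)(?=[%#]*\u00a7)$", str, perl = TRUE)

# Answer 1: the lookaheads check that each character is present, and the
# character class consumes the whole string between ^ and $.
pattern <- "^(?=.*%)(?=.*#)(?=.*\u00a7)[%#\u00a7]+$"
matched <- grepl(pattern, str, perl = TRUE)

print(anchored_only)
print(matched)
```

With the sample vector, matched should come out as FALSE TRUE FALSE TRUE FALSE FALSE, so "!#%§" is no longer accepted, while anchored_only stays FALSE throughout, as in the question.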
Answer 2:
Another approach: ensure with a lookahead that the string is solely composed of your three characters and then match a string that contains three different characters using capturing groups and backreferences restricted with negative lookaheads:
^(?=[%§#]+$)(.).*(?!\1)(.).*(?!\1|\2).
R proof:
str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")
grepl("^(?=[%§#]+$)(.).*(?!\\1)(.).*(?!\\1|\\2).", str, perl = TRUE)
Results: [1] FALSE TRUE FALSE TRUE FALSE FALSE
Explanation
^ - the beginning of the string
(?= - look ahead to see if there is:
  [%§#]+ - any character of: '%', '§', '#' (1 or more times (matching the most amount possible))
  $ - before an optional \n, and the end of the string
) - end of look-ahead
( - group and capture to \1:
  . - any character except \n
) - end of \1
.* - any character except \n (0 or more times (matching the most amount possible))
(?! - look ahead to see if there is not:
  \1 - what was matched by capture \1
) - end of look-ahead
( - group and capture to \2:
  . - any character except \n
) - end of \2
.* - any character except \n (0 or more times (matching the most amount possible))
(?! - look ahead to see if there is not:
  \1 - what was matched by capture \1
  | - OR
  \2 - what was matched by capture \2
) - end of look-ahead
. - any character except \n
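As a sanity check, the same condition can be stated without a regular expression: a string qualifies when its set of distinct characters is exactly {%, #, §}. The sketch below is not part of either answer. has_exactly_set is a hypothetical helper name, and § is again written as \u00a7 so that the strings are treated as UTF-8 regardless of the file encoding.

```r
# "\u00a7" is the § character, escaped so the strings are marked as UTF-8.
str <- c("%#", "#%%\u00a7\u00a7#", "\u00a7%5x#yz", "%#\u00a7", "ab\u00a7", "!#%\u00a7")
allowed <- c("%", "#", "\u00a7")

# Hypothetical helper: TRUE when the distinct characters of a string are
# exactly the allowed set (no more, no fewer).
has_exactly_set <- function(x, chars) {
  vapply(strsplit(x, ""), function(ch) setequal(ch, chars), logical(1))
}

by_set <- has_exactly_set(str, allowed)

# The pattern from this answer, with backslashes doubled for the R string.
by_regex <- grepl("^(?=[%\u00a7#]+$)(.).*(?!\\1)(.).*(?!\\1|\\2).", str, perl = TRUE)

print(by_set)
print(identical(by_set, by_regex))
```

On the sample vector the two approaches should agree. The set-based version is easier to extend to a longer character list, whereas the backreference pattern needs one more capturing group and one more negative lookahead for each additional required character.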
Source: https://stackoverflow.com/questions/64416474/how-to-match-strings-that-only-contain-a-character-set-in-any-order-and-any-numb