How to match strings that only contain a character set in any order and any numbers?

Submitted by 余生长醉 on 2021-01-04 10:39:46

Question


I can't seem to figure out how to match strings that consist only of the characters "%", "§", and "#", with all three present, in any order and repeated any number of times:

str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")

This pattern seems to get me close to the solution:

grepl("(?=[§#]*%)(?=[§%]*#)(?=[%#]*§)", str, perl = TRUE)
[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

Only the last match, !#%§, is incorrect, because that string contains a character outside the set. I see why grepl matches it: its last three characters do satisfy all three lookaheads. So the remaining question is how to restrict matches to strings made up of the character set alone. I've tried adding the anchors ^ and $, only to find no matches at all:

grepl("^(?=[§#]*%)(?=[§%]*#)(?=[%#]*§)$", str, perl = TRUE)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

What's the solution here?


Answer 1:


You may use:

^(?=.*%)(?=.*#)(?=.*§)[%#§]+$

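The trick is to make sure that every character in the string is an allowed character. We do so by matching ^[%#§]+$ in addition to the lookaheads. Lookaheads consume no characters, so in a pattern that has only lookaheads between ^ and $, the $ is tested at the very start of the string, where the lookaheads cannot also succeed. That is why the anchored attempt in the question matches nothing.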

Breakdown:

  • ^ - Beginning of the string.
  • (?=.*%) - A positive lookahead to ensure that the '%' character exists.
  • (?=.*#) - A positive lookahead to ensure that the '#' character exists.
  • (?=.*§) - A positive lookahead to ensure that the '§' character exists.
  • [%#§]+ - Match one or more characters from the character class.
  • $ - End of string.
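
As a quick check, the pattern can be applied to the sample vector from the question. This is a minimal sketch: the variable names `pattern` and `result` are only illustrative, and `perl = TRUE` is needed because R's default regex engine does not support lookaheads.

```r
# Sample strings from the question
str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")

# The three lookaheads require each character at least once;
# ^[%#§]+$ restricts the whole string to the allowed set
pattern <- "^(?=.*%)(?=.*#)(?=.*§)[%#§]+$"

result <- grepl(pattern, str, perl = TRUE)
print(result)
# [1] FALSE  TRUE FALSE  TRUE FALSE FALSE
```

Only "#%%§§#" and "%#§" pass. In particular, "!#%§" is now rejected, because the "!" cannot be matched by the character class.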



Answer 2:


Another approach: first use a lookahead to ensure that the string is composed solely of your three characters, then require three different characters using capturing groups, with negative lookaheads containing backreferences to rule out repeats:

^(?=[%§#]+$)(.).*(?!\1)(.).*(?!\1|\2).

R proof:

str <- c("%#", "#%%§§#", "§%5x#yz", "%#§", "ab§", "!#%§")
grepl("^(?=[%§#]+$)(.).*(?!\\1)(.).*(?!\\1|\\2).", str, perl = TRUE)

Results: [1] FALSE TRUE FALSE TRUE FALSE FALSE

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [%§#]+                   any character of: '%', '§', '#' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \2                       what was matched by capture \2
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .                        any character except \n
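
One way to gain some confidence that this pattern and the one from the first answer accept the same strings is to compare them on every short string over a small alphabet. The sketch below is only an illustration: the alphabet (the three characters plus "a" as a stand-in for any disallowed character), the maximum length of 4, and all variable names are assumptions. Agreement on these inputs is not a proof of equivalence.

```r
# "\u00a7" is the section sign; the escape avoids depending on file encoding
alphabet <- c("%", "#", "\u00a7", "a")  # "a" stands in for a disallowed character

# All strings of length 1 to 4 over the alphabet
candidates <- unlist(lapply(1:4, function(n) {
  grid <- expand.grid(rep(list(alphabet), n), stringsAsFactors = FALSE)
  do.call(paste0, grid)
}))

# Pattern from the first answer (lookaheads plus anchored character class)
p1 <- "^(?=.*%)(?=.*#)(?=.*\u00a7)[%#\u00a7]+$"
# Pattern from this answer (backreferences inside negative lookaheads)
p2 <- "^(?=[%\u00a7#]+$)(.).*(?!\\1)(.).*(?!\\1|\\2)."

m1 <- grepl(p1, candidates, perl = TRUE)
m2 <- grepl(p2, candidates, perl = TRUE)

same <- identical(m1, m2)
print(same)
# [1] TRUE
```

Inspecting `candidates[m1]` lists the accepted strings, which can help when adapting either pattern to a different character set.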


Source: https://stackoverflow.com/questions/64416474/how-to-match-strings-that-only-contain-a-character-set-in-any-order-and-any-numb
