What is the difference between ?:, ?! and ?= in regex?

孤街浪徒 2020-11-29 14:34

I searched for the meaning of these expressions but couldn't understand the exact difference between them. This is what they say:

  • ?: Match expres
6 answers
  • 2020-11-29 15:14
    ?:  is for a non-capturing group
    ?=  is for a positive lookahead
    ?!  is for a negative lookahead
    ?<= is for a positive lookbehind
    ?<! is for a negative lookbehind
    

    See http://www.regular-expressions.info/lookaround.html for a very good tutorial with examples of lookahead and lookbehind in regular expressions.
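
As a quick illustration of the five constructs listed above, here is a minimal sketch using Python's standard `re` module. The sample string and the variable names are made up for illustration, and other regex flavors may differ in their lookbehind support.

```python
import re

text = "price: $42, cost: 17 USD"  # hypothetical sample input

# (?:...) groups without capturing: only the digits end up in a group.
non_capturing = re.search(r"(?:price|cost): \$(\d+)", text)

# (?=...) positive lookahead: digits that are followed by " USD".
followed_by_usd = re.findall(r"\d+(?= USD)", text)

# (?!...) negative lookahead: whole numbers not followed by " USD".
# The \b anchors stop the engine from settling for part of a number.
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", text)

# (?<=...) positive lookbehind: digits preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)

# (?<!...) negative lookbehind: digit runs not preceded by "$" or a digit.
not_after_dollar = re.findall(r"(?<![$\d])\d+", text)

print(non_capturing.group(0), non_capturing.groups())
print(followed_by_usd, not_followed_by_usd, after_dollar, not_after_dollar)
```

In every lookaround case the surrounding text (`$` or ` USD`) is only tested; it never becomes part of the returned match.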

  • 2020-11-29 15:16

    The difference between ?= and ?! is that the former requires the given expression to match and the latter requires it to not match. For example a(?=b) will match the "a" in "ab", but not the "a" in "ac". Whereas a(?!b) will match the "a" in "ac", but not the "a" in "ab".

    The difference between ?: and ?= is that ?= excludes the expression from the entire match while ?: just doesn't create a capturing group. So for example a(?:b) will match the "ab" in "abc", while a(?=b) will only match the "a" in "abc". a(b) would match the "ab" in "abc" and create a capture containing the "b".
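
The examples in this answer can be checked directly. The following is a small sketch with Python's `re` module (the variable names are only for illustration); it records what each pattern matches and what it captures.

```python
import re

# Lookaheads test what follows "a" without adding it to the match.
pos_ab = re.search(r"a(?=b)", "ab")  # matches the "a"
pos_ac = re.search(r"a(?=b)", "ac")  # no match
neg_ac = re.search(r"a(?!b)", "ac")  # matches the "a"
neg_ab = re.search(r"a(?!b)", "ab")  # no match

# Groups consume "b"; only the plain parentheses capture it.
non_capturing = re.search(r"a(?:b)", "abc")  # whole match "ab", no groups
capturing = re.search(r"a(b)", "abc")        # whole match "ab", group 1 is "b"
lookahead = re.search(r"a(?=b)", "abc")      # whole match "a", no groups

for m in (non_capturing, capturing, lookahead):
    print(repr(m.group(0)), m.groups())
```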

  • 2020-11-29 15:17

    The simplest way to understand assertions is to treat them as commands inserted into a regular expression. When the engine reaches an assertion, it immediately checks the condition the assertion describes. If the result is true, the engine continues with the rest of the regular expression; if it is false, the match attempt at that position fails.
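
To make the "inserted check" idea concrete, here is a small sketch in Python. The validation rule (at least one digit, at least one lowercase letter, six or more characters) is a hypothetical example, not something from the question.

```python
import re

# An assertion succeeds at a position and has zero width:
# the match starts and ends at the same index.
m = re.search(r"(?=bar)", "foobar")
position = m.span()

# Several assertions at the same position behave like chained conditions.
# Each (?=...) is checked from the start of the string and consumes nothing;
# only the final .{6,} actually moves through the text.
rule = re.compile(r"^(?=.*\d)(?=.*[a-z]).{6,}$")
checks = {s: bool(rule.search(s)) for s in ["abc123", "abcdef", "a1"]}

print(position, checks)
```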

  • 2020-11-29 15:20

    Try matching foobar against these:

    /foo(?=b)(.*)/
    /foo(?!b)(.*)/
    

    The first regex will match and will return "bar" as the first submatch: (?=b) matches the 'b' but does not consume it, leaving it for the following parentheses.

    The second regex will NOT match, because it expects "foo" to be followed by something different from 'b'.

    (?:...) has exactly the same effect as simple (...), but it does not return that portion as a submatch.
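
The same experiment can be run in Python as a rough check (the patterns above are written in /.../ literal style, so this is a translation, and the variable names are illustrative). The last two lines compare (?:...) with plain (...) on the same input.

```python
import re

first = re.search(r"foo(?=b)(.*)", "foobar")   # lookahead leaves "b" for (.*)
second = re.search(r"foo(?!b)(.*)", "foobar")  # "foo" is followed by "b": no match

# (?:b) consumes the "b" but is not returned as a submatch; (b) is.
non_capturing = re.search(r"foo(?:b)(.*)", "foobar")
capturing = re.search(r"foo(b)(.*)", "foobar")

print(first.groups(), second, non_capturing.groups(), capturing.groups())
```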

  • 2020-11-29 15:22

    This is the real difference:

    >>> import re
    >>> re.match('a(?=b)bc', 'abc')
    <Match...>
    >>> re.match('a(?:b)c', 'abc')
    <Match...>
    
    # Note: the lookahead does not consume "b", so "c" cannot match next.
    >>> print(re.match('a(?=b)c', 'abc'))
    None
    

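    If all you need is a yes/no answer and nothing later in the pattern has to match the text inside the group, "?:" and "?=" often give the same result, and either is fine to use.

    The difference shows up when the pattern continues. "?=" only checks the text and does not consume it, so the rest of the pattern (or the next search) can still match that same text. "?:" consumes the text as part of the overall match without storing it in a capture group. If you need the text itself for further processing, use a plain capturing group such as "a(b)".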
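
One place where consuming versus not consuming becomes visible is repeated searching. The sketch below (Python, with made-up inputs) compares where each match ends and what `re.findall` returns on overlapping text.

```python
import re

# The lookahead match ends after "a"; the group match ends after "ab".
lookahead_end = re.match(r"a(?=b)", "abc").end()
group_end = re.match(r"a(?:b)", "abc").end()

# With a lookahead, the tested "a" is still available to the next match.
tested = re.findall(r"a(?=a)", "aaa")
# With a non-capturing group, the second "a" is used up by the first match.
consumed = re.findall(r"a(?:a)", "aaa")

print(lookahead_end, group_end, tested, consumed)
```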

  • 2020-11-29 15:28

    To understand this better, let's apply the three expressions plus a capturing group and analyse each behaviour.

    • () capturing group - the regex inside the parentheses must be matched, and the match creates a capturing group
    • (?:) non-capturing group - the regex inside the parentheses must be matched, but it does not create a capturing group
    • (?=) positive lookahead - asserts that the regex must be matched
    • (?!) negative lookahead - asserts that it is impossible to match the regex

    Let's apply q(u)i to quit. q matches q and the capturing group u matches u. The match inside the capturing group is taken and a capturing group is created. So the engine continues with i. And i will match i. This last match attempt is successful. qui is matched and a capturing group with u is created.

    Let's apply q(?:u)i to quit. Again, q matches q and the non-capturing group u matches u. The match from the non-capturing group is taken, but the capturing group is not created. So the engine continues with i. And i will match i. This last match attempt is successful. qui is matched.

    Let's apply q(?=u)i to quit. The lookahead is positive and is followed by another token. Again, q matches q and u matches u. Again, the match from the lookahead must be discarded, so the engine steps back from i in the string to u. The lookahead was successful, so the engine continues with i. But i cannot match u. So this match attempt fails.

    Let's apply q(?=u)u to quit. The lookahead is positive and is followed by another token. Again, q matches q and u matches u. The match from the lookahead must be discarded, so the engine steps back from i in the string to u. The lookahead was successful, so the engine continues with u. And u will match u. So this match attempt is successful. qu is matched.

    Let's apply q(?!i)u to quit. Here the lookahead is negative and is followed by another token. Again, q matches q. Inside the lookahead, i does not match u, so the negative lookahead succeeds. Nothing was consumed, so the engine is still at u in the string and continues with u. And u will match u. So this match attempt is successful. qu is matched.

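    So, in conclusion, the real difference between lookaheads and non-capturing groups comes down to whether you only want to test that the text is there (lookahead) or also want it to be part of the overall match (non-capturing group). Capturing groups carry some extra cost because the engine has to store the captured text, so use them only when you need it.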
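
The five walkthroughs above can be reproduced with a short Python sketch (the `results` dictionary is just an illustrative way to collect the outcomes). For each pattern it stores the overall match and the captured groups, or None when the pattern fails on quit.

```python
import re

patterns = [r"q(u)i", r"q(?:u)i", r"q(?=u)i", r"q(?=u)u", r"q(?!i)u"]

results = {}
for pattern in patterns:
    m = re.search(pattern, "quit")
    # Store (overall match, captured groups), or None if there is no match.
    results[pattern] = (m.group(0), m.groups()) if m else None

for pattern, outcome in results.items():
    print(pattern, "->", outcome)
```

Only q(u)i reports a captured group; q(?=u)i is the one pattern that fails, because the lookahead leaves the engine at u while the next token is i.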
