What kind of formal languages can modern regex engines parse?

野趣味 2020-12-01 03:55

Here on SO, people sometimes say something like "you cannot parse X with regular expressions, because X is not a regular language". From my understanding, however, modern re…

3 Answers
  •  萌比男神i
    2020-12-01 04:35

    I recently wrote a rather long article on this topic: The true power of regular expressions.

    To summarize:

    • Regular expressions with support for recursive subpattern references can match all context-free languages (e.g. a^n b^n).
    • Regular expressions with lookaround assertions and subpattern references can match at least some context-sensitive languages (e.g. ww and a^n b^n c^n).
    • If the assertions may have unlimited width (as you say), then all context-sensitive languages can be matched. However, I don't know of any regex flavor that has no fixed-width restriction on lookbehind and also supports subpattern references.
    • Matching regular expressions with backreferences is NP-complete, so any problem in NP can be solved using regular expressions (after applying a polynomial-time transformation).
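To make the last point concrete, here is a minimal Python sketch of a polynomial-time reduction from CNF satisfiability to a single regex match. It uses only features of Python's standard `re` module: named groups and `(?P=name)` backreferences. The encoding is a hypothetical illustration and is not taken from the article. That covers the `x`/`;`/`,` subject alphabet, the group names `v1`, `v2`, …, and the helper names `sat_to_regex` and `solve`. Each group `v<k>` captures either `x` (variable true) or the empty string (variable false). Every clause then contributes `x,` to the subject, and only a satisfied literal can consume that `x`.

```python
import re

# Hypothetical encoding for illustration; not the construction from the article.


def sat_to_regex(num_vars, clauses):
    """Encode a CNF formula as a (pattern, subject) pair for Python's re module.

    Literals are DIMACS-style integers: k means variable k, -k means "not k".
    Group v<k> captures "x" when variable k is true and "" when it is false.
    """
    # One optional "x" per variable; x* absorbs the x's no group claimed.
    pattern = "^" + "".join(f"(?P<v{k}>x?)" for k in range(1, num_vars + 1)) + "x*;"
    subject = "x" * num_vars + ";"
    for clause in clauses:
        alternatives = []
        for lit in clause:
            ref = f"(?P=v{abs(lit)})"
            # A positive literal consumes the clause's "x" only if its group holds "x";
            # a negative literal ("x" + backreference) consumes it only if the group holds "".
            alternatives.append(ref if lit > 0 else "x" + ref)
        # Each clause adds "x," to the subject; some literal must consume the "x".
        pattern += "(?:" + "|".join(alternatives) + "),"
        subject += "x,"
    return pattern + "$", subject


def solve(num_vars, clauses):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable."""
    pattern, subject = sat_to_regex(num_vars, clauses)
    m = re.match(pattern, subject)
    if m is None:
        return None
    return {k: m.group(f"v{k}") == "x" for k in range(1, num_vars + 1)}


# (v1 or not v2) and (v2)
print(solve(2, [[1, -2], [2]]))  # {1: True, 2: True}
# (v1) and (not v1)
print(solve(1, [[1], [-1]]))  # None
```

The pattern and subject grow linearly with the formula. The engine's backtracking may try every combination of group captures, though, so the worst-case running time is exponential in the number of variables.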

    Some examples:

    • Matching the context-free language {a^n b^n, n>0}:

      /^(a(?1)?b)$/
      # or
      /^ (?: a (?= a* (\1?+ b) ) )+ \1 $/x
      
    • Matching the context-sensitive language {a^n b^n c^n, n>0}:

      /^
          (?=(a(?-1)?b)c)
          a+(b(?-1)?c)
      $/x
      # or
      /^ (?: a (?= a* (\1?+ b) b* (\2?+ c) ) )+ \1 \2 $/x
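Python's standard `re` module does not support recursion such as `(?1)` or `(?-1)`. It also rejects a backreference inside its own group, as in `(\1?+ b)`. So these patterns cannot be run there directly, although engines like PCRE and Perl accept them. As a rough sketch of what the recursive versions express, the following hand-written matcher mirrors the grammar rule S → a S? b behind `(a(?1)?b)` and reuses it for the a^n b^n c^n pattern.

The lookahead checks that the input starts with a^n b^n followed by a `c`. Then `a+(b(?-1)?c)$` checks that the whole input is some a's followed by b^m c^m. A string passing both checks has equally many a's and b's, and equally many b's and c's, so it is in a^n b^n c^n. The function names are hypothetical, and this models the grammar rather than any engine's exact backtracking behaviour.

```python
# Hypothetical helpers that model the recursive patterns; not a regex engine.


def pair_ends(s, i, left, right):
    """Return every end index j such that s[i:j] matches S -> left S? right.

    With left="a" and right="b" this is the language of the subpattern
    (a(?1)?b), i.e. a^n b^n for n > 0.
    """
    if i >= len(s) or s[i] != left:
        return set()
    ends = set()
    # Option 1: take the optional recursive S, then require `right`.
    for j in pair_ends(s, i + 1, left, right):
        if j < len(s) and s[j] == right:
            ends.add(j + 1)
    # Option 2: skip the recursive S and require `right` immediately.
    if i + 1 < len(s) and s[i + 1] == right:
        ends.add(i + 2)
    return ends


def match_anbn(s):
    """Whole-string match, like /^(a(?1)?b)$/."""
    return len(s) in pair_ends(s, 0, "a", "b")


def match_anbncn(s):
    """Whole-string match, like the lookahead version of a^n b^n c^n."""
    # (?=(a(?-1)?b)c): some a^n b^n at the start is followed by a "c".
    if not any(j < len(s) and s[j] == "c" for j in pair_ends(s, 0, "a", "b")):
        return False
    # a+(b(?-1)?c)$: one or more a's, then b^m c^m running to the end.
    k = 0
    while k < len(s) and s[k] == "a":
        k += 1
        if len(s) in pair_ends(s, k, "b", "c"):
            return True
    return False


print([match_anbn(w) for w in ["ab", "aabb", "aab", "abab"]])
# [True, True, False, False]
print([match_anbncn(w) for w in ["abc", "aabbcc", "aabbc", "abbcc"]])
# [True, True, False, False]
```

`pair_ends` returns a set of end positions rather than a single index. That keeps the helper correct if it is reused for a rule that can end in more than one place, which is the situation a backtracking engine has to handle.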
      
