How to write a recursive regex that matches nested parentheses?

别那么骄傲 2020-11-28 12:29

I am trying to write a regexp which matches nested parentheses, e.g.:

"(((text(text))))(text()()text)(casual(characters(#$%^^&&#^%#@!&**&#^*


        
3 Answers
  •  夕颜 (original poster)
    2020-11-28 13:00

    This pattern works:

    $pattern = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';
    

    The content between the parentheses is described simply as:

    "anything that is not a parenthesis OR a recursion (= another pair of parentheses)", repeated 0 or more times

    If you want to capture every parenthesized substring, including the nested ones, you must put this pattern inside a lookahead to obtain all overlapping results:
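
    As a minimal sketch of how this pattern behaves, the snippet below runs it over a made-up subject string (the string is an assumption for illustration, not taken from the question). Only balanced top-level groups are returned, and the unbalanced "(h" at the end is skipped:

```php
<?php
// Pattern from the answer: "(" then any mix of non-parenthesis runs
// or recursive matches of the whole pattern, then ")".
$pattern = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';

// Hypothetical subject: two balanced groups and one unclosed "(".
$subject = 'a(b(c)d)e(f)g(h';

preg_match_all($pattern, $subject, $matches);

// $matches[0] holds the full matches, one per top-level group.
print_r($matches[0]); // (b(c)d) and (f)
```

    Note that the nested "(c)" is not reported separately here, because the match for "(b(c)d)" has already consumed it.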

    $pattern = '~(?= ( \( (?: [^()]+ | (?1) )*+ \) ) )~x';
    preg_match_all($pattern, $subject, $matches);
    print_r($matches[1]);
    

    Note that I have added a capturing group and I have replaced (?R) by (?1):

    (?R) -> refers to the whole pattern (You can write (?0) too)
    (?1) -> refers to the first capturing group
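
    A small sketch comparing the three spellings on a made-up subject string (the string and variable names are assumptions for illustration). Without a lookahead around it, wrapping the whole pattern in group 1 and recursing with (?1) should match the same text as (?R) or (?0):

```php
<?php
$subject = 'x(a(b))y'; // hypothetical input

$withR = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';     // recurse into the whole pattern
$with0 = '~ \( (?: [^()]+ | (?0) )*+ \) ~x';     // same thing, written as (?0)
$with1 = '~ ( \( (?: [^()]+ | (?1) )*+ \) ) ~x'; // recurse into capturing group 1

preg_match($withR, $subject, $mR);
preg_match($with0, $subject, $m0);
preg_match($with1, $subject, $m1);

echo $mR[0], "\n"; // (a(b))
echo $m0[0], "\n"; // (a(b))
echo $m1[1], "\n"; // (a(b)), taken from group 1
```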
    

    What is this lookahead trick?

    A subpattern inside a lookahead (or a lookbehind) doesn't consume any characters; it is only an assertion (a test). This allows the same part of the string to be checked several times, once from each starting position.

    If you display the whole pattern results (print_r($matches[0]);), you will see that all results are empty strings. The only way to obtain the substrings found by the subpattern inside the lookahead is to enclose the subpattern in a capturing group.
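
    To make the difference between $matches[0] and $matches[1] concrete, here is a short sketch on a made-up subject string (an assumption for illustration). The lookahead succeeds at every position where a balanced group starts, so the nested "(c)" is found as well:

```php
<?php
// Lookahead version from the answer: the group is captured, nothing is consumed.
$pattern = '~(?= ( \( (?: [^()]+ | (?1) )*+ \) ) )~x';

// Hypothetical subject: nested group, flat group, and one unclosed "(".
$subject = 'a(b(c)d)e(f)g(h';

preg_match_all($pattern, $subject, $matches);

print_r($matches[0]); // three empty strings: the overall match is zero-width
print_r($matches[1]); // (b(c)d), (c), (f)
```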

    Note: the recursive subpattern can be made more efficient by unrolling the alternation (still with the x modifier):

    \( [^()]*+ (?: (?R) [^()]* )*+ \)
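
    As a quick sanity check, the sketch below runs the original and the unrolled pattern over the same made-up subject string (an assumption for illustration) and compares the results. It only shows that both return the same matches on this input; it does not measure speed:

```php
<?php
$original = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';
$unrolled = '~ \( [^()]*+ (?: (?R) [^()]* )*+ \) ~x';

// Hypothetical subject: two balanced groups and one unclosed "(".
$subject = 'a(b(c)d)e(f)g(h';

preg_match_all($original, $subject, $a);
preg_match_all($unrolled, $subject, $b);

print_r($b[0]);            // (b(c)d) and (f)
var_dump($a[0] === $b[0]); // bool(true)
```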
    
