How to write a recursive regex that matches nested parentheses?

别那么骄傲 2020-11-28 12:29

I am trying to write a regexp which matches nested parentheses, e.g.:

"(((text(text))))(text()()text)(casual(characters(#$%^^&&#^%#@!&**&#^*

3 answers
  •  感动是毒
    2020-11-28 12:55

    The following code uses my Parser class (licensed under CC-BY 3.0) and works on UTF-8 strings, thanks to my UTF8 class.

    It works by iterating over the string with a recursive function that calls itself each time it finds a "(". It also detects mismatched pairs: if it reaches the end of the string without finding the corresponding ")", the fragment is reported as unclosed.
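
The recursion described above can be sketched without the Parser class. This is only an illustration under stated assumptions: `skipGroup` and `isBalanced` are hypothetical names, the scan is byte-based via `strcspn` (safe for UTF-8 here because "(" and ")" are single-byte ASCII), and it only reports whether the parentheses pair up.

```php
<?php
// Hypothetical helper, not part of the author's Parser class.
// $pos points just past a "(". Advances $pos past the matching ")".
// Returns true if that ")" was found, false if the input ended first.
function skipGroup(string $s, int &$pos): bool
{
    $len = strlen($s);
    while ($pos < $len) {
        // Jump to the next "(" or ")"
        $pos += strcspn($s, '()', $pos);
        if ($pos >= $len) {
            break;
        }
        if ($s[$pos++] === ')') {
            return true;              // this group is closed
        }
        if (!skipGroup($s, $pos)) {   // found "(": recurse
            return false;             // nested group was never closed
        }
    }
    return false;                     // end of string without ")"
}

// Hypothetical top-level wrapper: true when every parenthesis is paired.
function isBalanced(string $s): bool
{
    $pos = 0;
    $len = strlen($s);
    while ($pos < $len) {
        $pos += strcspn($s, '()', $pos);
        if ($pos >= $len) {
            break;
        }
        if ($s[$pos++] === ')') {
            return false;             // ")" with no matching "("
        }
        if (!skipGroup($s, $pos)) {
            return false;             // "(" with no matching ")"
        }
    }
    return true;
}

var_dump(isBalanced('(a(b))c')); // bool(true)
var_dump(isBalanced('(a'));      // bool(false)
```

The top level is handled separately from the recursive part so that running out of input counts as success there but as an error inside a group.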

    The code also takes a $callback parameter that you can use to process each piece it finds. The callback receives two parameters: 1) the string, and 2) the level (0 = deepest). Whatever the callback returns replaces that piece in the contents of the string, and these changes are visible to callbacks at higher levels.
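
To make the callback contract concrete, here is a small self-contained sketch. It is not the Parser-based code from this answer: `rewriteFragment` and `rewriteGroups` are hypothetical names, and it assumes "level" means nesting height, so a group with no groups inside it gets 0.

```php
<?php
// Hypothetical sketch. Rewrites the fragment starting at $pos.
// Returns [text, height], or null when the parentheses do not pair up.
function rewriteFragment(string $s, int &$pos, callable $callback, bool $top): ?array
{
    $out = '';
    $height = 0;                 // 0 = no nested groups seen in this fragment
    $len = strlen($s);
    while (true) {
        // Copy everything up to the next "(" or ")"
        $run = strcspn($s, '()', $pos);
        $out .= substr($s, $pos, $run);
        $pos += $run;
        if ($pos >= $len) {
            // End of input: fine at the top level, an unclosed "(" elsewhere
            return $top ? [$out, $height] : null;
        }
        if ($s[$pos++] === ')') {
            // ")" closes a nested fragment; at the top level it is stray
            return $top ? null : [$out, $height];
        }
        $inner = rewriteFragment($s, $pos, $callback, false);
        if ($inner === null) {
            return null;
        }
        // The callback sees text already rewritten by deeper callbacks
        $out .= $callback($inner[0], $inner[1]);
        $height = max($height, $inner[1] + 1);
    }
}

// Hypothetical wrapper: returns the rewritten string, or false on mismatch.
function rewriteGroups(string $s, callable $callback)
{
    $pos = 0;
    $result = rewriteFragment($s, $pos, $callback, true);
    return $result === null ? false : $result[0];
}

$tag = function ($text, $level) {
    return '[' . $level . ':' . $text . ']';
};
echo rewriteGroups('a(b(c)d)e', $tag), "\n"; // a[1:b[0:c]d]e
var_dump(rewriteGroups('a(b', $tag));        // bool(false)
```

In the first call the callback runs on "c" at level 0 first, and the level 1 callback then receives "b[0:c]d", which shows how inner replacements are visible to the outer callback.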

    Note: the code does not include type checks.

    Non-recursive part:

    function ParseParenthesis(/*string*/ $string, /*function*/ $callback)
    {
        //Create a new parser object
        $parser = new Parser($string);
        //Call the recursive part
        $result = ParseParenthesisFragment($parser, $callback);
        if ($result['close'])
        {
            return $result['contents'];
        }
        else
        {
            //UNEXPECTED END OF STRING
            // throw new Exception('UNEXPECTED END OF STRING');
            return false;
        }
    }
    

    Recursive part:

    function ParseParenthesisFragment(/*parser*/ $parser, /*function*/ $callback)
    {
        $contents = '';
        $level = 0;
        while(true)
        {
            $parenthesis = array('(', ')');
            // Jump to the first/next "(" or ")"
            $new = $parser->ConsumeUntil($parenthesis);
            $parser->Flush(); //<- Flush is just an optimization
            // Append what we got so far
            $contents .= $new;
            // Read the "(" or ")"
            $element = $parser->Consume($parenthesis);
            if ($element === '(') //If we found "("
            {
                //OPEN
                $result = ParseParenthesisFragment($parser, $callback);
                if ($result['close'])
                {
                    // It was closed, all ok
                    // Update the level of this iteration
                    $newLevel = $result['level'] + 1;
                    if ($newLevel > $level)
                    {
                        $level = $newLevel;
                    }
                    // Call the callback with the nesting height of the
                    // fragment that just closed (0 = deepest)
                    $new = call_user_func
                    (
                        $callback,
                        $result['contents'],
                        $result['level']
                    );
                    // Append what we got
                    $contents .= $new;
                }
                else
                {
                    //UNEXPECTED END OF STRING
                    // Don't call the callback for mismatched parentheses,
                    // just append and return
                    return array
                    (
                        'close' => false,
                        'contents' => $contents.$result['contents']
                    );
                }
            }
            else if ($element === ')') //If we found a ")"
            {
                //CLOSE
                return array
                (
                    'close' => true,
                    'contents' => $contents,
                    'level' => $level
                );
            }
            else //Neither "(" nor ")" was read, so the input is exhausted
            {
                //END OF STRING
                return array
                (
                    'close' => false,
                    'contents' => $contents
                );
            }
        }
    }
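
For the regex route the question asks about, PCRE (the engine behind PHP's preg_* functions) supports recursive patterns, so a pattern-only version is possible. The sketch below is an aside rather than part of the parser above; `balancedGroups` and `isBalancedRegex` are hypothetical helper names.

```php
<?php
// Returns every outermost balanced "(...)" group found in $s.
function balancedGroups(string $s): array
{
    // (?R) re-enters the whole pattern, so every "(" needs its own ")".
    // The possessive quantifiers (++ and *+) keep backtracking in check.
    preg_match_all('/\((?:[^()]++|(?R))*+\)/', $s, $m);
    return $m[0];
}

// True when every parenthesis in $s is paired.
function isBalancedRegex(string $s): bool
{
    // (?1) re-enters capture group 1 between each "(" and ")".
    return preg_match('/^([^()]*+(?:\((?1)\)[^()]*+)*+)$/', $s) === 1;
}

print_r(balancedGroups('x(a(b))y(c)z')); // the groups (a(b)) and (c)
var_dump(isBalancedRegex('a(b(c))d'));   // bool(true)
var_dump(isBalancedRegex('a(b(c)d'));    // bool(false)
```

Very deeply nested input may hit PCRE's recursion or backtracking limits, in which case preg_match() returns false rather than 0 or 1, so a hand-written parser can still be the safer choice for untrusted input.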
    
