Recursive PCRE search with patterns

Submitted by 房东的猫 on 2019-12-25 04:46:07

Question


This question is about PCRE.

I have seen a recursive search for nested parentheses used with this construct:

\(((?>[^()]+)|(?R))*\)

The problem with this is that, although '[^()]+' can match any character including newline, the delimiters themselves are limited to single characters, such as braces, brackets, punctuation marks, or single letters.

What I am trying to do is replace the '(' and ')' characters with ANY kind of pattern (keywords such as 'BEGIN' and 'END', for example).

I have come up with the following construct:

(?xs)  (?# <-- 'xs' ignore whitespace in the search term, and allows '.'
               to match newline )
(?P<pattern1>BEGIN)
(
   (?> (?# <-- "once only" search )
      (
         (?! (?P=pattern1) | (?P<pattern2>END)).
      )+
   )
   | (?R)
)*
END

This will actually work on something that looks like this:

BEGIN <<date>>
  <<something>
    BEGIN
      <<something>>
    END <<comment>>
    BEGIN <<time>>
      <<more somethings>>
      BEGIN(cause we can)END
      BEGINEND
    END
  <<something else>>
END

This successfully matches any nested BEGIN..END pairs.

I set up named patterns pattern1 and pattern2 for BEGIN and END, respectively. Using pattern1 in the search term works fine. However, I can't use pattern2 at the end of the search: I have to write out 'END'.

Any idea how I can rewrite this regex so I only have to specify the patterns a single time and use them "everywhere" within the code? In other words, so I don't have to write END both in the middle of the search and at the very end.


Answer 1:


To extend @Kobi's answer, please see the following regex:

(?xs)
(?(DEFINE)
        (?<pattern1>BEGIN)
        (?<pattern2>END)
)
(?=((?&pattern1)
(?:
   (?> (?# <-- "once only" search )
      (?:
         (?! (?&pattern1) | (?&pattern2)) .
      )+
   )*
   | (?3)
)*
(?&pattern2)
))

This regex even lets you fetch the text of each individual block, nested ones included, because the whole match sits inside a lookahead and consumes no input. Read capture group 3: groups 1 and 2 are already taken by the named groups in the DEFINE block.

Demo: http://regex101.com/r/bX8mB6
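
To cross-check which blocks the lookahead version should report, the same nesting can be reproduced without relying on a regex engine's recursion support. The following is a minimal Python sketch, not the answer's method: it swaps the recursive pattern for a stack-based scan, and the function name nested_blocks and its parameters are made up for illustration. It assumes neither delimiter is a prefix of the other.

```python
import re


def nested_blocks(text, begin="BEGIN", end="END"):
    """Return every balanced begin..end block in text, ordered by start offset."""
    # One token pattern built from the two delimiters, each written only once.
    token = re.compile("|".join(re.escape(t) for t in (begin, end)))
    open_starts = []  # start offsets of blocks that are still open
    blocks = []       # (start offset, block text) pairs
    for m in token.finditer(text):
        if m.group() == begin:
            open_starts.append(m.start())
        elif open_starts:  # a closer with nothing open is ignored
            start = open_starts.pop()
            blocks.append((start, text[start:m.end()]))
    return [block for _, block in sorted(blocks)]


print(nested_blocks("BEGIN a BEGIN b END c END"))
# ['BEGIN a BEGIN b END c END', 'BEGIN b END']
```

The outer block comes first and the inner block second, which is the order in which a scan from left to right would find them.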




Answer 2:


This looks like a good use case for a (?(DEFINE)) block, which exists to define named subpatterns that can be called from elsewhere in the regex. A Perl example would be:

(?xs)
(?(DEFINE)
        (?<pattern1>BEGIN)
        (?<pattern2>END)
)
(?&pattern1)
(
   (?> (?# <-- "once only" search )
      (
         (?! (?&pattern1) | (?&pattern2)).
      )+
   )
   | (?R)
)*
(?&pattern2)

Example: http://ideone.com/8o9cg

(Please note that I don't really know any Perl, and I couldn't get it to work in PHP on any of the online testers.)

See also: http://www.pcre.org/pcre.txt (search for (?(DEFINE); it doesn't look like they have separate pages)


A low-tech solution that works on most flavors is to use lookahead at the start of the pattern:

(?=.*?(?P<pattern1>BEGIN))
(?=.*?(?P<pattern2>END))
...
(?P=pattern1) (?# should work - it was captured )
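
As a rough illustration of the lookahead trick, here is a minimal sketch using Python's standard re module, which accepts the same (?P<name>...) and (?P=name) syntax. Python's re has no recursion, so the sketch only matches a flat, non-nested block; the group name body and the sample text are assumptions for illustration. Note that a backreference repeats the captured text rather than re-applying the subpattern, which is fine for literal keywords such as BEGIN and END.

```python
import re

pattern = re.compile(
    r"""
    (?=.*?(?P<pattern1>BEGIN))   # capture the opening keyword without consuming it
    (?=.*?(?P<pattern2>END))     # capture the closing keyword the same way
    (?P=pattern1)                # reuse the captured opener
    (?P<body>.*?)                # shortest possible body (no nesting support)
    (?P=pattern2)                # reuse the captured closer
    """,
    re.VERBOSE | re.DOTALL,
)

match = pattern.search("xx BEGIN a END yy")
print(match.group(0))       # BEGIN a END
print(match.group("body"))  # prints " a " (with the surrounding spaces)
```

Each keyword is written once, inside its lookahead, and everything after that refers to it by name.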


Source: https://stackoverflow.com/questions/10668675/recursive-pcre-search-with-patterns
