Regular expression that uses balancing groups

拥有回忆 submitted on 2019-12-01 01:09:45

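To capture a whole IF/ENDIF block with balanced IF statements, you can use this .NET regex (it is written with whitespace and # comments, so it needs RegexOptions.IgnorePatternWhitespace):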

%IF\s+(?<Name>\w+)
(?<Contents>
    (?> #Atomic (possessive) group, so . cannot re-match a consumed IF/ENDIF
        \s|
        (?<IF>%IF)|     #for IF, push
        (?<-IF>%ENDIF)| #for ENDIF, pop
        . # or any other character
    )+
    (?(IF)(?!)) #fail on extra open IFs
)   #/Contents
%ENDIF

The point here is this: in a single Match, each named group gives you only one value. You will get only one (?<Name>\w+) group, for example, holding the last captured value. In my regex, I kept the Name and Contents groups of your simple regex and confined the balancing to the Contents group; the regex is still wrapped in IF and ENDIF.
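If the push/pop idea is easier to follow outside of regex syntax, here is a minimal Python sketch of the same counting. Python's re module has no balancing groups, so this swaps in an explicit depth counter; it is a conceptual analogue, not an emulation of how the .NET engine backtracks. The names IF_TOKEN and is_balanced are made up for the illustration.

```python
import re

# The two tokens that the balancing logic cares about; all other text is ignored.
IF_TOKEN = re.compile(r"%ENDIF|%IF")

def is_balanced(contents: str) -> bool:
    """Return True if every %IF in `contents` is closed by a later %ENDIF."""
    depth = 0  # stands in for the capture stack of the IF group
    for token in IF_TOKEN.finditer(contents):
        if token.group() == "%IF":
            depth += 1            # push, as (?<IF>%IF) does
        elif depth == 0:
            return False          # nothing to pop: this %ENDIF has no matching %IF
        else:
            depth -= 1            # pop, as (?<-IF>%ENDIF) does
    return depth == 0             # leftover opens are rejected, as (?(IF)(?!)) does
```

Text such as "a %IF X b %ENDIF c" passes this check, while a lone "%IF X" or a lone "%ENDIF" does not.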

It becomes interesting when your data is more complex. For example:

%IF MY_VAR             
  some text
  %IF OTHER_VAR
    some other text
  %ENDIF
  %IF OTHER_VAR2
    some other text 2
  %ENDIF
%ENDIF                 
%IF OTHER_VAR3         
    some other text 3
%ENDIF                 

Here, you will get two matches: one for MY_VAR and one for OTHER_VAR3. To capture the two IFs inside MY_VAR's content, you have to rerun the regex on its Contents group. If you must do it in one pass, you can wrap the whole regex in a lookahead, (?=...), so that matches can overlap, but then you have to rebuild the nesting yourself from the positions and lengths of the matches.
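To show what "a logical structure built from positions" could look like, here is a rough Python sketch. It does not rerun the .NET regex; it swaps in a single pass over the %IF/%ENDIF tokens with an explicit stack and records each block's offsets and children. BLOCK_TOKEN, parse_blocks and the dictionary layout are assumptions made for the illustration.

```python
import re

# "%IF <name>" opens a block; "%ENDIF" closes the innermost open block.
BLOCK_TOKEN = re.compile(r"%IF\s+(?P<name>\w+)|%ENDIF")

def parse_blocks(text: str) -> list:
    """Return the top-level IF blocks, each with its name, offsets and children."""
    roots = []
    stack = []  # blocks that are still open at the current position
    for m in BLOCK_TOKEN.finditer(text):
        if m.group("name") is not None:
            block = {"name": m.group("name"), "start": m.start(),
                     "end": None, "children": []}
            # Attach to the innermost open block, or to the top level if none is open.
            parent = stack[-1]["children"] if stack else roots
            parent.append(block)
            stack.append(block)
        elif stack:
            # Close the innermost open block and record where it ends.
            stack.pop()["end"] = m.end()
        else:
            raise ValueError(f"unmatched %ENDIF at offset {m.start()}")
    if stack:
        raise ValueError(f"unclosed %IF {stack[-1]['name']}")
    return roots
```

On the sample above, this should return two top-level blocks, MY_VAR and OTHER_VAR3, with OTHER_VAR and OTHER_VAR2 as the children of MY_VAR. Slicing the text with a block's start and end offsets gives back the whole block.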

Now, I won't explain too much, because it seems you get the basics, but a short note about the Contents group: I've used an atomic (possessive) group, (?> ), to avoid backtracking. Otherwise, the dot could eventually match the characters of a whole %IF or %ENDIF and break the balance. A lazy match on the group would behave similarly (( )+? instead of (?> )+).
