Question
I have the following text:
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>BBzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
I can't parse it as XML, so I need to use a regex here. Also, this is only an example.
I want a regex that matches every <a>...</a> group that does not contain a b element whose text starts with BB.
I came up with this regex:
<a>.*?<b>(?!B).*?</b>.*?</a>
But it matches the last group as:
<a>
sdfsdf
<b>BBzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
That is not what I want.
How can I write a regex that matches only these 3 groups from my example?
1.
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
2.
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
3.
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
Answer 1:
Use a tempered greedy token regex:
<a>(?:(?!<(?:b>BB|/?a>)).)*</a>
Enable the ". matches newline" option (often called DOTALL or single-line mode) so that . also matches line breaks.
Details:
<a> - a literal <a> char sequence
(?:(?!<(?:b>BB|/?a>)).)* - a tempered greedy token matching any char (.) that is not the start of a sequence that the pattern inside the (?!<(?:b>BB|/?a>)) lookahead can match (that is, not a <b>BB, </a>, or <a> sequence)
</a> - a literal </a> char sequence
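As a rough illustration, here is how the pattern might be tried out in Python. The question does not say which regex engine is in use, so Python's re module and its re.DOTALL flag (standing in for the ". matches newline" option) are assumptions. The variable names are made up for this sketch. The question's original pattern is included for contrast.

```python
import re

# Sample input copied from the question: four <a> groups, the third has <b>BBzz</b>.
text = """<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>BBzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>"""

# Tempered greedy token: consume one char at a time, but stop before
# "<b>BB", "</a>" or "<a>". re.DOTALL lets "." match newlines.
tempered = re.compile(r"<a>(?:(?!<(?:b>BB|/?a>)).)*</a>", re.DOTALL)
matches = tempered.findall(text)

# The pattern from the question, for comparison. Its lazy ".*?" is free to
# run past the BB group and the next <a>, so its last match spans two groups.
naive = re.compile(r"<a>.*?<b>(?!B).*?</b>.*?</a>", re.DOTALL)
naive_matches = naive.findall(text)

for m in matches:
    print(repr(m))
```

Because the tempered token can never step over an <a> or </a>, each match is confined to a single group, which is why the group containing <b>BBzz</b> is skipped rather than merged with the one after it.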
Source: https://stackoverflow.com/questions/40860847/regex-to-match-if-given-text-is-not-found-and-match-as-little-as-possible