Regex to match if given text is not found and match as little as possible

Submitted by 时间秒杀一切 on 2019-12-11 05:13:52

Question


I have text:

<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>BBzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>

I can't parse it as XML, so I need to use a regex here. Also, this is only an example.

I want a regex that matches every <a>...</a> group that does not contain a b element whose text starts with BB.

I came up with this regex: <a>.*?<b>(?!B).*?</b>.*?</a> But its last match spans two groups:

<a>
sdfsdf
<b>BBzz</b>
sdfsdf
</a>
<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>

That is not what I want.

How can I write a regex that matches only these 3 groups from my example?

1.

<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>

2.

<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>

3.

<a>
sdfsdf
<b>DDzz</b>
sdfsdf
</a>

Answer 1:


Use a tempered greedy token regex:

<a>(?:(?!<(?:b>BB|/?a>)).)*</a>

Enable the "dot matches newline" option (DOTALL / single-line mode) so that . also matches line breaks.

Details:

  • <a> - a literal <a> char sequence
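  • (?:(?!<(?:b>BB|/?a>)).)* - a tempered greedy token: it matches any char (.), zero or more times, as long as that char does not start a sequence matched by the pattern inside the (?!<(?:b>BB|/?a>)) lookahead (that is, a <b>BB, <a>, or </a> sequence). Because the token can never step over an <a> or </a>, a match cannot spill into the next group, and a group that contains <b>BB fails to match at all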
  • </a> - a literal </a> char sequence
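
To see the difference between the two patterns, here is a minimal sketch in Python. The choice of Python and its re module is an assumption (the question does not name a regex flavor), and the names DD_BLOCK, BB_BLOCK, NAIVE and TEMPERED are made up for illustration. re.DOTALL plays the role of the "dot matches newline" option.

```python
import re

# Sample input rebuilt from the question: three DD groups and one BB group.
DD_BLOCK = "<a>\nsdfsdf\n<b>DDzz</b>\nsdfsdf\n</a>"
BB_BLOCK = "<a>\nsdfsdf\n<b>BBzz</b>\nsdfsdf\n</a>"
text = "\n".join([DD_BLOCK, DD_BLOCK, BB_BLOCK, DD_BLOCK])

# Pattern from the question: the lazy .*? is free to run past </a>,
# so the third match starts in the BB group and ends in the last group.
NAIVE = re.compile(r"<a>.*?<b>(?!B).*?</b>.*?</a>", re.DOTALL)

# Pattern from the answer: each char is consumed only if it does not
# start <b>BB, <a> or </a>, so a match stays inside a single group.
TEMPERED = re.compile(r"<a>(?:(?!<(?:b>BB|/?a>)).)*</a>", re.DOTALL)

naive_matches = NAIVE.findall(text)
tempered_matches = TEMPERED.findall(text)

print(len(naive_matches), "BBzz" in naive_matches[-1])
print(len(tempered_matches), all(m == DD_BLOCK for m in tempered_matches))
```

Both patterns return three matches, but only the tempered one returns the three DD groups on their own. In flavors without a flag argument, the inline modifier (?s) at the start of the pattern is the usual way to get the same dot behavior.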



Source: https://stackoverflow.com/questions/40860847/regex-to-match-if-given-text-is-not-found-and-match-as-little-as-possible
