Regular expression search avoid nested results

Submitted by 耗尽温柔 on 2019-12-10 15:23:04

Question


My document contains several instances of code blocks that look like this:

{% highlight %}
//some code
{% endhighlight %}

In Atom.io, I am trying to write a regex search to capture those.

My first try was:
{% highlight .* %}([\S\s]+){% endhighlight %}

The problem is that, because there are several code blocks in the same document, it matches from the start of the first code block to the end of the last one, all in one match.

I thought of excluding the { character:
{% highlight .* %}([^\{]+){% endhighlight %}

But the problem is that some of the code blocks contain valid { characters (such as function(){ ... }).
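Both failure modes can be reproduced outside the editor. The following is a minimal sketch in Python's re module, not Atom's own search; the sample document and the "js" language tag are assumptions made up for illustration, and the braces are escaped only for readability.

```python
import re

# Hypothetical sample document with two highlight blocks; the second
# block contains a literal "{" inside the code.
doc = (
    "{% highlight js %}\n"
    "var a = 1;\n"
    "{% endhighlight %}\n"
    "Some prose.\n"
    "{% highlight js %}\n"
    "function f(){ return 2; }\n"
    "{% endhighlight %}\n"
)

# First try: the greedy [\S\s]+ runs on to the LAST {% endhighlight %}.
greedy = re.compile(r"\{% highlight .* %\}([\S\s]+)\{% endhighlight %\}")
greedy_matches = greedy.findall(doc)

# Second try: [^\{]+ cannot cross a "{" that appears inside the code.
no_brace = re.compile(r"\{% highlight .* %\}([^\{]+)\{% endhighlight %\}")
no_brace_matches = no_brace.findall(doc)

print(len(greedy_matches))               # 1 (both blocks swallowed at once)
print("Some prose." in greedy_matches[0])  # True
print(no_brace_matches)                  # ['\nvar a = 1;\n'] (second block missed)
```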


Answer 1:


Use non-greedy (lazy) matching:

{% highlight .* %}([\S\s]+?){% endhighlight %}
                          ^
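As a quick check of the lazy quantifier, here is a small sketch using Python's re module rather than Atom's search. The sample document and its "js" language tag are hypothetical, and the braces are escaped only for readability.

```python
import re

# Hypothetical sample document with two highlight blocks.
doc = (
    "{% highlight js %}\n"
    "var a = 1;\n"
    "{% endhighlight %}\n"
    "Some prose.\n"
    "{% highlight js %}\n"
    "function f(){ return 2; }\n"
    "{% endhighlight %}\n"
)

# The "?" after "+" makes the quantifier lazy: it stops at the FIRST
# {% endhighlight %} that follows each opening tag.
lazy = re.compile(r"\{% highlight .* %\}([\S\s]+?)\{% endhighlight %\}")
blocks = [m.group(1).strip() for m in lazy.finditer(doc)]

print(blocks)  # ['var a = 1;', 'function f(){ return 2; }']
```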



Answer 2:


The problem with Karthik's lazy matching solution is that when there are large substrings between {% highlight %} and {% endhighlight %}, the lazy [\s\S]*? expands one character at a time and keeps adding to the engine's backtracking state, which can eventually overrun on long blocks.

Using an unrolling-the-loop technique, you can avoid that:

{% highlight %}([^{]*(?:{(?!% endhighlight %})[^{]*)*){% endhighlight %}


This way, the substrings inside the highlight blocks can be of any length and performance will stay fast.

Main regex parts:

  • {% highlight %} - matches the {% highlight %} text literally
  • ([^{]*(?:{(?!% endhighlight %})[^{]*)*) - matches and captures into group 1 everything up to, but not including, the next {% endhighlight %}:
    • [^{]* - 0 or more characters other than {
    • (?:{(?!% endhighlight %})[^{]*)* - 0 or more sequences of:
      • {(?!% endhighlight %}) - literal { not followed by % endhighlight %}
      • [^{]* - 0 or more characters other than {
  • {% endhighlight %} - matches the {% endhighlight %} text literally

This is basically the same as {% highlight %}([\s\S]*?){% endhighlight %}, but "unrolled". Because the pattern advances linearly through the text, matching stays safe and fast.
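To see that the unrolled pattern captures the same text as the lazy one, here is a minimal sketch in Python's re module (not Atom's engine). The sample document is hypothetical and uses bare {% highlight %} tags, as the pattern in this answer expects; the braces outside character classes are escaped only for readability.

```python
import re

# Hypothetical sample document; the second block contains a literal "{".
doc = (
    "{% highlight %}\n"
    "var a = 1;\n"
    "{% endhighlight %}\n"
    "Some prose.\n"
    "{% highlight %}\n"
    "function f(){ return 2; }\n"
    "{% endhighlight %}\n"
)

# Unrolled loop: runs of non-"{" characters, optionally interleaved with
# a "{" that does not start the closing tag.
unrolled = re.compile(
    r"\{% highlight %\}"
    r"([^{]*(?:\{(?!% endhighlight %\})[^{]*)*)"
    r"\{% endhighlight %\}"
)

# Lazy equivalent, kept only for comparison.
lazy = re.compile(r"\{% highlight %\}([\s\S]*?)\{% endhighlight %\}")

unrolled_blocks = unrolled.findall(doc)
lazy_blocks = lazy.findall(doc)

print(unrolled_blocks)                 # ['\nvar a = 1;\n', '\nfunction f(){ return 2; }\n']
print(unrolled_blocks == lazy_blocks)  # True
```

This only compares what the two patterns capture; it does not measure the performance difference described in this answer.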




Answer 3:


With this regex you get only the content between {% highlight %} and {% endhighlight %}:

(?<={% highlight %}).*(?={% endhighlight %})

Test: https://regex101.com/r/nX6wV8/1


Sorry for my mistake; I hope this helps.

New expression: https://regex101.com/r/qX2cA1/1
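The linked expressions are not reproduced here. As a rough sketch only, the following Python snippet shows why the first lookaround pattern finds nothing in a multi-line block (by default . does not match a newline), and how a lazy [\s\S]*? between the same lookarounds behaves instead. That second pattern is an assumption made for illustration, not necessarily the expression behind the link, and the sample document is hypothetical.

```python
import re

# Hypothetical sample document with two multi-line highlight blocks.
doc = (
    "{% highlight %}\n"
    "var a = 1;\n"
    "{% endhighlight %}\n"
    "Some prose.\n"
    "{% highlight %}\n"
    "function f(){ return 2; }\n"
    "{% endhighlight %}\n"
)

# Pattern as posted: "." stops at the newline right after the opening
# tag, so the lookahead never succeeds on multi-line blocks.
original = re.compile(r"(?<=\{% highlight %\}).*(?=\{% endhighlight %\})")
original_matches = original.findall(doc)

# Assumed variant: lazy [\s\S]*? between the same lookarounds.
variant = re.compile(r"(?<=\{% highlight %\})[\s\S]*?(?=\{% endhighlight %\})")
variant_matches = variant.findall(doc)

print(original_matches)  # []
print(variant_matches)   # ['\nvar a = 1;\n', '\nfunction f(){ return 2; }\n']
```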



Source: https://stackoverflow.com/questions/33805752/regular-expression-search-avoid-nested-results
