Question
My document contains several instances of code blocks that look like this:
{% highlight %}
//some code
{% endhighlight %}
In Atom.io, I am trying to write a regex search to capture those.
My first try was: {% highlight .* %}([\S\s]+){% endhighlight %}
The problem is that, because there are several code blocks in the same document, it matches everything from the first code block through the last one as a single match.
I thought of excluding the { character: {% highlight .* %}([^\{]+){% endhighlight %}
But some of the code blocks legitimately contain { characters (such as function(){ ... }).
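To make the two failure modes concrete, here is a small Python sketch. The question is about Atom's search, so Python's re module is only a stand-in, and the sample document (including its js language tag) is made up for illustration:

```python
import re

# Hypothetical sample document; the "js" language tag is an assumption,
# since the patterns below expect some text after "highlight".
doc = (
    "intro\n"
    "{% highlight js %}\nvar a = 1;\n{% endhighlight %}\n"
    "middle\n"
    "{% highlight js %}\nfunction f(){ return 2; }\n{% endhighlight %}\n"
    "outro\n"
)

# First attempt: the greedy [\S\s]+ runs on to the LAST closing tag.
greedy = re.compile(r"{% highlight .* %}([\S\s]+){% endhighlight %}")
greedy_bodies = greedy.findall(doc)

# Second attempt: [^\{]+ cannot cross the "{" in function(){ ... }.
no_brace = re.compile(r"{% highlight .* %}([^\{]+){% endhighlight %}")
no_brace_bodies = no_brace.findall(doc)

print(len(greedy_bodies))   # a single match that swallows both blocks
print(no_brace_bodies)      # only the block without "{" is found
```

The first pattern returns one oversized match that includes the prose between the blocks; the second silently skips any block whose body contains a brace.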
Answer 1:
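Use non-greedy (lazy) matching:
{% highlight .* %}([\S\s]+?){% endhighlight %}
The ? after the + makes the quantifier lazy, so each match stops at the first {% endhighlight %} it reaches.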
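A quick way to check the lazy pattern is a short Python sketch. This is only an approximation of Atom's behavior (Atom uses a JavaScript regex engine), and the sample text with its js language tag is hypothetical:

```python
import re

# Hypothetical sample document with two highlight blocks.
doc = (
    "{% highlight js %}\nvar a = 1;\n{% endhighlight %}\n"
    "some prose\n"
    "{% highlight js %}\nfunction f(){ return 2; }\n{% endhighlight %}\n"
)

# Lazy +? expands only as far as the first closing tag.
lazy = re.compile(r"{% highlight .* %}([\S\s]+?){% endhighlight %}")
bodies = lazy.findall(doc)

for body in bodies:
    print(repr(body))
```

Each block is now captured separately, including the one that contains a { character.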
Answer 2:
The problem with Karthik's lazy matching solution is that when there are large substrings between {% highlight %} and {% endhighlight %}, the lazy [\s\S]*? expands one character at a time and tests for the closing tag after every step. On long inputs this accumulates more and more backtracking state, which in some regex engines can eventually overrun.
Using an unrolling-the-loop technique, you can avoid that:
{% highlight %}([^{]*(?:{(?!% endhighlight %})[^{]*)*){% endhighlight %}
This way, the substrings inside the highlight blocks can be of any length and performance will stay fast.
Main regex parts:
{% highlight %} - matches the {% highlight %} text literally
([^{]*(?:{(?!% endhighlight %})[^{]*)*) - matches and captures into group 1 everything that is not {% endhighlight %}, matching:
    [^{]* - 0 or more characters other than {
    (?:{(?!% endhighlight %})[^{]*)* - 0 or more sequences of:
        {(?!% endhighlight %}) - a literal { not followed by % endhighlight %}
        [^{]* - 0 or more characters other than {
{% endhighlight %} - matches the {% endhighlight %} text literally
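The breakdown above can also be written as an annotated pattern. The sketch below uses Python's re.VERBOSE mode, which is an assumption about tooling rather than something Atom offers; in verbose mode literal spaces must be escaped, and the braces are escaped here purely for readability. The sample document is hypothetical:

```python
import re

unrolled = re.compile(r"""
    \{%\ highlight\ %\}                 # opening tag, matched literally
    (                                   # group 1: the block body
        [^{]*                           # 0 or more chars other than "{"
        (?:
            \{(?!%\ endhighlight\ %\})  # a "{" that does not start the closing tag
            [^{]*                       # then 0 or more chars other than "{"
        )*
    )
    \{%\ endhighlight\ %\}              # closing tag, matched literally
""", re.VERBOSE)

# Hypothetical sample: the first body contains several "{" characters.
doc = (
    "{% highlight %}\nfunction f(){ return {a: 1}; }\n{% endhighlight %}\n"
    "text\n"
    "{% highlight %}\nx = 1\n{% endhighlight %}\n"
)

bodies = unrolled.findall(doc)
for body in bodies:
    print(repr(body))
```

The negative lookahead is only evaluated when the engine actually sees a {, which is what keeps the scan moving forward.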
This is basically the same as {% highlight %}([\s\S]*?){% endhighlight %}, but "unrolled". The closing tag is only tested at each {, so execution is close to linear, which makes matching safer and faster.
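One way to convince yourself that the two patterns select the same text is to run both over a larger input and compare the captures. The following is a minimal Python sketch; the generated document is an arbitrary assumption, and it checks equivalence only, not speed:

```python
import re

lazy = re.compile(r"{% highlight %}([\s\S]*?){% endhighlight %}")
unrolled = re.compile(
    r"{% highlight %}([^{]*(?:{(?!% endhighlight %})[^{]*)*){% endhighlight %}"
)

# Hypothetical document: 50 blocks whose bodies contain many "{" characters.
block = (
    "{% highlight %}\n"
    + "function f(){ return {a: 1}; }\n" * 200
    + "{% endhighlight %}\n"
)
doc = ("some prose\n" + block) * 50

lazy_bodies = lazy.findall(doc)
unrolled_bodies = unrolled.findall(doc)
same = lazy_bodies == unrolled_bodies

print(len(unrolled_bodies), same)
```

Timing differences depend heavily on the regex engine, so any benchmark would need to be run in the engine you actually use.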
Answer 3:
With this regex you can get only the content between {% highlight %} and {% endhighlight %}:
(?<={% highlight %}).*(?={% endhighlight %})
Test: https://regex101.com/r/nX6wV8/1
Sorry for my mistake, I hope this helps.
New expression: https://regex101.com/r/qX2cA1/1
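As a rough illustration of the lookaround idea in Python: the lazy .*? and the re.DOTALL flag below are assumptions added so that the pattern copes with multi-line bodies and several blocks per document, and they may differ from the linked expressions. The sample text is hypothetical:

```python
import re

# Lookbehind and lookahead keep the tags out of the match itself,
# so no capture group is needed.
lookaround = re.compile(
    r"(?<={% highlight %}).*?(?={% endhighlight %})",
    re.DOTALL,  # assumption: let "." match newlines inside a block
)

doc = (
    "{% highlight %}\nvar a = 1;\n{% endhighlight %}\n"
    "text\n"
    "{% highlight %}\nfunction f(){ return 2; }\n{% endhighlight %}\n"
)

bodies = lookaround.findall(doc)
for body in bodies:
    print(repr(body))
```

Note that lookbehind support varies between regex engines, so this form may not work everywhere the capture-group versions do.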
Source: https://stackoverflow.com/questions/33805752/regular-expression-search-avoid-nested-results