I am getting a PREG_JIT_STACKLIMIT_ERROR in the preg_replace_callback() function when working with a somewhat longer string. Above 2000 characters it stops working.
What is PCRE JIT?
Just-in-time compiling is a heavyweight optimization that can greatly speed up pattern matching. However, it comes at the cost of extra processing before the match is performed. Therefore, it is of most benefit when the same pattern is going to be matched many times.
And how does it work, basically?
PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where the local data of the current node is pushed before checking its child nodes... When the compiled JIT code runs, it needs a block of memory to use as a stack. By default, it uses 32K on the machine stack. However, some large or complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT is given when there is not enough stack.
From the first quote you can see that JIT is an optional feature, which is on by default in PHP [v7.*] PCRE. So you can easily turn it off with pcre.jit = 0 (though that's not recommended).
However, when a preg_* function reports error code 6 (PREG_JIT_STACKLIMIT_ERROR), it most likely means that JIT hit the stack size limit.
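As a rough sketch of the above, assuming PHP 7 or later: JIT is switched off at runtime with ini_set() and the error code is checked after the call. The wrapper name checked_replace_callback is hypothetical; preg_last_error(), PREG_JIT_STACKLIMIT_ERROR and the pcre.jit setting are standard PHP. The sketch also assumes ini_set() runs before the pattern is used for the first time.

```php
<?php
// Turn JIT off for this request. Assumption: this runs before the pattern
// is first used, so the pattern is compiled without JIT.
ini_set('pcre.jit', '0');

// Hypothetical wrapper: throws instead of silently returning null on failure.
function checked_replace_callback(string $pattern, callable $callback, string $subject): string
{
    $result = preg_replace_callback($pattern, $callback, $subject);
    if ($result === null) {
        $code = preg_last_error();
        $reason = $code === PREG_JIT_STACKLIMIT_ERROR
            ? 'JIT stack limit reached'
            : "preg error code $code";
        throw new RuntimeException($reason);
    }
    return $result;
}

// Usage: double every number in the subject.
echo checked_replace_callback('/\d+/', function ($m) {
    return (string) ((int) $m[0] * 2);
}, 'a1 b20'), "\n"; // prints "a2 b40"
```

Checking for null matters because preg_replace_callback() returns null on error, which is easy to mistake for an empty result.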
Capturing groups consume more memory than non-capturing groups (and, depending on the type of quantifier applied to the group, even more memory may be needed):
OP_CBRA (pcre_jit_compile.c:#1138) - (real memory is more than this):
case OP_CBRA:
case OP_SCBRA:
bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
break;
OP_BRA (pcre_jit_compile.c:#1134) - (real memory is more than this):
case OP_BRA:
bracketlen = 1 + LINK_SIZE;
break;
Therefore, changing the capturing groups in your own RegEx to non-capturing groups makes it produce the proper output (though I don't know exactly how much memory that saves).
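To illustrate the difference between the two group types, here is a small sketch with a made-up subject and patterns (not the RegEx from the question). Both patterns match the same text, but only the capturing version stores the groups.

```php
<?php
$subject = 'if-cond';

// Capturing groups: each group gets its own slot in the match array.
preg_match('/(\w+)-(\w+)/', $subject, $capturing);

// Non-capturing groups: same match, but only the whole match is stored.
preg_match('/(?:\w+)-(?:\w+)/', $subject, $nonCapturing);

var_dump(count($capturing));    // int(3)
var_dump(count($nonCapturing)); // int(1)
```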
But it seems you really do need the capturing groups. In that case you should rewrite your RegEx for the sake of performance. Backtracking is the main thing to consider when optimizing a RegEx.
Solution:
(?(DEFINE)
(?<recurs>
(?! {@|@} ) [^|] [^{@|\\]* ( \\.[^{@|\\]* )* | (?R)
)
)
{@
(?<If> \w+)-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}
PHP code (watch backslash escaping):
preg_match_all('/(?(DEFINE)
(?<recurs>
(?! {@|@} ) [^|] [^{@|\\\\]* ( \\\\.[^{@|\\\\]* )* | (?R)
)
)
{@
(?<If> \w+ )-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}/x', $string, $matches);
This is your own RegEx, optimized to take as few backtracking steps as possible. So whatever your original pattern was supposed to match is matched by this one too.
RegEx that does not follow nested if blocks:
{@
(?<If> \w+)-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^|\\]* (?: \\.[^|\\]* )* )
(?<False> [|] \X*)?
@}
Most of the quantifiers are written possessively (which avoids backtracking) by appending + to them.
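A small sketch of what the possessive form changes, using made-up patterns and subjects. A possessive quantifier never gives back what it has consumed, so it only suits places where giving back could not lead to a match anyway.

```php
<?php
// Greedy: \w+ first takes "abc1", then gives back "1" so that \d can match.
$greedy = preg_match('/\w+\d/', 'abc1');

// Possessive: \w++ takes "abc1" and never gives anything back, so \d fails.
$possessive = preg_match('/\w++\d/', 'abc1');

// Possessive where nothing ever needs to be given back: [^|] can never
// consume the "|" that follows, so the result is the same as with a greedy
// quantifier, but a failed attempt is abandoned without retrying.
$safe = preg_match('/[^|]*+\|/', 'then-part|else-part');

var_dump($greedy, $possessive, $safe); // int(1) int(0) int(1)
```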
The problem, as you can see, is that your pattern is inefficient. The main reasons are:
- subpatterns of the form (a+)+b, which are the best way to get catastrophic backtracking
- subpatterns of the form (a|b)+, which may be a good design except with a backtracking regex engine like PCRE
As an aside, there are too many useless capture groups that consume memory for nothing. When you don't need a capture group, don't write it. If you really need to group elements, use a non-capturing group, but don't use non-capturing groups to make a pattern "more readable" (there are other ways to do that, like named groups, free-spacing and comments).
If I understand correctly, you are trying to build a regex for preg_replace_callback to deal with the control statements of your template system.
Since these control statements can be nested and a regex engine can't match the same substring several times, you have to choose between two strategies:
- You can write a recursive pattern that describes a conditional statement which may itself contain other conditional statements.
- You can write a pattern that matches only the innermost conditional statements. (In other words, it forbids nested conditional statements.)
In both cases, you need to parse the string several times until there's nothing left to replace. (Note that you can also use a recursive function with the first strategy, but it makes things more complicated.)
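For the first strategy, a minimal sketch of a recursive pattern might look like this. The pattern is hypothetical and much simpler than a full conditional statement: it only recognizes balanced {@ ... @} blocks, using (?R) to recurse into the whole pattern.

```php
<?php
// A block is "{@", then any mix of plain text, a lone "{" or "@", or a
// nested block, then "@}". (?R) recurses into the whole pattern.
$block = '~\{@ (?: [^{@]++ | \{(?!@) | @(?!\}) | (?R) )* @\}~x';

preg_match($block, 'x {@a {@b@} c@} y', $m);
echo $m[0], "\n"; // prints "{@a {@b@} c@}"
```

The outer block is matched as a whole, so the callback of preg_replace_callback would then have to process the nested blocks inside it.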
Let's see the second way:
$pattern = '~
{@ (?<cond> \w+ ) - (?<stat> \w+ (?: % \w+ )* ) (?: : (?<sub> \w+ ) )? \|
# a "THEN" part that doesn\'t have nested conditional statements
(?<then> [^{|@]*+ (?: { (?!@) [^{|@]* | @ (?!}) [^{|@]* )*+ )
# optional "ELSE" part (the content is similar to the "THEN" part)
(?: \| (?<else> \g<then> ) )? (*SKIP) @}~x';
$parsed_view = $string;
$count = 0;
do {
    $parsed_view = preg_replace_callback($pattern, function ($m) {
        // do what you need here. The different captures can be
        // easily accessed with their names: $m['cond'], $m['stat']...
        // as defined in the pattern.
        $result = ''; // placeholder: build the replacement text from the captures
        return $result;
    }, $parsed_view, -1, $count);
} while ($count);
As you can see, the problem of nested statements is solved with the do..while loop and the count parameter of preg_replace_callback, which tells you whether something was replaced.
This code isn't tested, but I'm sure you can complete it and, if necessary, adapt it to your needs.
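For reference, here is a self-contained sketch of the same loop on a deliberately simplified, hypothetical syntax ({@name|then@} or {@name|then|else@}), not the full syntax from the question. The function name render_conditionals and the $vars array are assumptions made for the illustration.

```php
<?php
// The character classes exclude "{", "|" and "@", so only innermost
// blocks can match; outer blocks are resolved on later passes.
function render_conditionals(string $view, array $vars): string
{
    $pattern = '~\{@ (?<cond>\w+) \| (?<then>[^{|@]*+) (?: \| (?<else>[^{|@]*+) )? @\}~x';

    do {
        $view = preg_replace_callback($pattern, function ($m) use ($vars) {
            // An unmatched optional group may be missing from $m.
            return !empty($vars[$m['cond']]) ? $m['then'] : ($m['else'] ?? '');
        }, $view, -1, $count);

        if ($view === null) {
            throw new RuntimeException('preg error code ' . preg_last_error());
        }
    } while ($count > 0);

    return $view;
}

echo render_conditionals(
    'A {@x|yes {@y|inner|no@}|nope@} B',
    ['x' => true, 'y' => false]
), "\n"; // prints "A yes no B"
```

On the first pass only the inner block is replaced (by "no"); the second pass resolves the outer block, and the third pass finds nothing, which ends the loop.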
As an aside, there are a lot of template engines that already exist (and PHP itself is already a template engine). You can use one of them and avoid creating your own syntax. You can also take a look at their code.