I found out that, for example, this line takes a very long time to execute:
System.out.println(
\".. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .
When a pattern x is made optional with the ? or * quantifier (or {0,}), the engine has two paths to choose from, depending on the nature of the quantifier being used:
Consume the character first, then backtrack for the rest of the pattern (greedy, as with .* and .?).
Skip the character first and try the rest of the pattern immediately (lazy, as with .*?).
Someone who is unfamiliar with regular expressions, or who does not care about performance, tends to throw .* in wherever a match is needed somewhere in the string. Engines step back and forth so quickly that nothing seems odd or slow until the pattern cannot be found.
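A minimal Java sketch of the two paths, assuming a made-up input string and a hypothetical helper named firstMatch. The greedy quantifier consumes as much as it can and gives characters back, while the lazy one consumes only when it has to:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .* runs to the end of the input, then backtracks to the last '>'.
        System.out.println(firstMatch("<.*>", input));   // prints <a><b>
        // Lazy: .*? tries '>' first and only consumes a character when that fails.
        System.out.println(firstMatch("<.*?>", input));  // prints <a>
    }
}
```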
Time complexity starts at O(n) and grows to O(n^2b), where b is the nesting level of the quantifiers. So on failure, the number of steps the engine takes is huge.
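One rough way to see the cost of a failed match in Java is to count how many characters the engine reads. The sketch below is an illustration under assumptions: the CountingSequence wrapper, the pattern .*x.*y, and the input length are invented for this example, and character reads are only a proxy for engine steps.

```java
import java.util.regex.Pattern;

public class BacktrackCost {
    /** Wraps a string and counts every charAt call made by the regex engine. */
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads = 0;

        CountingSequence(String text) { this.text = text; }

        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public int length() { return text.length(); }
        @Override public CharSequence subSequence(int start, int end) { return text.substring(start, end); }
        @Override public String toString() { return text; }
    }

    /** Builds a string of count copies of c. */
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) sb.append(c);
        return sb.toString();
    }

    /** Runs a full match attempt and returns how many characters were read. */
    static long reads(Pattern pattern, String input) {
        CountingSequence seq = new CountingSequence(input);
        pattern.matcher(seq).matches();
        return seq.reads;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".*x.*y");
        String xs = repeat('x', 200);
        // Same pattern, almost the same input: only the last character differs.
        System.out.println("reads when it matches: " + reads(pattern, xs + "y"));
        System.out.println("reads when it fails:   " + reads(pattern, xs + "x"));
    }
}
```

The failing input forces the engine to retry the second .* from every position the first one gives back, so its read count should be far larger than for the matching input.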
To avoid such situations, consider some guiding principles:
Specify boundaries. If the pattern should stop somewhere before digits, do not use .* there. Use \D* instead.
Use conditions. You can check whether a pattern or letter x exists before running the whole match by using a lookahead such as ^(?=[^x]*x). This leads to an early failure.
Use possessive quantifiers or atomic groups (if available). Both avoid backtracking, and sometimes you do not need backtracking.
Do not use (.*)+ or similar patterns. Instead, reconsider your requirements, or at least use an atomic group such as (?>.*)+ in its place.
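The first three principles can be sketched in Java as follows. The patterns, inputs, and helper names here are assumptions made up for illustration, not taken from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPrinciples {
    // Boundaries: \D* can never run past the digits, so no backtracking is needed.
    static final Pattern BOUNDED = Pattern.compile("\\D*(\\d+)");
    static final Pattern UNBOUNDED = Pattern.compile(".*(\\d+)");

    // Conditions: the lookahead rejects inputs without an 'x' before the rest runs.
    static final Pattern GUARDED = Pattern.compile("^(?=[^x]*x).*a.*b.*x");

    // Possessive quantifier and atomic group: neither gives characters back.
    static final Pattern GREEDY = Pattern.compile("\\d+5");
    static final Pattern POSSESSIVE = Pattern.compile("\\d++5");
    static final Pattern ATOMIC = Pattern.compile("(?>\\d+)5");

    /** Returns group 1 if the whole input matches, otherwise null. */
    static String firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    /** Returns true if the pattern is found anywhere in the input. */
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        // .* swallows the digits and backtracks until \d+ gets a single one.
        System.out.println(firstGroup(UNBOUNDED, "order 12345")); // prints 5
        System.out.println(firstGroup(BOUNDED, "order 12345"));   // prints 12345

        // No 'x' in the input: the guard fails before .*a.*b.*x is tried.
        System.out.println(found(GUARDED, "aabbaabb"));           // prints false
        System.out.println(found(GUARDED, "aabbx"));              // prints true

        // Greedy \d+ gives back the final 5; the other two refuse to.
        System.out.println(GREEDY.matcher("12345").matches());     // prints true
        System.out.println(POSSESSIVE.matcher("12345").matches()); // prints false
        System.out.println(ATOMIC.matcher("12345").matches());     // prints false
    }
}
```

The last three lines show that possessive quantifiers and atomic groups change what a pattern can match, so they fit only where backtracking into the group is never needed.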
Your own regular expression is no exception. It suffers from too much greediness and too many optional matches, and it needs some time to be reworked.