The PHP manual states the following about PCRE's "S" (extra analysis of pattern) modifier on http://php.net/manual/en/reference.pcre.pattern.modifiers.php
PHP docs quote a small part of the PCRE docs. Here are some more details (emphasis mine) from PCRE 8.36:
If a compiled pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. The function pcre_study() takes a pointer to a compiled pattern as its first argument. If studying the pattern produces additional information that will help speed up matching, pcre_study() returns a pointer to a pcre_extra block, in which the study_data field points to the results of the study.
...
Studying a pattern does two things: first, a lower bound for the length of subject string that is needed to match the pattern is computed. This does not mean that there are any strings of that length that match, but it does guarantee that no shorter strings match. The value is used to avoid wasting time by trying to match strings that are shorter than the lower bound. You can find out the value in a calling program via the pcre_fullinfo() function.
Studying a pattern is also useful for non-anchored patterns that do not have a single fixed starting character. A bitmap of possible starting bytes is created. This speeds up finding a position in the subject at which to start matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256. In 32-bit mode, the bitmap is used for 32-bit values less than 256.)
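To make the two results of studying concrete, here is a minimal sketch of the idea. It is written in Python rather than PCRE's C for brevity, and it is not PCRE's implementation: the "pattern" is assumed to be just a list of non-empty literal alternatives, and the names StudyData, study and search are hypothetical. It only illustrates how a minimum subject length and a 256-bit bitmap of possible starting bytes let a matcher skip work.

```python
from typing import NamedTuple, Optional, Sequence


class StudyData(NamedTuple):
    min_length: int    # no subject shorter than this can match
    start_bits: bytes  # 256-bit bitmap: bit b is set if a match may start with byte b


def study(alternatives: Sequence[bytes]) -> StudyData:
    """Toy 'study' step for a pattern made of non-empty literal alternatives."""
    bits = bytearray(32)  # 32 bytes * 8 bits = one bit per possible byte value
    for alt in alternatives:
        first = alt[0]
        bits[first >> 3] |= 1 << (first & 7)
    return StudyData(min(len(alt) for alt in alternatives), bytes(bits))


def search(alternatives: Sequence[bytes], subject: bytes,
           data: StudyData) -> Optional[int]:
    """Return the offset of the first match, or None if there is none."""
    # Lower bound: a subject that is too short cannot match, so do not scan it.
    if len(subject) < data.min_length:
        return None
    # A match cannot start so late that fewer than min_length bytes remain.
    last_start = len(subject) - data.min_length
    for pos in range(last_start + 1):
        byte = subject[pos]
        # Start bitmap: skip positions whose byte can never begin a match.
        if not data.start_bits[byte >> 3] & (1 << (byte & 7)):
            continue
        if any(subject.startswith(alt, pos) for alt in alternatives):
            return pos
    return None


alts = [b"cat", b"horse"]
data = study(alts)
print(data.min_length)                   # 3
print(search(alts, b"a horse", data))    # 2
print(search(alts, b"ca", data))         # None (shorter than the lower bound)
```

The full comparison against the alternatives only runs at positions that pass the cheap bitmap test, which is where the speed-up for non-anchored patterns comes from.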
Please note that in the newer PCRE version (v10.00, also called PCRE2), the library has undergone a massive refactoring and API redesign. One of the consequences is that studying is always performed in PCRE 10.00 and above. I don't know when PHP will make use of PCRE2, but it will happen sooner or later because PCRE 8.x won't get any new features from now on.
Here's a quote from the PCRE2 release announcement:
Explicit "studying" of compiled patterns has been abolished - it now always happens automatically. JIT compiling is done by calling a new function, pcre2_jit_compile() after a successful return from pcre2_compile().
As for your second question:
If the "S" modifier is used per-thread only, how does it differs from the PCRE cache of compiled regexps?
There's no cache in PCRE itself, but PHP maintains a cache of compiled regexes to avoid recompiling the same pattern over and over again, for instance when you use a preg_ function inside a loop.
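The effect of such a cache can be sketched as follows. This is a conceptual illustration in Python, not PHP's actual code: the RegexCache class and its compiles counter are hypothetical, and the real cache lives inside PHP's preg extension. The point is only that the pattern string acts as the lookup key, so compilation happens once no matter how many times the loop runs.

```python
import re


class RegexCache:
    """Hypothetical stand-in for a per-process cache of compiled patterns."""

    def __init__(self):
        self._compiled = {}  # pattern string -> compiled regex
        self.compiles = 0    # how many times a pattern was actually compiled

    def match(self, pattern, subject):
        regex = self._compiled.get(pattern)
        if regex is None:
            # Cache miss: compile once and remember the result.
            regex = re.compile(pattern)
            self.compiles += 1
            self._compiled[pattern] = regex
        return regex.match(subject)


cache = RegexCache()
for word in ["foo1", "bar", "foo22"]:
    cache.match(r"foo\d+", word)

print(cache.compiles)  # 1: the pattern was compiled only on the first iteration
```

Studying is extra analysis attached to one compiled pattern, while the cache decides whether compilation (and any studying) has to happen again at all, so the two are complementary rather than alternatives.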