I have a very crazy regex that I'm trying to diagnose. It is also very long, but I have cut it down to just the following script. Run using Strawberry Perl v5.26.2.
I have reduced your problem to this:
my $text = 'M Y H A P P Y T E X T';
my $regex = '(?
The cause is the /i (case-insensitive) modifier combined with character sequences such as "ss" or "st", which can also match a single typographic ligature. That makes the pattern variable length: /August/i, for instance, matches both AUGUST (6 characters) and augu\x{FB06} (5 characters, the last one being U+FB06, LATIN SMALL LIGATURE ST).
However, if we remove the /i modifier, it works, because typographic ligatures are then no longer matched.
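As a minimal sketch of that length difference (the test strings and the variable names $plain and $ligature are made up for illustration, they are not from the original script), the following compares the two strings with and without /i:

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical test strings, not taken from the original script.
my $plain    = 'AUGUST';          # 6 characters
my $ligature = "augu\x{FB06}";    # 5 characters, ending in LATIN SMALL LIGATURE ST

for my $s ($plain, $ligature) {
    # Anchored so the whole string has to match "August".
    my $with_i    = ($s =~ /^August$/i) ? 'match' : 'no match';
    my $without_i = ($s =~ /^August$/)  ? 'match' : 'no match';
    printf "length %d: with /i => %s, without /i => %s\n",
        length($s), $with_i, $without_i;
}
```

Both strings should match under /i even though their lengths differ, which is what makes a pattern containing "st" count as variable length.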
Solution: use the /aa modifier together with /i, i.e.:
/(?
Or in your regex:
my $text = 'M Y H A P P Y T E X T';
my $regex = '(?
From perlre:
To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"), specify the "a" twice, for example
/aai or /aia. (The first occurrence of "a" restricts the \d, etc., and the second occurrence adds the "/i" restrictions.) But, note that code points outside the ASCII range will use Unicode rules for /i matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
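A small sketch of what the doubled "a" changes, using the KELVIN SIGN example from the perlre excerpt plus the ligature case from earlier (the variable names here are illustrative only):

```perl
use strict;
use warnings;

my $kelvin   = "\x{212A}";        # KELVIN SIGN
my $ligature = "augu\x{FB06}";    # ends in LATIN SMALL LIGATURE ST

# Under plain /i, ASCII letters in the pattern may match non-ASCII characters.
my $kelvin_i   = ($kelvin   =~ /^k$/i)      ? 1 : 0;
my $ligature_i = ($ligature =~ /^August$/i) ? 1 : 0;

# Under /aai, ASCII and non-ASCII are not allowed to match each other.
my $kelvin_aai   = ($kelvin   =~ /^k$/aai)      ? 1 : 0;
my $ligature_aai = ($ligature =~ /^August$/aai) ? 1 : 0;

print "KELVIN SIGN vs k:    /i = $kelvin_i, /aai = $kelvin_aai\n";
print "ligature vs August:  /i = $ligature_i, /aai = $ligature_aai\n";
```

With /aai an ASCII-only pattern can only match ASCII text, so its length is fixed again. The trade-off is that non-ASCII input which used to match case-insensitively against ASCII letters no longer does.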
See a closely related discussion here