“Variable length lookbehind not implemented” but it isn't variable length

长发绾君心 2020-12-14 13:38

I have a very crazy regex that I'm trying to diagnose. It is also very long, but I have cut it down to just the following script. Run using Strawberry Perl v5.26.2.

4 answers
  •  心在旅途
    2020-12-14 14:22

    I have reduced your problem to this:

    my $text = 'M Y H A P P Y T E X T';
    my $regex = '(?

    The cause is the /i (case-insensitive) modifier combined with certain character sequences, such as "ss" or "st", that can also be matched by a single character such as a typographic ligature. This makes the pattern variable length: /August/i, for instance, matches both AUGUST (6 characters) and auguﬆ (5 characters, the last one being U+FB06).

    However, if we remove the /i (case-insensitive) modifier, it works, because typographic ligatures are then not matched.
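    The length difference can be checked directly. The following is a minimal sketch, not part of the original script: the variable names are made up for illustration, and the ligature is written as an escape so the source file stays ASCII.

```perl
use strict;
use warnings;

# Spellings of "August" that differ in length once a ligature is involved.
my $plain    = "August";          # 6 characters
my $upper    = "AUGUST";          # 6 characters
my $ligature = "augu\x{FB06}";    # 5 characters; U+FB06 is LATIN SMALL LIGATURE ST

for my $s ($plain, $upper, $ligature) {
    # Compare the same anchored pattern with and without /i.
    my $with_i    = ($s =~ /^August$/i) ? "yes" : "no";
    my $without_i = ($s =~ /^August$/)  ? "yes" : "no";
    printf "length %d: with /i %s, without /i %s\n",
        length($s), $with_i, $without_i;
}
```

    With /i, strings of two different lengths satisfy the same pattern, which is why the regex engine cannot treat a lookbehind containing "August" as fixed length.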

    Solution: use the /aa modifier, i.e.:

    /(?

    Or in your regex:

    my $text = 'M Y H A P P Y T E X T';
    my $regex = '(?

    From perlre:

    To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"), specify the "a" twice, for example /aai or /aia. (The first occurrence of "a" restricts the \d, etc., and the second occurrence adds the "/i" restrictions.) But, note that code points outside the ASCII range will use Unicode rules for /i matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
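    As a rough illustration of the perlre passage, the sketch below uses a hypothetical lookbehind pattern (it is not the regex from the question) and compiles it at run time under /aai. It only exercises the /aai form, because the behaviour under plain /i depends on the Perl version: 5.26 rejects the variable-length lookbehind, while Perl 5.30 and later handle it differently.

```perl
use strict;
use warnings;

# Hypothetical pattern for illustration only; not the regex from the question.
my $pattern = '(?<!August )TEXT';

# Under /aa, the ASCII "st" in the pattern can no longer match U+FB06.
my $ligature = "augu\x{FB06}";
printf "ligature matches /^August\$/aai: %s\n",
    ($ligature =~ /^August$/aai) ? "yes" : "no";

# Compile at run time so that a compile error can be caught by eval.
my $re = eval { qr/$pattern/aai };
die "pattern did not compile under /aai: $@" unless defined $re;

for my $text ('July TEXT', 'august text') {
    # The lookbehind is still case-insensitive for ASCII letters.
    printf "%-12s => %s\n", $text, ($text =~ $re) ? "match" : "no match";
}
```

    Because /aa keeps ASCII and non-ASCII characters from matching each other, "August " in the lookbehind always has the same length, so the pattern compiles as a fixed-length lookbehind.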

    See a closely related discussion here
