Why doesn't finite repetition in lookbehind work in some flavors?

臣服心动 2020-12-10 21:11

I want to extract the middle number (the month) from a date in dd/mm/yy format, while also allowing a single digit for the day and the month.

This is what I came up with

4 Answers
  •  情深已故
    2020-12-10 21:48

    In addition to those listed by @polygenelubricants, there are two more exceptions to the "fixed length only" rule. In PCRE (the regex engine for PHP, Apache, et al.) and Oniguruma (Ruby 1.9, TextMate), a lookbehind may consist of an alternation whose alternatives match different numbers of characters, as long as each individual alternative has a fixed length. For example:

    (?<=\b\d\d/|\b\d/)\d{1,2}(?=/\d{2}\b)
    

    Note that the alternation has to be at the top level of the lookbehind subexpression. You might, like me, be tempted to factor out the common elements, like this:

    (?<=\b(?:\d\d|\d)/)\d{1,2}(?=/\d{2}\b)
    

    ...but it wouldn't work; at the top level, the subexpression now consists of a single alternative with a non-fixed length.
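Some flavors reject even the top-level alternation. Python's built-in `re` module, for instance, requires the whole lookbehind to have a single fixed width. The following is a minimal sketch of one workaround, assuming Python 3 and only the standard library: give each length its own lookbehind and alternate between the two assertions. The names `MIDDLE`, `middle_field` and `compiles` are illustrative and do not come from the original thread.

```python
import re

# Each lookbehind is fixed-width on its own (3 and 2 characters),
# so `re` accepts both; the alternation sits outside the assertions.
MIDDLE = re.compile(r"(?:(?<=\b\d\d/)|(?<=\b\d/))\d{1,2}(?=/\d{2}\b)")


def middle_field(date_text):
    """Return the middle number of a d/m/yy-style date, or None."""
    m = MIDDLE.search(date_text)
    return m.group(0) if m else None


def compiles(pattern):
    """Report whether `re` accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

With this sketch, `middle_field("1/12/20")` yields `"12"`, while `compiles(r"(?<=\b\d\d/|\b\d/)\d{1,2}")` is `False` because the two alternatives inside the single lookbehind have different widths.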

    The second exception is much more useful: \K, supported by Perl and PCRE. It effectively means "pretend the match really started here." Whatever appears before it in the regex is treated as a positive lookbehind. As with .NET lookbehinds, there are no restrictions; whatever can appear in a normal regex can be used before the \K.

    \b\d{1,2}/\K\d{1,2}(?=/\d{2}\b)
    

    But most of the time, when someone has a problem with lookbehinds, it turns out they shouldn't even be using them. As @insin pointed out, this problem can be solved much more easily by using a capturing group.
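The capturing-group approach needs no lookaround at all, so it should port to essentially any flavor. A minimal sketch in Python, assuming the dd/mm/yy layout from the question; the names `DATE` and `month_of` are made up for illustration.

```python
import re

# Match the whole date, but capture only the middle field in group 1.
DATE = re.compile(r"\b\d{1,2}/(\d{1,2})/\d{2}\b")


def month_of(date_text):
    """Return the captured middle field of the first date found, or None."""
    m = DATE.search(date_text)
    return m.group(1) if m else None
```

Calling `month_of("due 3/11/20 at noon")` returns `"11"`; the surrounding day and year are matched but simply not captured.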

    EDIT: Almost forgot JGSoft, the regex flavor used by EditPad Pro and PowerGREP; like .NET, it has completely unrestricted lookbehinds, both positive and negative.
