Positive lookbehind vs match reset (\K) regex feature

Submitted by 霸气de小男生 on 2020-01-30 04:27:43

Question


I just learned about the apparently undocumented \K behavior in Ruby regex (thanks to this answer by anubhava). This feature (possibly named Keep?) also exists in the PHP and Perl regex flavors and in Python's third-party regex module (but not in the built-in re module). It is described elsewhere as "drops what was matched so far from the match to be returned."

"abc".match(/ab\Kc/)     # matches "c"

Is this behavior identical to the positive lookbehind marker as used below?

"abc".match(/(?<=ab)c/)  # matches "c"

If not, what differences do the two exhibit?


Answer 1:


It's easier to see the difference between \K and (?<=...) with the String#scan method.

A lookbehind is a zero-width assertion that doesn't consume characters and that is tested (backwards) from the current position:

> "abcdefg".scan(/(?<=.)./)
=> ["b", "c", "d", "e", "f", "g"]

The "keep" feature \K (which isn't an anchor) marks a position in the pattern at which everything matched so far by the part of the pattern to its left is dropped from the match result. The characters matched before the \K are still consumed, however; they just don't appear in the result:

> "abcdefg".scan(/.\K./)
=> ["b", "d", "f"]

The behaviour is the same as without \K:

> "abcdefg".scan(/../)
=> ["ab", "cd", "ef"]

except that the characters before the \K are removed from the result.
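
To make the "consumed vs. only inspected" distinction concrete, here is a small sketch that runs both patterns over the same string and also checks where the reported match begins. The variable names are just illustrative.

```ruby
text = "abcdefg"

# Lookbehind: the preceding character is only inspected, never consumed,
# so every character after the first can be matched on its own.
lookbehind = text.scan(/(?<=.)./)

# \K: the preceding character is consumed by each match, so the next
# attempt resumes after the two-character span, not one character later.
keep = text.scan(/.\K./)

# The reported match begins where \K was reached, not where the attempt started.
m = text.match(/.\K./)

p lookbehind   # ["b", "c", "d", "e", "f", "g"]
p keep         # ["b", "d", "f"]
p m[0]         # "b"
p m.begin(0)   # 1
```

The \K version returns half as many matches because each match uses up two characters of the input, even though only one of them is reported.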

One interesting use of \K is to emulate a variable-length lookbehind, which is not allowed in Ruby (the same goes for PHP and Perl), or to avoid creating a capture group. For example, the effect intended by (?<=a.*)f. can be obtained with \K:

> "abcdefg".match(/a.*\Kf./)
=> #<MatchData "fg">

An alternative way would be to write /a.*(f.)/, but the \K avoids the need to create a capture group.
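
As a rough side-by-side of the two approaches, the sketch below extracts and replaces the same "fg" span once with \K and once with a capture group. The replacement text "XY" and the variable names are made up for illustration.

```ruby
text = "abcdefg"

# Extraction: \K leaves only "fg" in the overall match ...
kept = text[/a.*\Kf./]

# ... whereas the capture-group version has to ask for group 1 explicitly.
captured = text[/a.*(f.)/, 1]

# Replacement: with \K only the part after \K is replaced.
replaced_keep = text.sub(/a.*\Kf./, "XY")

# Without \K the prefix must be captured and written back by hand.
replaced_group = text.sub(/(a.*)f./) { "#{$1}XY" }

p kept            # "fg"
p captured        # "fg"
p replaced_keep   # "abcdeXY"
p replaced_group  # "abcdeXY"
```

The \K form tends to pay off most in substitutions, where it removes the need to re-insert the captured prefix.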

Note that \K is also available in Python's third-party regex module, even though that module allows variable-length lookbehinds.



Source: https://stackoverflow.com/questions/35092563/positive-lookbehind-vs-match-reset-k-regex-feature
