R: workaround for variable-width lookbehind

Submitted by 送分小仙女 on 2019-12-14 02:16:41

Question


Given this vector:

ba <- c('baa','aba','abba','abbba','aaba','aabba')

I want to change the final a of each word to i, except in baa and aba.

I wrote the following line ...

gsub('(?<=a[ab]b{1,2})a','i',ba,perl=T)

but got this error: PCRE pattern compilation error 'lookbehind assertion is not fixed length' at ')a'.

I looked around a little, and apparently PCRE (which R uses when perl = TRUE) allows variable-width lookahead but not variable-width lookbehind. Is there any workaround for this problem? Thanks!


Answer 1:


You can use the lookbehind alternative \K instead. This escape sequence resets the starting point of the reported match and any previously consumed characters are no longer included.

(Quoted from rexegg.)

The key difference between \K and a lookbehind is that in PCRE, a lookbehind does not allow variable-length quantifiers: the length of what you look for must be fixed. On the other hand, \K can be dropped anywhere in a pattern, so you are free to use any quantifiers you like before \K.

Using it in context:

sub('a[ab]b{1,2}\\Ka', 'i', ba, perl = TRUE)
# [1] "baa"   "aba"   "abbi"  "abbbi" "aabi"  "aabbi"
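Because nothing before \K is restricted, the bounded b{1,2} can be swapped for an unbounded b+, which a PCRE lookbehind would reject. A small sketch comparing the two (the extra input 'abbbba' is an illustrative addition, not part of the question's data):

```r
ba <- c('baa', 'aba', 'abba', 'abbba', 'aaba', 'aabba', 'abbbba')

# Bounded quantifier: only one or two b's may sit between "a[ab]" and the final a
bounded <- sub('a[ab]b{1,2}\\Ka', 'i', ba, perl = TRUE)

# Unbounded quantifier: legal before \K, though not inside a PCRE lookbehind
unbounded <- sub('a[ab]b+\\Ka', 'i', ba, perl = TRUE)

print(bounded)
print(unbounded)
```

The two only disagree on 'abbbba', which the bounded pattern leaves unchanged; which quantifier is right depends on whether longer runs of b should also qualify.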

Avoiding lookarounds:

sub('(a[ab]b{1,2})a', '\\1i', ba)
# [1] "baa"   "aba"   "abbi"  "abbbi" "aabi"  "aabbi"
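In the capture-group version, the parentheses capture everything before the final a, and \\1 in the replacement writes that text back unchanged, so only the a itself becomes i. A quick sketch for inspecting what group 1 holds for each word, using base R's regexec() with regmatches():

```r
ba <- c('baa', 'aba', 'abba', 'abbba', 'aaba', 'aabba')

# Each list element holds the full match followed by capture group 1;
# words with no match give character(0)
m <- regmatches(ba, regexec('(a[ab]b{1,2})a', ba))

# Group 1 is the text that '\\1' re-inserts in the replacement (NA when no match)
groups <- vapply(m, function(x) if (length(x) > 1) x[2] else NA_character_,
                 character(1))
print(groups)
```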



Answer 2:


Another solution, which works for the current case only (where the only quantifier inside the lookbehind is a limiting quantifier such as {1,2}), is to use stringr::str_replace_all or stringr::str_replace:

> library(stringr)
> str_replace_all(ba, '(?<=a[ab]b{1,2})a', 'i')
[1] "baa"   "aba"   "abbi"  "abbbi" "aabi"  "aabbi"

It works because stringr's regex functions are based on ICU regex, which supports constrained-width lookbehind:

The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)

So you cannot use arbitrary patterns inside ICU lookbehinds, but it is good to know that you can at least use a limiting quantifier there when you need to match overlapping text within a known distance range.
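To see the ICU restriction in practice, here is a sketch assuming the stringr package is installed: the bounded b{1,2} lookbehind compiles, while the unbounded b+ version is rejected by ICU and raises an error. The exact error text depends on the stringi/ICU version, so the sketch only checks that an error occurred.

```r
library(stringr)

ba <- c('baa', 'aba', 'abba', 'abbba', 'aaba', 'aabba')

# Bounded lookbehind: accepted by ICU
ok <- str_replace_all(ba, '(?<=a[ab]b{1,2})a', 'i')
print(ok)

# Unbounded lookbehind: ICU refuses to compile it; keep the condition object
bad <- tryCatch(
  str_replace_all(ba, '(?<=a[ab]b+)a', 'i'),
  error = function(e) e
)
print(inherits(bad, "error"))
```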



Source: https://stackoverflow.com/questions/29308348/r-workaround-for-variable-width-lookbehind
