strsplit inconsistent with gregexpr

Submitted by 允我心安 on 2019-12-04 00:15:11
Casimir et Hippolyte

@Aprillion's theory is correct. From the R documentation for strsplit:

The algorithm applied to each input string is

repeat {
    if the string is empty
        break.
    if there is a match
        add the string to the left of the match to the output.
        remove the match and all to the left of it.
    else
        add the string to the output.
        break.
}
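The documented loop can be transcribed almost line for line into R. This is only a sketch to make the behavior concrete: split_sketch is a hypothetical helper, not R's actual C implementation, and it assumes a PCRE pattern applied with regexpr. It deliberately refuses zero-length matches, because removing an empty match with nothing to its left would leave the string unchanged and the loop would never finish.

```r
# Hypothetical helper: a literal transcription of the documented loop.
split_sketch <- function(s, pattern) {
  out <- character(0)
  repeat {
    # "if the string is empty, break."
    if (nchar(s) == 0) break
    # Search the REMAINING string only, so ^ anchors to its start.
    m <- regexpr(pattern, s, perl = TRUE)
    start <- as.integer(m)
    len <- attr(m, "match.length")
    if (start == -1L) {
      # No match: add what is left to the output and stop.
      out <- c(out, s)
      break
    }
    # Assumption of this sketch: zero-length matches are not supported.
    if (len == 0L) stop("zero-length match: not handled by this sketch")
    # Add the text to the left of the match to the output.
    out <- c(out, substr(s, 1, start - 1))
    # Remove the match and everything to the left of it.
    s <- substr(s, start + len, nchar(s))
  }
  out
}

split_sketch("a,b,c", ",")
```

Because the pattern is applied to a shorter string on every pass, an anchored pattern such as "^." can match once per iteration rather than once overall.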

In other words, at each iteration ^ matches the beginning of the remaining string (the part left over once the previous matches, and everything before them, have been removed).

A simple illustration of this behavior:

> x <- "12345"
> strsplit( x , "^." , perl = TRUE )
[[1]]
[1] "" "" "" "" ""
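For contrast, gregexpr scans the original string as a whole, so the same anchored pattern can match only at position 1. The snippet below is a small sketch of that difference; the variable names g, positions and matched are chosen here for illustration.

```r
x <- "12345"

# gregexpr keeps the original string intact while searching,
# so ^ can only anchor at the very first character.
g <- gregexpr("^.", x, perl = TRUE)
positions <- as.integer(g[[1]])   # start position of each match
matched <- regmatches(x, g)[[1]]  # the matched substrings

positions  # 1
matched    # "1"
```

A single match from gregexpr against five empty pieces from strsplit is the inconsistency the title refers to.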

Here, you can see the consequence of this behavior when a lookahead assertion is used as the delimiter (thanks to @JoshO'Brien for the link).
