Applying a function to a backreference within gsub in R

被撕碎了的回忆 2020-12-03 23:19

I'm new to R and am stuck with backreferencing that doesn't seem to work. In:

gsub("\\((\\d+)\\)", f("\\1"), string)

It corre

4 answers
  •  渐次进展
    2020-12-03 23:56

    To use a callback within a regex-capable replacement function, you may use either gsubfn or stringr functions.
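
    As a minimal sketch of why the base gsub call in the question cannot do this on its own: the replacement argument is an ordinary R expression, so f("\\1") is called on the literal string \1, and only its return value is handed to gsub. The example below uses toupper as a stand-in for f and "banana" as sample input; both are assumptions for illustration and do not come from the question.

    ```r
    # toupper() receives the literal backreference string, not the matched text.
    replacement <- toupper("\\1")
    print(replacement == "\\1")   # TRUE: "\1" contains no letters to uppercase

    # gsub() therefore performs a plain backreference substitution,
    # putting each captured "a" back unchanged.
    result <- gsub("(a)", replacement, "banana")
    print(result)                 # "banana"
    ```

    The callbacks accepted by gsubfn and stringr, by contrast, are called with each match at replacement time.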

    When choosing between them, note that stringr is based on the ICU regex engine, whereas gsubfn uses the Tcl engine by default if the R installation has tcltk capability (otherwise it falls back to TRE, the default engine in R), or PCRE if you pass the perl=TRUE argument.

    Also, note that gsubfn gives the callback access to every capturing group in the match, while str_replace_all passes only the whole match. Thus, for str_replace_all, the regex should look like (?<=\()\d+(?=\)), which matches one or more digits only when they are enclosed in ( and ), while the lookarounds keep the parentheses out of the match.

    With stringr, you may use str_replace_all:

    library(stringr)  
    string <- "(990283)M (31)O (29)M (6360)M"
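    ## Callback function to increment the found number and return it as a string
    ## (1L keeps the result an integer, so large values are not printed as 1e+05):
    f <- function(x) { as.character(as.integer(x) + 1L) }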
    str_replace_all(string, "(?<=\\()\\d+(?=\\))", function(m) f(m))
    ## => [1] "(990284)M (32)O (30)M (6361)M"
    

    With gsubfn, pass perl=TRUE and backref=0 to be able to use lookarounds and just modify the whole match:

    gsubfn("(?<=\\()\\d+(?=\\))", ~ f(m), string, perl=TRUE, backref=0)
    ## => [1] "(990284)M (32)O (30)M (6361)M"
    

    If you have multiple groups in the pattern, remove backref=0 and list one argument per capturing group in the callback function declaration:

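    gsubfn("(\\()(\\d+)(\\))", function(m,n,o) paste0(m,f(n),o), string, perl=TRUE)
    ##      ^ 1 ^^  2 ^^ 3 ^           ^^^^^^^          ^^^^
    ## Groups 1, 2 and 3 are passed to the callback as m, n and o.
    ## => [1] "(990284)M (32)O (30)M (6361)M"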
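
    If installing an extra package is not an option, the same whole-match callback can be sketched in base R with gregexpr and the regmatches<- replacement function. This is not part of the answer above, just a dependency-free alternative under the same assumptions (the sample string, and a callback f that returns character values); the names m and result are introduced here for illustration.

    ```r
    string <- "(990283)M (31)O (29)M (6360)M"

    # Callback: increment the number and return it as a string.
    f <- function(x) as.character(as.integer(x) + 1L)

    # Locate every run of digits enclosed in parentheses.
    # Lookarounds require the PCRE engine, hence perl = TRUE.
    m <- gregexpr("(?<=\\()\\d+(?=\\))", string, perl = TRUE)

    # Extract the matches, apply the callback, and write the results back.
    result <- string
    regmatches(result, m) <- lapply(regmatches(result, m), f)
    print(result)
    ## => [1] "(990284)M (32)O (30)M (6361)M"
    ```

    As with str_replace_all, the callback here only sees the whole match, so anything that must stay unchanged has to be kept out of the match with lookarounds.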
    
