Applying a function to a backreference within gsub in R

被撕碎了的回忆 2020-12-03 23:19

I'm new to R and am stuck with backreferencing that doesn't seem to work. In:

gsub("\\((\\d+)\\)", f("\\1"), string)

It corre

4 answers
  • 2020-12-03 23:53

    Base R's gsub cannot apply a function directly to a match. You have to extract the match, transform the value, then put the transformed value back. This is relatively easy with the regmatches function. For example:

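    x <- "(990283)M (31)O (29)M (6360)M"
    
    # f receives all matches of one string as a character vector, e.g. "(31)"
    f <- function(x) {
        # drop the surrounding parentheses and convert to a number
        v <- as.numeric(substr(x, 2, nchar(x) - 1))
        paste0(v + 5, ".1")
    }
    
    # locate every "(digits)" match, then assign the transformed values back
    m <- gregexpr("\\(\\d+\\)", x)
    regmatches(x, m) <- lapply(regmatches(x, m), f)
    x
    # [1] "990288.1M 36.1O 34.1M 6365.1M"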
    

    You can make f do whatever you like; just make sure it's vector-friendly, because it receives all matches of a string at once as a character vector. You could also wrap this in your own function:

    gsubf <- function(pattern, x, f) {
        m <- gregexpr(pattern, x)
        regmatches(x, m) <- lapply(regmatches(x, m), f)
        x   
    }
    gsubf("\\(\\d+\\)", x, f)
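
    To illustrate the "vector-friendly" requirement, here is a minimal sketch (not from the original answer) that wraps a scalar-only function with vapply so it accepts all matches of a string at once, including the case of no matches. The names label_one and label_all and the sample input are made up for illustration.

```r
x <- c("(3)A (12)B", "no match", "(7)C")

# Scalar-only: if() needs a single TRUE/FALSE, so this cannot take a vector.
label_one <- function(s) {
  n <- as.numeric(substr(s, 2, nchar(s) - 1))
  if (n > 5) "big" else "small"
}

# Vector-friendly wrapper: works for any number of matches, including zero.
label_all <- function(v) vapply(v, label_one, character(1), USE.NAMES = FALSE)

m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), label_all)
x
# "smallA bigB" "no match" "bigC"
```

    Passing label_one directly would fail on the first string, because it would be called once with both matches.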
    

    Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.
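
    One possible way to hand the capture groups to the callback in base R is sketched below. It is only an illustration, not the answerer's method: gsub_groups is a hypothetical helper that re-parses each matched substring with regexec, so it assumes the pattern still matches the substring on its own (patterns that rely on lookarounds outside the match would not).

```r
# Hypothetical helper: call f with the capture groups of every match.
gsub_groups <- function(pattern, x, f) {
  m <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    if (length(hits) == 0) return(character(0))
    # For each matched substring, element 1 is the whole match and the
    # remaining elements are the capture groups.
    parts <- regmatches(hits, regexec(pattern, hits, perl = TRUE))
    vapply(parts,
           function(p) as.character(do.call(f, as.list(unname(p[-1])))),
           character(1))
  })
  x
}

x <- "(990283)M (31)O (29)M (6360)M"

# One group: the callback sees only the digits.
one_group <- gsub_groups("\\((\\d+)\\)", x, function(n) as.numeric(n) + 5)
one_group
# "990288M 36O 34M 6365M"

# Three groups: the parentheses are passed along and put back.
three_groups <- gsub_groups("(\\()(\\d+)(\\))", x,
                            function(open, n, close) paste0(open, as.integer(n) + 1L, close))
three_groups
# "(990284)M (32)O (30)M (6361)M"
```

    The callback here is called once per match and must return a single value, unlike f above, which receives all matches of a string at once.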

  • 2020-12-03 23:56

    To use a callback within a regex-capable replacement function, you may use either gsubfn or stringr functions.

    When choosing between them, note that stringr is based on the ICU regex engine, whereas gsubfn uses Tcl regular expressions by default (if the R installation has tcltk capability; otherwise it falls back to TRE), or PCRE if you pass the perl=TRUE argument.

    Also note that gsubfn gives the callback access to every capturing group in the match, while str_replace_all passes only the whole match. Thus, for str_replace_all, the regex should look like (?<=\()\d+(?=\)), which matches 1+ digits only when they are enclosed in ( and ), with the parentheses themselves excluded from the match.
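
    To see what such a lookaround pattern actually matches, here is a small base R check (a sketch using gregexpr with perl = TRUE, not part of the stringr or gsubfn code in this answer; the variable names are illustrative). The parentheses are required around the digits but are not part of the match, so they survive the replacement.

```r
string <- "(990283)M (31)O (29)M (6360)M"

# Lookbehind (?<=\() and lookahead (?=\)) assert the parentheses without
# consuming them, so only the digits are matched.
m <- gregexpr("(?<=\\()\\d+(?=\\))", string, perl = TRUE)
digits_only <- regmatches(string, m)[[1]]
digits_only
# "990283" "31" "29" "6360"

# Replacing the matches leaves the parentheses in place.
result <- string
regmatches(result, m) <- list(as.character(as.integer(digits_only) + 1L))
result
# "(990284)M (32)O (30)M (6361)M"
```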

    With stringr, you may use str_replace_all:

    library(stringr)
    string <- "(990283)M (31)O (29)M (6360)M"
    ## Callback function to increment found number:
    f <- function(x) { as.integer(x) + 1 }
    ## as.character() is a precaution: the replacement function is expected to return a character vector
    str_replace_all(string, "(?<=\\()\\d+(?=\\))", function(m) as.character(f(m)))
    ## => [1] "(990284)M (32)O (30)M (6361)M"
    

    With gsubfn, pass perl=TRUE and backref=0 to be able to use lookarounds and just modify the whole match:

    gsubfn("(?<=\\()\\d+(?=\\))", ~ f(m), string, perl=TRUE, backref=0)
    ## => [1] "(990284)M (32)O (30)M (6361)M"
    

    If you have multiple groups in the pattern, remove backref=0 and list one argument per group in the callback function declaration:

    ## Groups 1, 2 and 3 (the "(", the digits, the ")") arrive as arguments m, n and o:
    gsubfn("(\\()(\\d+)(\\))", function(m,n,o) paste0(m,f(n),o), string, perl=TRUE)
    
  • 2020-12-04 00:04

    Here's a way to do it with a small wrapper around stringr::str_replace(). Pass a lambda formula as the replacement argument and reference the captured group not by "\\1" but by ..1, so your gsub("\\((\\d+)\\)", f("\\1"), string) becomes str_replace2(string, "\\((\\d+)\\)", ~f(..1)), or just str_replace2(string, "\\((\\d+)\\)", f) in this simple case:

    str_replace2 <- function(string, pattern, replacement, type.convert = TRUE){
      if(inherits(replacement, "formula"))
        replacement <- rlang::as_function(replacement)
      if(is.function(replacement)){
        grps_mat <- stringr::str_match(string, pattern)[,-1, drop = FALSE]
        grps_list <- lapply(seq_len(ncol(grps_mat)), function(i) grps_mat[,i])
        if(type.convert) {
          grps_list <- type.convert(grps_list, as.is = TRUE) 
          replacement <- rlang::exec(replacement, !!! grps_list)
          replacement <- as.character(replacement)
        } else {
          replacement <- rlang::exec(replacement, !!! grps_list)
        }
      }
      stringr::str_replace(string, pattern, replacement)
    }
    
    str_replace2(
      "foo (4)",
      "\\((\\d+)\\)", 
      sqrt)
    #> [1] "foo 2"
    
    str_replace2(
      "foo (4) (5)",
      "\\((\\d+)\\) \\((\\d+)\\)", 
      ~ sprintf("(%s)", ..1 * ..2))
    #> [1] "foo (20)"
    

    Created on 2020-01-24 by the reprex package (v0.3.0)

  • 2020-12-04 00:10

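    This handles multiple matches, each with a different replacement: split the text around the numbers, transform the numbers, then interleave the pieces again.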

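    text <- "foo(200) (300)bar (400)foo (500)bar (600)foo (700)bar"
    
    # add 5 to every number extracted from the (single) input string
    f <- function(x) {
      as.numeric(x[[1]]) + 5
    }
    # the pieces of text around the numbers; \K keeps "(" out of the match
    a <- strsplit(text, "\\(\\K\\d+", perl = TRUE)[[1]]
    
    # the numbers themselves, transformed; \K is a PCRE feature,
    # so the extraction uses base R with perl = TRUE
    b <- f(regmatches(text, gregexpr("\\(\\K\\d+", text, perl = TRUE)))
    
    # interleave pieces and new values, then append the trailing piece
    paste0(paste0(a[-length(a)], b, collapse = ""), a[length(a)])  # final output
    #[1] "foo(205) (305)bar (405)foo (505)bar (605)foo (705)bar"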
    