I\'m new to R and am stuck with backreferencing that doesn\'t seem to work. In:
gsub(\"\\\\((\\\\d+)\\\\)\", f(\"\\\\1\"), string)
It corre
R does not have the option of applying a function directly to a match via gsub
. You'll actually have to extract the match, transform the value, then replace the value. This is relativaly easy with the regmatches
function. For example
x<-"(990283)M (31)O (29)M (6360)M"
f<-function(x) {
v<-as.numeric(substr(x,2,nchar(x)-1))
paste0(v+5,".1")
}
m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
# [1] "990288.1M 36.1O 34.1M 6365.1M"
Of course you can make f
do whatever you like just make sure it's vector-friendly. Of course, you could wrap this in your own function
gsubf <- function(pattern, x, f) {
m <- gregexpr(pattern, x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
}
gsubf("\\(\\d+\\)", x, f)
Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.
To use a callback within a regex-capable replacement function, you may use either gsubfn
or stringr
functions.
When choosing between them, note that stringr
is based on ICU regex engine and with gsubfn
, you may use either the default TCL (if the R installation has tcltk
capability, else it is the default TRE) or PCRE (if you pass the perl=TRUE
argument).
Also, note that gsubfn
allows access to all capturing groups in the match object, while str_replace_all
will only allow to manipulate the whole match only. Thus, for str_replace_all
, the regex should look like (?<=\()\d+(?=\))
, where 1+ digits are matched only when they are enclosed with (
and )
excluding them from the match.
With stringr
, you may use str_replace_all
:
library(stringr)
string <- "(990283)M (31)O (29)M (6360)M"
## Callback function to increment found number:
f <- function(x) { as.integer(x) + 1 }
str_replace_all(string, "(?<=\\()\\d+(?=\\))", function(m) f(m))
## => [1] "(990284)M (32)O (30)M (6361)M"
With gsubfn
, pass perl=TRUE
and backref=0
to be able to use lookarounds and just modify the whole match:
gsubfn("(?<=\\()\\d+(?=\\))", ~ f(m), string, perl=TRUE, backref=0)
## => [1] "(990284)M (32)O (30)M (6361)M"
If you have multiple groups in the pattern, remoe backref=0
and enumerate the group value arguments in the callback function declaration:
gsubfn("(\\()(\\d+)(\\))", function(m,n,o) paste0(m,f(n),o), string, perl=TRUE)
^ 1 ^^ 2 ^^ 3 ^ ^^^^^^^ ^^^^
Here's a way by tweaking a bit stringr::str_replace()
, in the replace argument, just use a lambda formula as the replace argument, and reference the captured group not by ""\\1"
but by ..1
, so your gsub("\\((\\d+)\\)", f("\\1"), string)
will become str_replace2(string, "\\((\\d+)\\)", ~f(..1))
, or just str_replace2(string, "\\((\\d+)\\)", f)
in this simple case :
str_replace2 <- function(string, pattern, replacement, type.convert = TRUE){
if(inherits(replacement, "formula"))
replacement <- rlang::as_function(replacement)
if(is.function(replacement)){
grps_mat <- stringr::str_match(string, pattern)[,-1, drop = FALSE]
grps_list <- lapply(seq_len(ncol(grps_mat)), function(i) grps_mat[,i])
if(type.convert) {
grps_list <- type.convert(grps_list, as.is = TRUE)
replacement <- rlang::exec(replacement, !!! grps_list)
replacement <- as.character(replacement)
} else {
replacement <- rlang::exec(replacement, !!! grps_list)
}
}
stringr::str_replace(string, pattern, replacement)
}
str_replace2(
"foo (4)",
"\\((\\d+)\\)",
sqrt)
#> [1] "foo 2"
str_replace2(
"foo (4) (5)",
"\\((\\d+)\\) \\((\\d+)\\)",
~ sprintf("(%s)", ..1 * ..2))
#> [1] "foo (20)"
Created on 2020-01-24 by the reprex package (v0.3.0)
This is for multiple different replacements.
text="foo(200) (300)bar (400)foo (500)bar (600)foo (700)bar"
f=function(x)
{
return(as.numeric(x[[1]])+5)
}
a=strsplit(text,"\\(\\K\\d+",perl=T)[[1]]
b=f(str_extract_all(text,perl("\\(\\K\\d+")))
paste0(paste0(a[-length(a)],b,collapse=""),a[length(a)]) #final output
#[1] "foo(205) (305)bar (405)foo (505)bar (605)foo (705)bar"