replace range of numbers with single numbers in a character string

Submitted by 只愿长相守 on 2020-05-29 05:52:18

Question


Is there any way to replace a range of numbers with single numbers in a character string? The range can be any n-m, most probably somewhere around 1-15; 4-10 is also possible.

The range could be indicated a) with a hyphen (-):

a <- "I would like to buy 1-3 cats"

or b) with a word, for example to, bis, or jusqu'à:

b <- "I would like to buy 1 jusqu'à 3 cats"

The result should look like:

"I would like to buy 1,2,3 cats"

I found this: Replace range of numbers with certain number, but could not really apply it in R.


Answer 1:


gsubfn() in the gsubfn package is like gsub(), but instead of replacing each match with a replacement string, it lets the user specify a function (optionally in formula notation, as done here). It passes the substrings matched by the capture groups of the regular expression, i.e. by its parenthesized parts, to that function as separate arguments and replaces the entire match with the function's output. Here we match "(\\d+)(-| to | bis | jusqu'à )(\\d+)", which has three capture groups, so the function receives three arguments. In the function we call seq with the first and third of these. Note that seq accepts character arguments and interprets them as numbers, so we did not have to convert the arguments to numeric.
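To see which arguments the formula receives, the same capture groups can be extracted in base R with regexec() and regmatches(). This is a sketch that assumes gsubfn hands the groups to the function in this same order, as ..1, ..2 and ..3. The input string using "bis" is a made-up example, and "\u00e0" is simply an escaped "à".

```r
s <- "I would like to buy 4 bis 10 cats"
# same pattern as in the answer, with "à" written as the escape "\u00e0"
pattern <- "(\\d+)(-| to | bis | jusqu'\u00e0 )(\\d+)"
# regexec() records the whole match followed by one entry per capture group
groups <- regmatches(s, regexec(pattern, s))[[1]]
# groups is c("4 bis 10", "4", " bis ", "10"): the whole match, then ..1, ..2, ..3
# seq() converts the character strings "4" and "10" to numbers by itself
expanded <- paste(seq(groups[2], groups[4]), collapse = ",")
expanded
# [1] "4,5,6,7,8,9,10"
```

The second group (the separator) is passed too. That is why the formula skips ..2 and only uses ..1 and ..3.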

Thus we get this one-liner:

library(gsubfn)
s <- c(a, b) # test input strings

gsubfn("(\\d+)(-| to | bis | jusqu'à )(\\d+)", ~ paste(seq(..1, ..3), collapse = ","), s)

giving:

[1] "I would like to buy 1,2,3 cats" "I would like to buy 1,2,3 cats"



Answer 2:


Not the most efficient approach, but it only needs base R:

s <- c("I would like to buy 1-3 cats",
       "I would like to buy 1 jusqu'à 3 cats",
       "foo 22-33",
       "quux 11-3 bar")

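gre <- gregexpr("([0-9]+(-| to | bis | jusqu'à )[0-9]+)", s)
# for each string, split every matched range into its two numbers and expand it
# separately (strings without a range give character(0) and are left unchanged)
regmatches(s, gre) <- lapply(regmatches(s, gre), function(ranges)
  vapply(regmatches(ranges, gregexpr("[0-9]+", ranges)),
         function(a) paste(do.call(seq, as.list(as.integer(a))), collapse = ","),
         character(1)))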
s
# [1] "I would like to buy 1,2,3 cats"          "I would like to buy 1,2,3 cats"         
# [3] "foo 22,23,24,25,26,27,28,29,30,31,32,33" "quux 11,10,9,8,7,6,5,4,3 bar"           



Answer 3:


This is, in fact, a little tricky, unless someone has already written a package that does this (that I'm not aware of).

a <- "I would like to buy 1-3 cats"
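# extract the whole numbers (not single characters), so multi-digit ranges such as 11-15 also work
nums <- regmatches(a, gregexpr("\\d+", a))[[1]]
replacement <- paste(seq.int(as.integer(nums[1]), as.integer(nums[2])), collapse = ",")
# replace the first "number, non-digits, number" run with the expanded sequence
sub("\\d+\\D+\\d+", replacement, a)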
# [1] "I would like to buy 1,2,3 cats"

EDIT: To show that the same solution works for any run of non-digit characters between the two numbers:

b <- "I would like to buy 1 jusqu'à 3 cats"
nums_b <- regmatches(b, gregexpr("\\d+", b))[[1]]
replacement <- paste(seq.int(as.integer(nums_b[1]), as.integer(nums_b[2])), collapse = ",")
sub("\\d+\\D+\\d+", replacement, b)
# [1] "I would like to buy 1,2,3 cats"

You can add further requirements on the run of non-digit characters if you like. If you need help with that, just say which words or symbols may appear between the numbers!
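One possible restriction replaces \\D+ with an alternation of the allowed separators. Unrelated numbers, as in "I have 2 dogs and want 3 cats", are then left alone. This is a sketch under assumptions, not part of the answer. The names seps, range_pattern and expand_first_range() are hypothetical, and the separator list reuses the words from the question. Like the answer, it expands only the first range in a string.

```r
# hypothetical list of allowed separators; "\u00e0" is the "à" in "jusqu'à"
seps <- c("-", " to ", " bis ", " jusqu'\u00e0 ")
range_pattern <- paste0("\\d+(", paste(seps, collapse = "|"), ")\\d+")

# hypothetical helper: expand the first allowed range in a single string
expand_first_range <- function(x) {
  hit <- regmatches(x, regexpr(range_pattern, x))
  if (length(hit) == 0) return(x)  # no allowed separator between two numbers
  nums <- as.integer(regmatches(hit, gregexpr("\\d+", hit))[[1]])
  replacement <- paste(seq.int(nums[1], nums[2]), collapse = ",")
  sub(range_pattern, replacement, x)
}

expand_first_range("I would like to buy 1 to 3 cats")
# [1] "I would like to buy 1,2,3 cats"
expand_first_range("I have 2 dogs and want 3 cats")
# [1] "I have 2 dogs and want 3 cats"
```

Building the pattern from a vector keeps the list of separators in one place. A new word such as " bis zu " can then be added without touching the regular expression by hand.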



Source: https://stackoverflow.com/questions/49344140/replace-range-of-numbers-with-single-numbers-in-a-character-string
