Hello, how can I extract a number that sits between two dashes in a string?
Here is an example dataset:
text.var <- c("abd-GEN-eft-na-M-D-BINED-10-XX1","abd-GEN-eft-na-M-D-BINED-2-XX2","abd-GEN-eft-na-M-D-BINED-3-XX1")
id <- c(1,2,3)
data <- data.frame("id"=id,"text"=text.var)
> data
  id                            text
1  1 abd-GEN-eft-na-M-D-BINED-10-XX1
2  2  abd-GEN-eft-na-M-D-BINED-2-XX2
3  3  abd-GEN-eft-na-M-D-BINED-3-XX1
I would like to extract the number between "-"s. My desired outcome would be:
> data
  id                            text number
1  1 abd-GEN-eft-na-M-D-BINED-10-XX1     10
2  2  abd-GEN-eft-na-M-D-BINED-2-XX2      2
3  3  abd-GEN-eft-na-M-D-BINED-3-XX1      3
Can anyone give me a hint?
Thanks.
You can use the str_extract function from the "stringr" package:
library(stringr)
str_extract(text.var, "(?<=-)[0-9]+(?=-)")
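The (?<=-) and (?=-) parts are a lookbehind and a lookahead: they require a dash immediately before and after the digits without including the dashes in the match. Note that str_extract returns character strings, so wrap the call in as.numeric if you need numbers.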
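If you would rather not load stringr, the same lookaround pattern should also work in base R. This is only a sketch: it assumes perl = TRUE so that the lookbehind and lookahead are understood, and it pre-fills the result with NA because regmatches on its own drops strings that have no match.

```r
# Same example strings as in the question.
text.var <- c("abd-GEN-eft-na-M-D-BINED-10-XX1",
              "abd-GEN-eft-na-M-D-BINED-2-XX2",
              "abd-GEN-eft-na-M-D-BINED-3-XX1")

# perl = TRUE is needed because lookbehind/lookahead are PCRE features.
m <- regexpr("(?<=-)[0-9]+(?=-)", text.var, perl = TRUE)

# Pre-fill with NA so strings without a match keep their position;
# regmatches() alone would silently drop them.
number <- rep(NA_character_, length(text.var))
number[m > 0] <- regmatches(text.var, m)
number <- as.numeric(number)
number
```

regexpr only reports the first match per string, which mirrors what str_extract does.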
You can do this with sub and a regular expression.
text.var <- c("abd-GEN-eft-na-M-D-BINED-10-XX1","abd-GEN-eft-na-M-D-BINED-2-XX2","abd-GEN-eft-na-M-D-BINED-3-XX1")
id <- c(1,2,3)
# Replace each whole string with the digits captured between two dashes
number <- as.numeric(sub(".*-(\\d+)-.*", "\\1", text.var))
data <- data.frame(id = id, text = text.var, number = number)
data
  id                            text number
1  1 abd-GEN-eft-na-M-D-BINED-10-XX1     10
2  2  abd-GEN-eft-na-M-D-BINED-2-XX2      2
3  3  abd-GEN-eft-na-M-D-BINED-3-XX1      3
A little extra detail
In the regular expression, -\\d+- picks out a sequence of digits surrounded by dashes. I put parentheses around the \\d+ part to capture the digits, which gives -(\\d+)-. The .* before and after -(\\d+)- matches all the rest of the characters, so sub replaces the entire string with just the captured digits (\\1). The result is still a character vector, so I used as.numeric to turn it into numbers.
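Two edge cases of this pattern are worth knowing about. The sketch below uses made-up strings (they are not from the question) to show what happens when a string contains more than one dash-delimited number, and when it contains none.

```r
# Hypothetical inputs, chosen only to illustrate the edge cases.
x <- c("abd-BINED-10-XX1",  # one dash-delimited number
       "a-1-b-22-c",        # two dash-delimited numbers
       "no-digits-here")    # none at all

res <- sub(".*-(\\d+)-.*", "\\1", x)
res
# The leading .* is greedy, so for the second string the LAST
# dash-delimited number is the one captured.
# When nothing matches, sub() returns the input unchanged.

# as.numeric() then gives NA (with a warning) for the unmatched string.
num <- suppressWarnings(as.numeric(res))
num
```

If you need the first number instead of the last, a non-greedy prefix such as .*? with perl = TRUE is one option.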
We can use str_extract from stringr inside dplyr's mutate:
library(stringr)
library(dplyr)
data %>%
  mutate(number = as.numeric(str_extract(text, "\\d+(?=-)")))
#  id                            text number
#1  1 abd-GEN-eft-na-M-D-BINED-10-XX1     10
#2  2  abd-GEN-eft-na-M-D-BINED-2-XX2      2
#3  3  abd-GEN-eft-na-M-D-BINED-3-XX1      3
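One thing to keep in mind is that "\\d+(?=-)" only checks for a dash after the digits, not before them. That is fine for the strings in the question, but it can pick up a different number if digits appear before the first dash. A small sketch with a made-up string, using base R with perl = TRUE as a stand-in for str_extract (first_match is a hypothetical helper, not part of any package):

```r
# Hypothetical string, not from the question: digits appear before the first dash.
x <- "v2-GEN-eft-10-XX1"

# Helper for illustration: first match of a PCRE pattern, or NA if none.
first_match <- function(pattern, s) {
  m <- regexpr(pattern, s, perl = TRUE)
  if (m[1] == -1) NA_character_ else regmatches(s, m)
}

lookahead_only <- first_match("\\d+(?=-)", x)        # dash required after only
both_sides     <- first_match("(?<=-)\\d+(?=-)", x)  # dash required on both sides

lookahead_only  # "2"
both_sides      # "10"
```

Adding the (?<=-) lookbehind makes the pattern match only digits that really sit between two dashes.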
Source: https://stackoverflow.com/questions/58015866/extract-a-number-from-a-string-in-r