I have many filenames which look like:
txt <- "MA0051_IRF2.xml"
I want to extract IRF2, the part between "_" and ".". How do I do this in R?
To achieve this, you need a regexp that
- matches an (optional) arbitrary string in front of the _ : .*
- matches a literal _ : [_]
- matches everything up to (but not including) the next . and stores it in capturing group no. 1 : ([^.]+)
- matches a literal . : [.]
- matches an (optional) arbitrary string after the . : .*
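To check what the capturing group actually picks up, the assembled pattern can be run through base R's regexec and regmatches, which return the full match followed by each group. This is only a minimal sketch for inspection; the variable names pattern and m are illustrative and not part of the original answer.

```r
txt <- "MA0051_IRF2.xml"

# The pieces described above, joined into one pattern.
pattern <- ".*[_]([^.]+)[.].*"

# regexec() records the positions of the full match and of every capturing group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(txt, regexec(pattern, txt))[[1]]

full_match <- m[1]  # the whole string, since the pattern spans all of it
group_1    <- m[2]  # contents of capturing group no. 1

print(full_match)
print(group_1)
```

Because the pattern covers the whole string, replacing the match with group 1 leaves only the captured part.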
In your call to gsub, you then
- use the regular expression we built in the previous step
- replace the whole string with the contents of the first capturing group: \\1 (we need to escape the backslash, hence the double backslash)
Example:
gsub(".*[_]([^.]+)[.].*", "\\1", "MA0051_IRF2.xml")
gsub(".*_(.*)\\..*", "\\1", txt)
##"IRF2"
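Since the question mentions many file names, it may help to see both patterns applied to a whole vector. The sketch below uses sub, which is vectorised like gsub and is enough here because each pattern matches the entire string once. The second and third file names are made up for illustration; the third has two dots to show where the two patterns should differ.

```r
# Hypothetical file names; only the first one comes from the question.
files <- c("MA0051_IRF2.xml", "MA0052_MEF2A.xml", "MA0139_CTCF.v2.xml")

# [^.]+ cannot cross a dot, so the group ends at the first dot after the underscore.
first_dot <- sub(".*_([^.]+)[.].*", "\\1", files)

# A greedy (.*) may cross dots, so the group runs up to the last dot.
last_dot <- sub(".*_(.*)\\..*", "\\1", files)

print(first_dot)
print(last_dot)

# Strings that do not match the pattern are returned unchanged.
unchanged <- sub(".*_([^.]+)[.].*", "\\1", "README")
print(unchanged)
```

For names with a single dot the two patterns agree; which one is appropriate for multi-dot names depends on whether the extra segment belongs to the identifier.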
Another possibility, using the stringr package:
library(stringr)
str_extract(txt, "(?<=_)(.+)(?=\\.)")
Here's a possible solution that doesn't require regex knowledge:
txt <- "MA0051_IRF2.xml"
library(qdap)
genXtract(txt, "_", ".")
## _ : .
## "IRF2"
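If installing an extra package is not an option, a similar regex-free result can be sketched in base R. This is an alternative under stated assumptions, not the qdap approach itself: it assumes the wanted part is whatever follows the last underscore once the file extension is removed. The names stem, parts and id are illustrative.

```r
txt <- "MA0051_IRF2.xml"

# Drop the extension with the tools package that ships with R.
stem <- tools::file_path_sans_ext(txt)

# Split on a literal underscore; fixed = TRUE disables regex interpretation.
parts <- strsplit(stem, "_", fixed = TRUE)[[1]]

# Keep the last piece, i.e. the text after the final underscore.
id <- parts[length(parts)]
print(id)
```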
Source: https://stackoverflow.com/questions/23518325/how-to-extract-substring-between-patterns-and-in-r