I have many filenames which look like:
txt= "MA0051_IRF2.xml"
I want to extract IRF2, which is between "_" and ".". How do I do this in R?
To achieve this, you need a regexp that
- matches an (optional) arbitrary string in front of the _ :
.*
- matches a literal _ :
[_]
- matches everything up to (but not including) the next . and stores it in capturing group no. 1 :
([^.]+)
- matches a literal . :
[.]
- matches an (optional) arbitrary string after the . :
.*
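As a minimal sketch, the pieces listed above can be glued together and the capture group inspected directly with base R's regexec() and regmatches(). The variable names pieces, pattern and m are only illustrative:

```r
# Each piece of the regular expression, in the order described above
pieces <- c(
  ".*",      # optional arbitrary prefix
  "[_]",     # literal underscore
  "([^.]+)", # capturing group 1: everything up to the next dot
  "[.]",     # literal dot
  ".*"       # optional arbitrary suffix
)
pattern <- paste0(pieces, collapse = "")  # ".*[_]([^.]+)[.].*"

txt <- "MA0051_IRF2.xml"
# regexec() records where the whole match and each group start;
# regmatches() turns that into strings: element 1 is the full match, element 2 is group 1
m <- regmatches(txt, regexec(pattern, txt))[[1]]
m[2]  # "IRF2"
```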
In your call to gsub, you then
- use the regular expression we built in the previous step
- replace the whole string with the contents of the first capturing group:
\\1
(the backslash must itself be escaped inside an R string literal, hence the double backslash)
Example:
gsub(".*[_]([^.]+)[.].*", "\\1", "MA0051_IRF2.xml")
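Because gsub() is vectorised, the same call handles many filenames at once. One thing to watch: an element that does not match the pattern is returned unchanged rather than dropped. The sketch below assumes a hypothetical vector files (only the first name comes from the question) and marks non-matches as NA with grepl():

```r
# Hypothetical filenames; the last one has no "." and so does not match
files <- c("MA0051_IRF2.xml", "XX0001_GENE1.xml", "no_extension")
pattern <- ".*[_]([^.]+)[.].*"

ids <- gsub(pattern, "\\1", files)  # "no_extension" comes back unchanged
ids[!grepl(pattern, files)] <- NA   # flag elements the pattern did not match
ids                                 # "IRF2" "GENE1" NA
```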
A shorter pattern also works for this filename:
gsub(".*_(.*)\\..*", "\\1", txt)
## "IRF2"
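The two patterns differ once a name contains more than one dot: the greedy (.*) group runs to the last dot, while [^.]+ stops at the first dot after the underscore. A small comparison, using a made-up filename:

```r
name <- "MA0051_IRF2.v1.xml"  # hypothetical filename with an extra dot

a <- gsub(".*[_]([^.]+)[.].*", "\\1", name)  # group stops at the first "." after "_"
b <- gsub(".*_(.*)\\..*", "\\1", name)       # greedy group extends to the last "."
a  # "IRF2"
b  # "IRF2.v1"
```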
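Another possibility, with the stringr package (its default ICU regex engine already supports lookarounds, so the deprecated perl() wrapper is not needed):
library(stringr)
str_extract(txt, "(?<=_)(.+)(?=\\.)")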
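If you would rather avoid the extra dependency, base R can apply the same kind of lookbehind/lookahead through regexpr() with perl = TRUE. This is a sketch with a hypothetical files vector; note it swaps .+ for [^.]+ so the match stops at the first dot, and that regmatches() silently drops elements without a match:

```r
files <- c("MA0051_IRF2.xml", "XX0001_GENE1.xml")  # second name is made up

# (?<=_) : preceded by an underscore; (?=\\.) : followed by a literal dot
ids <- regmatches(files, regexpr("(?<=_)[^.]+(?=\\.)", files, perl = TRUE))
ids  # "IRF2" "GENE1"
```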
Here's a possible solution that doesn't require regex knowledge:
txt <- "MA0051_IRF2.xml"
library(qdap)
genXtract(txt, "_", ".")
## _ : .
## "IRF2"
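A base R route that also avoids writing a regular expression is to strip the extension with tools::file_path_sans_ext() and split on a literal underscore. This sketch assumes every name has exactly one underscore before the ID; the second filename is hypothetical:

```r
files <- c("MA0051_IRF2.xml", "XX0001_GENE1.xml")

# Drop the extension (the tools package ships with R)
stems <- tools::file_path_sans_ext(files)  # "MA0051_IRF2" "XX0001_GENE1"

# Split on a literal "_" (fixed = TRUE, so no regex) and keep the second piece
ids <- vapply(strsplit(stems, "_", fixed = TRUE), `[`, character(1), 2)
ids  # "IRF2" "GENE1"
```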
Source: https://stackoverflow.com/questions/23518325/how-to-extract-substring-between-patterns-and-in-r