How to extract substring between patterns “_” and “.” in R [duplicate]

ぃ、小莉子 submitted on 2019-12-01 08:48:30

To achieve this, you need a regexp that

  • matches an (optional) arbitrary string in front of the _ : .*
  • matches a literal _ : [_]
  • matches everything up to (but not including) the next . and stores it in capturing group no. 1 : ([^.]+)
  • matches a literal . : [.]
  • matches an (optional) arbitrary string after the . : .*
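As a rough illustration of the list above, the pattern can be assembled piece by piece. The variable names here are made up for the sketch and are not part of the original answer.

```r
# Build the pattern from its parts, mirroring the bullet points above.
# All variable names are hypothetical and only serve the illustration.
prefix  <- ".*"       # optional arbitrary text in front of the underscore
under   <- "[_]"      # a literal underscore
capture <- "([^.]+)"  # capturing group 1: one or more characters that are not a dot
dot     <- "[.]"      # a literal dot
suffix  <- ".*"       # optional arbitrary text after the dot

pattern <- paste0(prefix, under, capture, dot, suffix)
pattern
## [1] ".*[_]([^.]+)[.].*"

# The assembled pattern matches the example file name.
grepl(pattern, "MA0051_IRF2.xml")
## [1] TRUE
```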

In your call to gsub, you then

  • use the regular expression we built in the previous step
  • replace the whole string with the contents of the first capturing group: \\1 (we need to escape the backslash, hence the double backslash)

Example:

txt <- "MA0051_IRF2.xml"
gsub(".*[_]([^.]+)[.].*", "\\1", txt)
## "IRF2"
# A shorter pattern that gives the same result here:
gsub(".*_(.*)\\..*", "\\1", txt)
## "IRF2"
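One caveat with the gsub approach is that a string that does not match the pattern is returned unchanged, which can hide bad input. The following is a minimal sketch of one way around that, assuming NA is an acceptable result for non-matching input. The helper name extract_between is hypothetical, and perl = TRUE is used only to make the greedy matching behaviour explicit.

```r
# Hypothetical helper: return the text between the last "_" that is followed
# by a dot-free run and the next ".", or NA when the pattern does not match.
extract_between <- function(x) {
  # regexec() keeps capture groups; each list element is c(full match, group 1),
  # or character(0) when there is no match.
  m <- regmatches(x, regexec(".*_([^.]+)[.].*", x, perl = TRUE))
  vapply(m, function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

files <- c("MA0051_IRF2.xml", "MA0051_IRF2_v1.xml", "no-separator")
extract_between(files)
## [1] "IRF2" "v1"   NA
```

Note that with several underscores the greedy leading .* pushes the match to the last usable underscore, so the second file name yields "v1" rather than "IRF2_v1".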

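Another possibility, using the stringr package (recent stringr versions use the ICU regex engine, which supports lookarounds directly, so the older perl() wrapper is not needed):

library(stringr)
str_extract("MA0051_IRF2.xml", "(?<=_)(.+)(?=\\.)")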
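If adding a package is not an option, the same lookaround idea can be sketched in base R with regexpr and regmatches. This is an alternative to the stringr call above, not what the original answer used. It assumes perl = TRUE, because the default regex engine does not support lookbehind.

```r
# Base R sketch of the lookaround approach (assumes perl = TRUE for lookbehind).
x <- c("MA0051_IRF2.xml", "no-separator")

# (?<=_)  : the match must be preceded by an underscore
# .+      : the text to extract
# (?=\\.) : the match must be followed by a literal dot
hits <- regmatches(x, regexpr("(?<=_).+(?=\\.)", x, perl = TRUE))
hits
## [1] "IRF2"
```

Unlike str_extract, regmatches silently drops elements that do not match, so the result can be shorter than the input.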

Here's a possible solution that doesn't require regex knowledge:

txt <- "MA0051_IRF2.xml"

library(qdap)
genXtract(txt, "_", ".")

## _  :  . 
##  "IRF2" 
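For comparison, a regex-free version is also possible in base R by searching for the two delimiters as fixed strings. This is only a sketch under the assumption that the input contains an underscore followed later by a dot. The variable names are illustrative.

```r
# Regex-free sketch in base R: locate the delimiters as literal strings.
# Assumes txt contains "_" and, after it, a ".".
txt <- "MA0051_IRF2.xml"

start  <- as.integer(regexpr("_", txt, fixed = TRUE))   # position of the first "_"
rest   <- substring(txt, start + 1)                     # everything after the "_"
end    <- as.integer(regexpr(".", rest, fixed = TRUE))  # position of the first "." in rest
result <- substr(rest, 1, end - 1)
result
## [1] "IRF2"
```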