I am trying to find a simple way to extract an unknown substring (could be anything) that appears between two known substrings. For example, I have a string:
a<-" anything goes here, STR1 GET_ME STR2, anything goes here"
I need to extract the string GET_ME, which is between STR1 and STR2 (without the surrounding whitespace).
I am trying str_extract(a, "STR1 (.+) STR2"), but I am getting the entire match:
[1] "STR1 GET_ME STR2"
I can, of course, strip the known strings to isolate the substring I need, but I think there should be a cleaner way to do it with the correct regular expression.
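For comparison, str_extract itself can return only the middle part if the two known strings are matched with lookarounds instead of being consumed by the match. A minimal sketch, assuming stringr is installed; it relies on the lookbehind support of the ICU regex engine that stringr uses:

```r
library(stringr)

a <- " anything goes here, STR1 GET_ME STR2, anything goes here"

# (?<=STR1 ) is a lookbehind and (?= STR2) is a lookahead: both must match,
# but neither becomes part of the returned text, so only the middle is kept.
# .*? is lazy, so the match stops at the first " STR2" after "STR1 ".
str_extract(a, "(?<=STR1 ).*?(?= STR2)")
# returns "GET_ME"
```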
You may use str_match with STR1 (.*?) STR2 (note that the spaces are meaningful; if you just want to match anything between STR1 and STR2, use STR1(.*?)STR2, and the capture will then include any surrounding spaces). If you have multiple occurrences, use str_match_all.
library(stringr)
a<-" anything goes here, STR1 GET_ME STR2, anything goes here"
res <- str_match(a, "STR1 (.*?) STR2")
res[, 2]  # column 1 is the full match, column 2 is the capture group
[1] "GET_ME"
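For the multiple-occurrence case mentioned above, a minimal sketch with str_match_all (assuming stringr is installed; the input string is made up for illustration):

```r
library(stringr)

a <- "anything STR1 GET_ME STR2, anything goes here, STR1 again get me STR2"

# str_match_all() returns a list with one matrix per input string;
# column 1 holds the full matches and column 2 the first capture group.
res <- str_match_all(a, "STR1 (.*?) STR2")
res[[1]][, 2]
# returns c("GET_ME", "again get me")

# Without the spaces in the pattern, the capture keeps the surrounding
# whitespace, which trimws() can strip afterwards.
trimws(str_match_all(a, "STR1(.*?)STR2")[[1]][, 2])
# returns c("GET_ME", "again get me")
```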
Another way, using base R's regexec (to get the first match):
test = " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"
pattern="STR1 (.*?) STR2"
result <- regmatches(test,regexec(pattern,test))
result[[1]][2]
[1] "GET_ME"
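If you need every match rather than only the first, one base R sketch swaps regexec for gregexpr and moves STR1 and STR2 into lookarounds. Note this is a different technique from the capture group above and needs perl = TRUE for the lookbehind:

```r
test <- " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"

# gregexpr() finds every match (regexec() stops at the first one).
# perl = TRUE enables the lookbehind (?<=STR1 ) and lookahead (?= STR2),
# so each match contains only the text between the markers.
matches <- regmatches(test, gregexpr("(?<=STR1 ).*?(?= STR2)", test, perl = TRUE))
matches[[1]]
# returns c("GET_ME", "GET_ME2")
```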
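Here's another way using base R's gsub, which replaces the whole string with the captured group (if the pattern does not match, gsub returns the input unchanged):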
a<-" anything goes here, STR1 GET_ME STR2, anything goes here"
gsub(".*STR1 (.+) STR2.*", "\\1", a)
Output:
[1] "GET_ME"
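Because a non-matching string comes back unchanged, it can be worth guarding the substitution. A minimal sketch; extract_between is a hypothetical helper name, not a base R function, and it uses sub since the pattern consumes the whole string in one replacement:

```r
# Hypothetical helper (not part of base R): returns NA for strings that do
# not contain the pattern, because sub()/gsub() would otherwise return them
# unchanged and the "extracted" value would be the whole input.
extract_between <- function(x, pattern = ".*STR1 (.+) STR2.*") {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

x <- c(" anything goes here, STR1 GET_ME STR2, anything goes here",
       "no markers in this one")
extract_between(x)
# returns c("GET_ME", NA)
```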
Another option is to use qdapRegex::ex_between to extract strings between left and right boundaries:
qdapRegex::ex_between(a, "STR1", "STR2")[[1]]
#[1] "GET_ME"
It also works with multiple occurrences:
a <- "anything STR1 GET_ME STR2, anything goes here, STR1 again get me STR2"
qdapRegex::ex_between(a, "STR1", "STR2")[[1]]
#[1] "GET_ME" "again get me"
Or with multiple left and right boundaries:
a <- "anything STR1 GET_ME STR2, anything goes here, STR4 again get me STR5"
qdapRegex::ex_between(a, c("STR1", "STR4"), c("STR2", "STR5"))[[1]]
#[1] "GET_ME" "again get me"
The first capture is between "STR1" and "STR2", whereas the second is between "STR4" and "STR5".
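If you would rather avoid the qdapRegex dependency, a rough stringr-only sketch of the paired-boundary case loops over the left/right pairs with str_match_all. It is not a reimplementation of ex_between, and it assumes the boundaries contain no regex metacharacters, since they are pasted into the pattern as-is:

```r
library(stringr)

a <- "anything STR1 GET_ME STR2, anything goes here, STR4 again get me STR5"

# Left/right boundaries are paired by position, as in the ex_between() call.
left  <- c("STR1", "STR4")
right <- c("STR2", "STR5")

# For each pair, capture the shortest text between the markers; \\s* on both
# sides drops the surrounding whitespace. Results are grouped by boundary
# pair, not by their order in the string.
between <- unlist(Map(function(l, r) {
  str_match_all(a, paste0(l, "\\s*(.*?)\\s*", r))[[1]][, 2]
}, left, right), use.names = FALSE)
between
# returns c("GET_ME", "again get me")
```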
Source: https://stackoverflow.com/questions/58383723/how-to-have-a-regex-for-starting-with-pros-and-ending-before-cons