I'm pretty sure I'm missing something obvious here, but I cannot make R use non-greedy regular expressions:
> library(stringr)
> str_match('xxx a
The problem is matching the shortest window between two strings. @flodel correctly mentions that a regex engine parses the string from left to right, so every match is the leftmost one available. Greediness and laziness only affect the right-hand boundary: a greedy quantifier extends the match to the rightmost occurrence of the subpattern that follows, while a lazy one stops at the first occurrence of it.
See the examples:
> library(stringr)
> str_extract('xxx aaaab yyy', "a[^ab]*b")
[1] "ab"
> str_extract('xxx aaa xxx aaa zzz', "xxx.*?zzz")
[1] "xxx aaa xxx aaa zzz"
> str_extract('xxx aaa xxx aaa zzz', "xxx(?:(?!xxx|zzz).)*zzz")
[1] "xxx aaa zzz"
The first and third patterns return the shortest window; the second illustrates the current problem, but with multicharacter boundaries.
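To make the "leftmost" point concrete, here is a minimal base R sketch (the input string is made up for illustration, and base R is used so it runs without stringr). Both quantifiers start at the same leftmost xxx; only the right edge of the match differs.

```r
# Illustrative input: two "xxx" starts and two "zzz" ends.
x <- "xxx aaa xxx aaa zzz zzz"

# Greedy: runs to the LAST "zzz" it can reach.
greedy <- regmatches(x, regexpr("xxx.*zzz", x, perl = TRUE))
# Lazy: stops at the FIRST "zzz", but still starts at the first "xxx".
lazy <- regmatches(x, regexpr("xxx.*?zzz", x, perl = TRUE))

print(greedy)  # "xxx aaa xxx aaa zzz zzz"
print(lazy)    # "xxx aaa xxx aaa zzz"

# Both matches begin at character 1, so laziness did not move the left edge.
starts <- c(as.integer(regexpr("xxx.*zzz", x, perl = TRUE)),
            as.integer(regexpr("xxx.*?zzz", x, perl = TRUE)))
print(starts)  # 1 1
```

Neither result is the shortest window "xxx aaa zzz", which is why a different pattern is needed.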
Scenario 1. Boundaries are single characters
When a and b are single characters, the shortest window is found with a negated character class: a[^ab]*b matches from an a up to the next b, with no a or b characters in between.
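As a small sketch of the difference, the snippet below compares the lazy pattern with the negated character class on a made-up string, extracting all matches with base R. The variable names are only for illustration.

```r
# Illustrative input with a doubled "a" before the first "b".
x <- "xxx aa1b yyy a22b"

# Lazy dot: starts at the first "a", so the first match keeps the extra "a".
lazy <- unlist(regmatches(x, gregexpr("a.*?b", x, perl = TRUE)))
# Negated class: no "a" or "b" is allowed between the two boundaries.
negated <- unlist(regmatches(x, gregexpr("a[^ab]*b", x, perl = TRUE)))

print(lazy)     # "aa1b" "a22b"
print(negated)  # "a1b"  "a22b"
```

Note that, unlike the dot, a negated character class such as [^ab] also matches line break characters.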
Scenario 2. Boundaries are not single characters
You may use a tempered greedy token in these cases, which can be further unrolled. The xxx(?:(?!xxx|zzz).)*zzz pattern matches xxx, then zero or more characters other than line break characters, each of which is not the start of an xxx or zzz sequence (the (?!xxx|zzz) is a negative lookahead that fails the match if the text immediately to the right matches the lookahead pattern), and then zzz.
These patterns also work with base R regmatches, provided you pass perl = TRUE to select the PCRE regex flavor, which supports lookaheads:
> x <- 'xxx aaa xxx aaa zzz xxx bbb xxx ccc zzz'
> unlist(regmatches(x, gregexpr("xxx(?:(?!xxx|zzz).)*zzz", x, perl = TRUE)))
[1] "xxx aaa zzz" "xxx ccc zzz"
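The tempered greedy token from Scenario 2 can be unrolled. The sketch below shows one possible unrolling for these particular boundaries; it is an illustration rather than a canonical form. It consumes runs of characters that cannot start either boundary with [^xz]*, and allows a single x or z only when it does not begin xxx or zzz. One assumption to keep in mind: [^xz] also matches line breaks, so the two patterns only agree on single-line input (or when the tempered version is used with (?s)).

```r
x <- "xxx aaa xxx aaa zzz xxx bbb xxx ccc zzz"

tempered <- "xxx(?:(?!xxx|zzz).)*zzz"
# Unrolled sketch: [^xz]* handles the common case without a lookahead;
# the lookaheads run only when an "x" or "z" is actually encountered.
unrolled <- "xxx[^xz]*(?:(?:x(?!xx)|z(?!zz))[^xz]*)*zzz"

res_tempered <- unlist(regmatches(x, gregexpr(tempered, x, perl = TRUE)))
res_unrolled <- unlist(regmatches(x, gregexpr(unrolled, x, perl = TRUE)))

print(res_unrolled)                          # "xxx aaa zzz" "xxx ccc zzz"
print(identical(res_tempered, res_unrolled)) # TRUE
```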
One note: with a PCRE regex in base R, or the ICU regex in str_extract/str_match, the . does not match line break characters. To enable that behavior, add (?s), the inline DOTALL modifier, at the start of the pattern.
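A minimal sketch of the (?s) effect with base R and perl = TRUE, using a made-up multi-line string. Without the modifier the window cannot cross the line breaks, so nothing matches.

```r
# Illustrative input: the shortest window spans a line break.
x <- "xxx aaa\nxxx bbb\nccc zzz"
p <- "xxx(?:(?!xxx|zzz).)*zzz"

# Default: "." stops at "\n", so no match is found at all.
no_dotall <- regmatches(x, regexpr(p, x, perl = TRUE))
# With the inline DOTALL modifier, "." may cross line breaks.
dotall <- regmatches(x, regexpr(paste0("(?s)", p), x, perl = TRUE))

print(length(no_dotall))  # 0
print(dotall)             # "xxx bbb\nccc zzz"
```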