Let's say I want to find all words in which the letter "e" appears exactly two times. When I define this pattern:
pattern1 <- "e.*e"
grep(pattern1, stri
We can use a pattern that matches zero or more characters that are not 'e' ([^e]*) from the start (^) of the string, followed by the character 'e', then another run of characters that are not 'e' followed by 'e', and finally zero or more characters that are not 'e' up to the end ($) of the string:
res <- grep("^[^e]*e[^e]*e[^e]*$", stringr::words, value = TRUE)
stringr::str_count(res, "e")
#[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[58] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[115] 2 2 2 2 2 2 2
If you're okay with not using grep:
stringr::str_count(words, "e") == 2
If you want more efficiency,
stringi::stri_count_fixed(words, "e") == 2
Both of these return logical vectors; to get the words themselves, use the result as an index: words[..code from above..]
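To make the indexing step concrete, here is a minimal sketch. The vector x is a made-up stand-in for stringr::words, and the base R count via gregexpr() is an assumption for readers without stringr installed; it is not the method used in the answer above.

```r
# Hypothetical sample vector standing in for stringr::words
x <- c("feel", "agree", "degree", "else", "tree", "bed")

# Count literal "e" occurrences with base R
# (an assumed alternative to stringr::str_count)
n_e <- lengths(regmatches(x, gregexpr("e", x, fixed = TRUE)))

# Logical vector, same length as x
has_two <- n_e == 2

# Index with the logical vector to get the words themselves
two_e_words <- x[has_two]
two_e_words
```

With stringr attached, the same indexing idea applies to the str_count() comparison shown above.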
^[^e]*e[^e]*e[^e]*$
^ :: asserts position at the start of the string
[^e]* :: matches zero or more characters other than e
* (asterisk) :: matches between zero and unlimited times, as many times as possible
e :: matches the character e literally (case sensitive)
[^e]* is repeated after each e to match any other characters between and after the two e's
$ :: asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
So [^e]* matches any character except e, zero or more times. Because zero repetitions are allowed, a string made up of only the two e's ("ee") also satisfies the pattern.
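A quick way to check the edge cases described above is to run the anchored pattern over a small vector. The test strings here are made up for illustration and are not from the original post.

```r
# Anchored pattern: exactly two "e" characters in the whole string
pattern <- "^[^e]*e[^e]*e[^e]*$"

# Hypothetical test strings covering 0, 1, 2 and 3 occurrences of "e"
x <- c("ee", "e", "eee", "here", "tree", "sleeve", "cat")

# grepl() returns one logical value per input string
matches <- grepl(pattern, x)
names(matches) <- x
matches
```

Only "ee", "here" and "tree" come back TRUE: "ee" passes because every [^e]* is allowed to match nothing, while "eee" and "sleeve" fail because a third e cannot be consumed by any [^e]*.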
You may use:
^(?:[^e]*e){2}[^e]*$
The (?:...) is a non-capturing group that allows quantifying a sequence of subpatterns, so the pattern is easily adjusted to match 3, 4 or more occurrences in a string.
Details
^ - start of string
(?:[^e]*e){2} - 2 occurrences of:
    [^e]* - any 0+ chars other than e
    e - an e
[^e]* - any 0+ chars other than e
$ - end of string

See the R demo below:
x <- c("feel", "agre", "degree")
rx <- "^(?:[^e]*e){2}[^e]*$"
grep(rx, x, value = TRUE)
## => [1] "feel"
Note that instead of value = T it is safer to use value = TRUE, as T might be redefined in the code above.
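Since the {2} quantifier is the only part that changes with the count, the pattern can be built for any number of occurrences. The sketch below is illustrative only: the helper name exactly_n and its arguments are hypothetical, and it assumes ch is a single plain letter or digit (nothing with special meaning in a regex). perl = TRUE is passed here as a cautious choice so that (?:...) is handled by the PCRE engine.

```r
# Hypothetical helper: build a pattern matching strings that contain
# the character `ch` exactly `n` times (assumes `ch` is a plain letter or digit)
exactly_n <- function(ch, n) {
  stopifnot(is.character(ch), nchar(ch) == 1, n >= 0)
  paste0("^(?:[^", ch, "]*", ch, "){", n, "}[^", ch, "]*$")
}

# Made-up sample words with 2, 1, 3 and 4 occurrences of "e"
x <- c("feel", "agre", "degree", "evergreen")

two_e   <- grep(exactly_n("e", 2), x, value = TRUE, perl = TRUE)
three_e <- grep(exactly_n("e", 3), x, value = TRUE, perl = TRUE)
four_e  <- grep(exactly_n("e", 4), x, value = TRUE, perl = TRUE)
```

Here two_e holds "feel", three_e holds "degree", and four_e holds "evergreen".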