Regex match exact number of letters

离开以前 2020-12-21 15:51

Let's say I want to find all words in which the letter "e" appears exactly two times. When I define this pattern:

pattern1 <- "e.*e"
grep(pattern1, stri         


        
4 Answers
  • 2020-12-21 16:34

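    We can anchor the pattern at both ends and allow only non-'e' characters around the two 'e's: zero or more characters that are not 'e' ([^e]*) from the start (^) of the string, then an 'e', then another run of non-'e' characters followed by a second 'e', and finally zero or more non-'e' characters up to the end ($) of the string. Because every position other than the two literal 'e's is restricted to [^e], a third 'e' cannot match anywhere.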

    res <- grep("^[^e]*e[^e]*e[^e]*$", stringr::words, value = TRUE)
    stringr::str_count(res, "e")
    #[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    #[58] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    #[115] 2 2 2 2 2 2 2
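
    To make the difference from an unanchored pattern such as "e.*e" concrete, here is a minimal base R sketch. The sample vector x is made up purely for illustration and is not taken from stringr::words.

    ```r
    # Hypothetical sample words, chosen only for illustration
    x <- c("feel", "agree", "degree", "tree", "eve", "cat")

    # Unanchored: matches any string that contains at least two e's
    at_least_two <- grep("e.*e", x, value = TRUE)

    # Anchored, with negated classes everywhere else: exactly two e's
    exactly_two <- grep("^[^e]*e[^e]*e[^e]*$", x, value = TRUE)

    print(at_least_two)
    print(exactly_two)
    ```

    The unanchored pattern also returns "degree" (three e's), while the anchored one drops it.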
    
  • 2020-12-21 16:36

    If you're okay with not using grep, you can count the occurrences directly:

    stringr::str_count(words, "e") == 2
    

    If you want more efficiency, stringi can count a fixed (non-regex) pattern:

    stringi::stri_count_fixed(words, "e") == 2
    

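    Both of these return logical vectors; index with them to get the words themselves, e.g. words[stringr::str_count(words, "e") == 2]. Here words is the stringr::words data set, which is available by its bare name after library(stringr).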
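
    If neither package is available, the same count-then-compare idea can be sketched in base R. This is only an illustration on a made-up vector x; the combination of gregexpr() and regmatches() is swapped in here in place of the stringr/stringi calls.

    ```r
    # Hypothetical sample words, chosen only for illustration
    x <- c("feel", "agree", "degree", "cat")

    # gregexpr() locates every literal "e"; regmatches() extracts the matches
    # (an empty vector when there are none), and lengths() counts them per word
    e_count <- lengths(regmatches(x, gregexpr("e", x, fixed = TRUE)))

    # Logical vector, then used as an index
    has_two <- e_count == 2
    print(x[has_two])
    ```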

  • 2020-12-21 16:41

    ^[^e]*e[^e]*e[^e]*$

    ^ :: asserts the start of the string

    [^e]* :: matches zero or more characters other than e

    * (asterisk) :: matches the preceding item between zero and unlimited times, as many times as possible

    e :: matches the character e literally (case sensitive)

    [^e]* is repeated between the two e's and after the second one, to match all the other characters

    $ :: asserts the position at the end of the string, or before the line terminator right at the end of the string (if any)

    So [^e]* matches any characters except e, zero or more times. Because it may match zero characters, a string that contains nothing but the two e's ("ee") also satisfies the pattern.
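
    A few edge cases make the zero-length behaviour of [^e]* easier to see. This is a small sketch on made-up inputs, not part of the original answer.

    ```r
    rx <- "^[^e]*e[^e]*e[^e]*$"

    # Hypothetical inputs: only e's, a single e, three e's,
    # mixed case (uppercase E is not the literal e), and the empty string
    inputs <- c("ee", "e", "eee", "EeeE", "")

    result <- grepl(rx, inputs)
    print(setNames(result, inputs))
    ```

    Only "ee" and "EeeE" match: each contains exactly two lowercase e's, and the uppercase E's are consumed by [^e]*.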

  • 2020-12-21 16:46

    You may use:

    ^(?:[^e]*e){2}[^e]*$
    

    See the regex demo. The (?:...) is a non-capturing group that allows quantifying a sequence of subpatterns and is thus easily adjustable to match 3, 4 or more specific sequences in a string.

    Details

    • ^ - start of string
    • (?:[^e]*e){2} - 2 occurrences of
      • [^e]* - any 0+ chars other than e
      • e - an e
    • [^e]* - any 0+ chars other than e
    • $ - end of string

    See the R demo below:

    x <- c("feel", "agre", "degree")
    rx <- "^(?:[^e]*e){2}[^e]*$"
    grep(rx, x, value = TRUE)
    ## => [1] "feel"
    

    Note that instead of value = T it is safer to use value = TRUE as T might be redefined in the code above.
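
    Since the quantifier is the only part that changes, the pattern can be generated for any count. The helper below is a hypothetical sketch (exactly_n is not an existing function), and it assumes the letter is a single ASCII letter so that no escaping is needed inside the bracket expression. perl = TRUE is passed here as a choice of this sketch, so that the (?:...) group is handled by the PCRE engine.

    ```r
    # Hypothetical helper: pattern matching exactly n occurrences of one letter
    exactly_n <- function(letter, n) {
      # Assumption: a single ASCII letter, so no regex escaping is required
      stopifnot(grepl("^[A-Za-z]$", letter))
      sprintf("^(?:[^%s]*%s){%d}[^%s]*$", letter, letter, as.integer(n), letter)
    }

    x <- c("feel", "agre", "degree")

    one_e   <- grep(exactly_n("e", 1), x, value = TRUE, perl = TRUE)
    two_e   <- grep(exactly_n("e", 2), x, value = TRUE, perl = TRUE)
    three_e <- grep(exactly_n("e", 3), x, value = TRUE, perl = TRUE)

    print(exactly_n("e", 2))
    print(list(one_e, two_e, three_e))
    ```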
