Regex match exact number of letters

Let's say I want to find all words in which letter "e" appears exactly two times. When I define this pattern:

pattern1 <- "e.*e"
grep(pattern1, stri


                      
4 answers

鱼传尺愫
2020-12-21 16:34

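We can use a pattern that matches zero or more characters that are not 'e' ([^e]*) from the start (^) of the string, followed by the character 'e', then another run of characters that are not 'e' followed by a second 'e', and finally zero or more characters that are not 'e' until the end ($) of the string. Because every position other than the two literal 'e' characters is restricted to non-'e' characters, a third 'e' anywhere in the string makes the match fail.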

res <- grep("^[^e]*e[^e]*e[^e]*$", stringr::words, value = TRUE)
stringr::str_count(res, "e")
#[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[58] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[115] 2 2 2 2 2 2 2
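
To see why the anchors and negated classes matter, here is a small sketch comparing the unanchored pattern from the question with the anchored one. The vector sample_words is made up for illustration and does not come from the original post; only base R is used.

```r
# Hypothetical sample vector (not taken from stringr::words)
sample_words <- c("feel", "degree", "tree", "be", "eleven", "never")

# Unanchored pattern from the question: matches any word with AT LEAST two e's
loose <- grep("e.*e", sample_words, value = TRUE)

# Anchored pattern: every character other than the two literal e's must be a non-e
exact <- grep("^[^e]*e[^e]*e[^e]*$", sample_words, value = TRUE)

print(loose)  # "feel" "degree" "tree" "eleven" "never"
print(exact)  # "feel" "tree" "never"
```

The loose pattern lets .* swallow additional e's, which is why "degree" and "eleven" slip through.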

                                                                        
                                                        
            
            
              
                
-上瘾入骨i
2020-12-21 16:36

If you're okay with not using grep:

stringr::str_count(words, "e") == 2


If you want more efficiency,

stringi::stri_count_fixed(words, "e") == 2


Both of these return logical vectors; to get the words themselves, subset with words[..code from above..]
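
If neither stringr nor stringi is available, the same count-then-subset idea can be sketched in base R. The vector my_words below is a hypothetical stand-in for the real word list, and the use of gregexpr with regmatches is a substitute for the package functions above, not part of the original answer.

```r
# Hypothetical sample vector standing in for the real word list
my_words <- c("feel", "degree", "tree", "be", "sky")

# Count literal (fixed, non-regex) occurrences of "e" in each element.
# regmatches() yields character(0) where there is no match, so lengths() gives 0.
e_count <- lengths(regmatches(my_words, gregexpr("e", my_words, fixed = TRUE)))

# Logical vector, then subset to get the words themselves
has_two <- e_count == 2
print(my_words[has_two])  # "feel" "tree"
```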
                                                                        
                                                        
            
            
              
                
长发绾君心
2020-12-21 16:41

^[^e]*e[^e]*e[^e]*$

^  :: asserts position at the start of the string

[^e]*  :: matches zero or more characters that are not in the list (that is, any character other than e)

* (asterisk)  :: matches between zero and unlimited times, as many times as possible

e  :: matches the character e literally (case sensitive)

[^e]*  :: repeated to match any other characters between the two e's and after the second e

$  :: asserts position at the end of the string, or before the line terminator right at the end of the string (if any)

So [^e]* matches any character except e, zero or more times. A string that consists of nothing but the two e's ("ee") therefore still satisfies the pattern, because each [^e]* is allowed to match zero characters.
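
The edge cases described above can be checked with a short base R sketch. The test strings are hypothetical and chosen only to exercise the zero-length and case-sensitivity behaviour.

```r
rx <- "^[^e]*e[^e]*e[^e]*$"

# Hypothetical edge cases: only e's, too few, too many, uppercase E, and empty
cases <- c("ee", "e", "eee", "Eve", "eye", "")

result <- grepl(rx, cases)
names(result) <- cases
print(result)
# "ee" and "eye" match; "Eve" does not, because the uppercase E
# is matched by [^e] and only one lowercase e remains.
```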
                                                                        
                                                        
            
            
              
                
臣服心动
2020-12-21 16:46

You may use:

^(?:[^e]*e){2}[^e]*$


See the regex demo. The (?:...) is a non-capturing group that allows quantifying a sequence of subpatterns and is thus easily adjustable to match 3, 4 or more specific sequences in a string.

Details


^ - start of string
(?:[^e]*e){2} - 2 occurrences of
    [^e]* - any 0+ chars other than e
    e - an e
[^e]* - any 0+ chars other than e
$ - end of string


See the R demo below:

x <- c("feel", "agre", "degree")
rx <- "^(?:[^e]*e){2}[^e]*$"
grep(rx, x, value = TRUE)
## => [1] "feel"


Note that instead of value = T it is safer to use value = TRUE as T might be redefined in the code above.
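
Since the quantified group makes the count easy to adjust, one possible way to parameterise it is sketched below. The helper exactly_n is hypothetical (it is not part of the original answer), and it assumes ch is a single character with no special meaning inside a bracket expression. perl = TRUE is passed here as a conservative choice for the (?:...) syntax.

```r
# Hypothetical helper: build a pattern that matches strings containing
# exactly n occurrences of the character ch.
# Assumes ch is one character and is not ^, ], \ or -.
exactly_n <- function(ch, n) {
  sprintf("^(?:[^%s]*%s){%d}[^%s]*$", ch, ch, as.integer(n), ch)
}

x <- c("feel", "agre", "degree")

two_e   <- grep(exactly_n("e", 2), x, value = TRUE, perl = TRUE)
three_e <- grep(exactly_n("e", 3), x, value = TRUE, perl = TRUE)

print(two_e)    # "feel"
print(three_e)  # "degree"
```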
                                                                        
                                                        
            
            
              
                