Why does Ruby /[[:punct:]]/ miss some punctuation characters?

北恋 2020-12-07 01:11
Ruby /[[:punct:]]/ is supposed to match all "punctuation characters". According to Wikipedia, this means /[\]\[!"#$%&'()*+,./:;<=>?@\^_`{

2 answers

失恋的感觉 (original poster) 2020-12-07 01:27

The less-than and greater-than symbols are in the "Symbol, Math" category, not the punctuation category. You can see this if you force the regex's encoding to UTF-8 (it defaults to the source encoding, and presumably your source is UTF-8 encoded, while my default source is something else):

2.1.2 :004 > /[[:punct:]]/u =~ '<'
 => nil 
2.1.2 :005 > /[[:punct:]]/ =~ '<'
 => 0 
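
To check which encoding a particular regex literal carries, Ruby exposes Regexp#encoding and Regexp#fixed_encoding?. The following is a minimal sketch; the constant names are illustrative, and because what [[:punct:]] matches under each encoding can differ between Ruby versions, no match results are shown here:

```ruby
# Two forms of the same pattern (constant names are hypothetical).
PLAIN_PUNCT = /[[:punct:]]/    # no encoding flag
UTF8_PUNCT  = /[[:punct:]]/u   # encoding forced to UTF-8

puts PLAIN_PUNCT.encoding          # encoding assigned to the unflagged literal
puts PLAIN_PUNCT.fixed_encoding?   # false: it adapts to the string being matched
puts UTF8_PUNCT.encoding           # UTF-8
puts UTF8_PUNCT.fixed_encoding?    # true: always interpreted as UTF-8
```

If the two forms give different answers for the same character, comparing these values is a quick way to confirm that encoding is the cause.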


If you force the regex to ASCII encoding (/n - more options here) you'll see it categorize '<' in punct, which I think is what you want. However, this will probably cause problems if your source contains characters outside the ASCII subset of UTF-8.

2.1.2 :009 > /[[:punct:]]/n =~ '<'
 => 0 


A better solution would be to use the 'Symbol' category in your regex instead of the 'punct' one; it matches '<' in UTF-8 encoding:

2.1.2 :012 > /\p{S}/u =~ '<'
 => 0 
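
Note that \p{S} on its own does not cover characters such as '!' or ',', which belong to the Punctuation category. If the goal is to match everything ASCII treats as punctuation, one option is to combine both Unicode categories. This is a sketch under that assumption, with illustrative constant names:

```ruby
# Printable ASCII characters that are neither letters nor digits (32 in total).
ASCII_PUNCT = (33..126).map { |n| n.chr(Encoding::UTF_8) }
                       .reject { |c| c.match?(/[A-Za-z0-9]/) }

# Unicode general categories: P = Punctuation, S = Symbol.
PUNCT_OR_SYMBOL = /[\p{P}\p{S}]/u

# Collect any ASCII punctuation character the combined class fails to match.
missed = ASCII_PUNCT.reject { |c| c.match?(PUNCT_OR_SYMBOL) }
puts missed.empty?
```

Because the pattern relies on Unicode properties rather than the POSIX bracket, it does not depend on how [[:punct:]] is defined for a given encoding.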


There's a longer list of categories here.