How to match unicode characters with boost::spirit?


轻奢々 2021-01-02 05:48

How can I match UTF-8 encoded Unicode characters using boost::spirit?

For example, I want to recognize all characters in this string:

$ echo \"На


      
      
        
3 answers

        
                    
            
            
                         
                
              
              
                
轮回少年 (original poster) 2021-01-02 05:51
              

            
            
                        
You can't. The problem is not boost::spirit itself; it is that Unicode is complicated. A char is not a character, it is a byte. Even if you work at the code point level, a single user-perceived character may still be represented by more than one code point. For example, пусты́нных is 9 characters but 10 code points, because the stress mark is a separate combining code point. This may not be obvious from Russian, which rarely uses diacritics, but other languages use them extensively.
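To make the byte versus code point distinction concrete, here is a minimal sketch in plain C++. The function name count_code_points is made up for illustration, and the code assumes the input is already well-formed UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by counting
// every byte that is NOT a continuation byte (continuation bytes have the
// bit pattern 10xxxxxx). Assumes well-formed UTF-8; nothing is validated.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte: starts a new code point
        }
    }
    return n;
}
```

For the two Cyrillic letters "На" stored as UTF-8, std::string::size() reports 4 (bytes) while count_code_points reports 2, so anything that walks the string one char at a time sees four units, not two characters.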

To actually iterate over user-perceived characters (grapheme clusters, in Unicode terminology), you'll need a library specialized for Unicode, namely ICU.
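To show why code points are still not enough, here is a deliberately simplified sketch in plain C++. It is not ICU and not the real Unicode segmentation algorithm: it only treats code points in U+0300..U+036F (Combining Diacritical Marks) as attaching to the preceding character, and it assumes well-formed UTF-8. The names decode_utf8, is_combining_diacritic and approx_grapheme_count are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 into code points. Assumes well-formed input; malformed
// input gives meaningless values but never reads out of bounds.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (; extra > 0 && i < s.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Simplifying assumption: only U+0300..U+036F counts as "combining".
bool is_combining_diacritic(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Counts code points that are not combining diacritics. This is only an
// approximation of the number of grapheme clusters.
std::size_t approx_grapheme_count(const std::string& utf8) {
    std::size_t n = 0;
    for (char32_t cp : decode_utf8(utf8)) {
        if (!is_combining_diacritic(cp)) {
            ++n;
        }
    }
    return n;
}
```

On the Russian word from the example above (with the stress mark stored as a separate U+0301), decode_utf8 yields 10 code points while approx_grapheme_count yields 9. The real rules also cover emoji sequences, Hangul syllables, other combining ranges and more, which is why a dedicated library such as ICU is the practical choice.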

However, what is the real-world use of iterating over the characters?
    
             
                                                        
            
            
              
                