How to iterate over Unicode characters in Python 3?

花落未央 2020-12-10 03:48

I need to step through a Python string one character at a time, but a simple "for" loop gives me UTF-16 code units instead:

str = "abc\u20ac\U00010302\

3 answers

死守一世寂寞 (OP) 2020-12-10 04:22

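This happens on narrow builds of Python, which store Unicode values internally as UCS2 (in effect, UTF-16 code units). The UTF-16 representation of the character \U00010302 is the surrogate pair \uD800 \uDF02, which is why you got that result. Since Python 3.3 (PEP 393) there are no narrow builds: a str is always a sequence of code points, so a plain "for" loop already yields one character at a time.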
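
To make the surrogate arithmetic concrete, here is a minimal sketch of how a code point above U+FFFF splits into a UTF-16 pair. The helper name to_surrogate_pair is made up for this illustration and is not part of Python.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x10302)
print("U+{:04X} U+{:04X}".format(high, low))  # U+D800 U+DF02
```

The same two code units appear if you encode the character explicitly with "\U00010302".encode("utf-16-be").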

That said, some Python builds use UCS4 instead, but the two kinds of build are not binary-compatible with each other.
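
If you are not sure which kind of build you are running, sys.maxunicode tells you. A small sketch (the function name is_narrow_build is just an illustrative choice):

```python
import sys


def is_narrow_build():
    """Return True on a UCS2 (narrow) build, where sys.maxunicode is 0xFFFF."""
    return sys.maxunicode == 0xFFFF


if is_narrow_build():
    print("narrow build: characters above U+FFFF appear as two code units")
else:
    print("wide build or Python 3.3+: one item per code point")

# On a narrow build this length is 2; otherwise it is 1.
print(len("\U00010302"))
```

On Python 3.3 and later sys.maxunicode is always 0x10FFFF, so the check only matters for older interpreters.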

The Python C API documentation for Py_UNICODE explains:


  Py_UNICODE
      This type represents the storage type which is used by Python internally as basis for holding Unicode ordinals. Python’s default builds use a 16-bit type for Py_UNICODE and store Unicode values internally as UCS2. It is also possible to build a UCS4 version of Python (most recent Linux distributions come with UCS4 builds of Python). These builds then use a 32-bit type for Py_UNICODE and store Unicode data internally as UCS4. On platforms where wchar_t is available and compatible with the chosen Python Unicode build variant, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. On all other platforms, Py_UNICODE is a typedef alias for either unsigned short (UCS2) or unsigned long (UCS4).
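
If the code has to run on a narrow build as well, one possible workaround is to join surrogate pairs yourself while iterating. This is only a sketch: iter_code_points is a hypothetical helper, and the sample string is the visible part of the string from the question.

```python
def iter_code_points(text):
    """Yield integer code points, joining UTF-16 surrogate pairs if present.

    Hypothetical helper; unpaired surrogates are passed through unchanged.
    """
    pending_high = None
    for ch in text:
        unit = ord(ch)
        if pending_high is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                # High + low surrogate: combine into one code point.
                yield 0x10000 + ((pending_high - 0xD800) << 10) + (unit - 0xDC00)
                pending_high = None
                continue
            # The high surrogate was not followed by a low one.
            yield pending_high
            pending_high = None
        if 0xD800 <= unit <= 0xDBFF:
            pending_high = unit
        else:
            yield unit
    if pending_high is not None:
        yield pending_high


for cp in iter_code_points("abc\u20ac\U00010302"):
    print("U+{:04X}".format(cp))
```

On Python 3.3+ the loop never sees a surrogate pair for this input, so the helper behaves the same as calling ord() on each character.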
