How to iterate over Unicode characters in Python 3?


花落未央 2020-12-10 03:48

I need to step through a Python string one character at a time, but a simple "for" loop gives me UTF-16 code units instead:

str = "abc\u20ac\U00010302\


      
      
        
3 answers

眼角桃花 (original poster) 2020-12-10 04:27

If you create the string as a unicode object, iterating over it should yield one character at a time automatically. For example:

Python 2.6:

s = u"abc\u20ac\U00010302\U0010fffd"   # note u in front!
for c in s:
    print "U+%04x" % ord(c)


I received:

U+0061
U+0062
U+0063
U+20ac
U+10302
U+10fffd


Python 3.2:

s = "abc\u20ac\U00010302\U0010fffd"
for c in s:
    print("U+%04x" % ord(c))


It worked for me:

U+0061
U+0062
U+0063
U+20ac
U+10302
U+10fffd


Additionally, I found this link, which explains that this behavior is correct. If the string came from a file or another byte source, it will likely need to be decoded first.
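
As a rough illustration of the decoding step, here is a minimal Python 3 sketch. It assumes the data arrives as UTF-8 bytes; the encoding, the sample bytes, and the variable names are my own assumptions and not part of the original question.

```python
# Assumption: the text arrives as UTF-8 encoded bytes, for example read from
# a file opened in binary mode. These bytes encode "abc", U+20AC and U+10302.
raw = b"abc\xe2\x82\xac\xf0\x90\x8c\x82"

# Decode to str first; iterating over bytes directly would yield integers
# for individual bytes, not characters.
text = raw.decode("utf-8")

# Collect one label per character of the decoded string.
code_points = ["U+%04x" % ord(c) for c in text]
for label in code_points:
    print(label)
```

When reading from a file, opening it in text mode with an explicit `encoding` argument performs the same decoding for you.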

Update:

I've found an insightful explanation here. In these Python versions the internal Unicode representation size is a compile-time option: a "narrow" build stores UTF-16 code units, while a "wide" build stores full code points. If you work with "wide" characters outside the 16-bit Basic Multilingual Plane on a narrow build, you'll need to build Python yourself to remove the limitation, or use one of the workarounds on this page. Apparently many Linux distros already ship wide builds, which would explain the results I got above.
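
One workaround of this kind could look like the sketch below. It is not taken from the linked page; `iter_code_points` is a hypothetical helper name I made up for illustration. It joins UTF-16 surrogate pairs by hand, so it should give the same result on a narrow or a wide build. `sys.maxunicode` tells you which kind of build you have; on Python 3.3 and later the distinction is gone and the helper simply passes code points through.

```python
import sys


def iter_code_points(s):
    """Yield one integer code point per character, joining surrogate pairs.

    Hypothetical helper for illustration only.
    """
    i = 0
    n = len(s)
    while i < n:
        cp = ord(s[i])
        # A high surrogate followed by a low surrogate encodes a single
        # code point above U+FFFF when the string holds UTF-16 code units.
        if 0xD800 <= cp <= 0xDBFF and i + 1 < n:
            low = ord(s[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                i += 1  # skip the low surrogate we just consumed
        yield cp
        i += 1


# sys.maxunicode is 0xFFFF on a narrow build and 0x10FFFF otherwise.
is_narrow_build = sys.maxunicode == 0xFFFF

s = "abc\u20ac\U00010302\U0010fffd"
result = ["U+%04x" % cp for cp in iter_code_points(s)]
print(result)
```

The loop uses an explicit index rather than a plain `for` so that it can look one code unit ahead and consume both halves of a pair in one step.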
    
             
                                                        
            
            
              
                