Platform-specific Unicode semantics in Python 2.7


陌清茗 2020-12-20 03:53

Ubuntu 11.10:

$ python
Python 2.7.2+ (default, Oct  4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more


      
      
        
3 answers

南笙 (original poster) 2020-12-20 04:04
              

            
            
                        
On Ubuntu, you have a "wide" Python build where strings are UTF-32/UCS-4.  Unfortunately, this isn't (yet) available for Windows.


  Windows builds will be narrow for a while based on the fact that there
  have been few requests for wide characters, those requests are mostly
  from hard-core programmers with the ability to buy their own Python
  and Windows itself is strongly biased towards 16-bit characters.
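If you are not sure which kind of build you are running, a minimal sketch like the following should tell you. It relies on `sys.maxunicode`, which is 0xFFFF on narrow builds and 0x10FFFF on wide ones; the `build_kind` name is just a hypothetical helper for illustration.

```python
import sys

def build_kind(maxunicode=sys.maxunicode):
    # Hypothetical helper: narrow (UTF-16) builds report 0xFFFF,
    # wide (UCS-4) builds report 0x10FFFF.
    return 'narrow' if maxunicode == 0xFFFF else 'wide'

print(build_kind())
```

On a narrow build, `len()` of a string holding one character above U+FFFF is 2, because the character is stored as a surrogate pair.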


Python 3.3 will have a flexible string representation (PEP 393), in which you will not need to care whether Unicode strings use 16-bit or 32-bit code units.

Until then, you can get the code points from a UTF-16 string with

import struct

def code_points(text):
    # Encode as UTF-32 so every code point occupies exactly 4 bytes.
    utf32 = text.encode('UTF-32LE')
    return struct.unpack('<{}I'.format(len(utf32) // 4), utf32)
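As a quick check of that helper, here is a rough sketch that runs it on a string containing one character outside the Basic Multilingual Plane. The sample string and the `surrogate_pair` helper are my own illustrative additions, not part of the answer; `surrogate_pair` only shows the arithmetic behind the two 16-bit units a narrow build stores for such a character.

```python
import struct

def code_points(text):
    # Same helper as in the answer, repeated so this snippet runs on its own.
    utf32 = text.encode('UTF-32LE')
    return struct.unpack('<{}I'.format(len(utf32) // 4), utf32)

def surrogate_pair(cp):
    # Hypothetical helper: split a code point above U+FFFF into the
    # high and low UTF-16 surrogates used by a narrow build.
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

sample = u'a\U0001F600'  # 'a' followed by one non-BMP character
points = code_points(sample)
print(points)
print(surrogate_pair(points[1]))
```

The point of going through UTF-32 is that the encoder joins surrogate pairs for you, so the result should be the same on narrow and wide builds.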

    
             
                                                        
            
            
              
                