Escaping Unicode strings in Python


不要未来只要你来 2021-01-03 04:33

In Python these three commands print the same emoji:

print "\xF0\x9F\x8C\x80"
print u"\U0001F300"
print u"\uD83C\uDF00"


      
      
        
4 answers

温柔的废话 (original poster), 2021-01-03 05:00

Your first string is a byte string. The fact that it prints a single emoji character means that your console is configured to print UTF-8 encoded characters.
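
To make the byte-string point concrete, here is a minimal sketch. It assumes Python 3, where a byte string has to be written with a b prefix (in Python 2 a plain "..." literal is already a byte string). The variable names are made up for illustration.

```python
# Python 3 sketch: a bytes literal holds raw bytes, not characters.
raw = b"\xF0\x9F\x8C\x80"

# Until something interprets them, these are just four separate bytes.
byte_count = len(raw)

# Decoding as UTF-8 is roughly what a UTF-8 console does when it renders them.
decoded = raw.decode("utf-8")

print(byte_count)          # 4
print(len(decoded))        # 1
print(hex(ord(decoded)))   # 0x1f300
```

If the console used a different encoding, the same four bytes would be displayed as something else entirely.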

Your second string is a Unicode string with a single codepoint, U+1F300. The \U specifies that the next 8 hex digits should be interpreted as a codepoint.
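
A small sketch of the \U escape, assuming Python 3 (where every "..." literal is a Unicode string, so no u prefix is needed). The variable names are illustrative only.

```python
import unicodedata

# \U must be followed by exactly 8 hex digits naming one code point.
cyclone = "\U0001F300"

code_point = ord(cyclone)

print(len(cyclone))                # 1
print(f"U+{code_point:04X}")       # U+1F300
print(unicodedata.name(cyclone))   # CYCLONE

# chr() builds the same one-character string from the numeric code point.
same = chr(0x1F300)
print(same == cyclone)             # True
```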

The third string takes advantage of a quirk in the way Unicode strings are stored in Python 2. You've given two UTF-16 surrogate code units, \uD83C and \uDF00, which together form the single codepoint U+1F300, the same as the previous string. Each \u takes the 4 hex digits that follow it. Individually these surrogates aren't valid Unicode characters, but because Python 2 (on narrow builds) stores its Unicode internally as UTF-16, the pair works out. In Python 3 this doesn't work: the literal is accepted, but it is two lone surrogates rather than one character, and encoding it to UTF-8 raises UnicodeEncodeError.
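
The surrogate arithmetic can be sketched as follows. This is a Python 3 illustration, and to_surrogate_pair and from_surrogate_pair are hypothetical helper names, not standard library functions. The last lines show one way to merge a surrogate pair into a real character in Python 3, using the standard "surrogatepass" error handler.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F300)
print(f"\\u{high:04X}\\u{low:04X}")          # \uD83C\uDF00
print(hex(from_surrogate_pair(high, low)))   # 0x1f300

# In Python 3 this literal is two lone surrogates, not one character.
lone = "\ud83c\udf00"
print(len(lone))                             # 2

# Round-tripping through UTF-16 with "surrogatepass" merges the pair.
merged = lone.encode("utf-16", "surrogatepass").decode("utf-16")
print(merged == "\U0001F300")                # True
```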

When you print a Unicode string and your console encoding is known to be UTF-8, Python encodes the string to UTF-8 bytes. So all three strings end up producing the same byte sequence on output, F0 9F 8C 80, and the console renders the same character.
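
As a rough check of that claim, the sketch below encodes each form by hand and compares the bytes. It assumes Python 3, so the surrogate pair is merged into one character first. output_bytes is a hypothetical helper that only approximates what print does with the console encoding.

```python
import sys


def output_bytes(text, encoding="utf-8"):
    """Approximate print: encode the text with the console's encoding."""
    return text.encode(encoding)


raw = b"\xF0\x9F\x8C\x80"
from_wide = output_bytes("\U0001F300")

# Python 3 needs the surrogate pair merged into one character before encoding.
pair = "\ud83c\udf00".encode("utf-16", "surrogatepass").decode("utf-16")
from_pair = output_bytes(pair)

print(raw == from_wide == from_pair)   # True

# The encoding Python will actually use for this console (varies by system).
print(sys.stdout.encoding)

# A different target encoding would give different bytes for the same string.
print(output_bytes("\U0001F300", "utf-16-le").hex())   # 3cd800df
```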
    
             
                                                        
            
            
              
                