Printing a unicode string to the console works, but fails when output is redirected to a file. How to fix?


I have Python 2.7.1 on a Simplified Chinese version of Windows XP, and I have a program like this (windows_prn_utf8.py):

#!/usr/bin/env python
# -*- coding: u


                      
4 answers

醉酒成梦 (2020-12-21 23:40)

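Use codecs.open(filename, 'w', encoding=...) instead of open(filename), and write the file from Python itself rather than redirecting the console output. Note that the second positional argument of codecs.open is the mode, so the encoding should be passed by keyword.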
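
As a rough sketch of this approach: the `write_text` helper and the output path below are made up for illustration, and UTF-8 is only an assumed target encoding (cp936 would work the same way).

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: write a unicode string to `path` via codecs.open."""
    # The second positional argument of codecs.open is the mode,
    # so the encoding is passed by keyword to avoid mixing them up.
    f = codecs.open(path, "w", encoding=encoding)
    try:
        f.write(text)
    finally:
        f.close()


# Illustrative output location; any writable path would do.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_text(out_path, u"\u7535")

# Read the raw bytes back to see what actually landed on disk.
with open(out_path, "rb") as raw:
    data = raw.read()
```

The file ends up containing the UTF-8 bytes of the character, regardless of what the console encoding is. io.open(path, 'w', encoding=...) plays the same role and is also available in Python 2.7.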
                                                                        
                                                        
            
            
              
                
野的像风 (2020-12-21 23:56)

It seems this has been solved, but I think a bit more detail will help explain the actual problem.

The 'utf8' in unicode('\xE7\x94\xB5', 'utf8') is telling the interpreter how to decode the 3 bytes you're providing in the other argument in order to represent the character internally as a unicode object:

In [6]: uobj = unicode('\xe7\x94\xb5','utf8')

In [7]: uobj
Out[7]: u'\u7535'


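Another example is creating the same character from its UTF-16 encoding. (The u'\u7535' that Python displays in the Out[7] line above is an escape for the code point U+7535, which for this character happens to equal its single UTF-16 code unit. Without a byte order mark, 'utf16' assumes the machine's native byte order, little-endian here.)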

In [8]: uobj = unicode('\x35\x75','utf16')

In [9]: uobj
Out[9]: u'\u7535'


In your example, once the object has been created it is passed to print, which tries to write it to standard output (a console window, a file when redirected, etc.). The complication is that print must re-encode that object into a byte stream before writing it. In your case the encoding it used by default appears to have been ASCII, which cannot represent that character. (In Python 2, sys.stdout.encoding is typically None when output is redirected to a file, so the ASCII default is used.)

(If a console then displays the output, the bytes are decoded again and rendered with the corresponding font glyphs. This is why your program's output and the console both need to be 'speaking' the same encoding.)
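
The encoding step can be reproduced without any console or redirect. The snippet below is only a minimal sketch: it tries to encode the same character with ASCII, which fails, and then with two encodings that do contain it (the variable names are illustrative).

```python
text = u"\u7535"  # the character from the question

# ASCII only covers code points 0-127, so this encode must fail.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encodings that do contain the character produce different byte sequences,
# which is why writer and reader have to agree on one of them.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("cp936")
```

The UTF-8 result is three bytes and the cp936 result is two, so bytes written with one encoding and read with the other will not show the intended character.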

From what I've seen, cmd.exe on Windows is pretty confusing when it comes to character encodings, but what I do on other OSes is explicitly encode the unicode object before printing or writing it, using its encode method. This returns an encoded byte sequence stored in a str object:

In [10]: sobj = uobj.encode('utf8')

In [11]: type(sobj)
Out[11]: str

In [12]: sobj
Out[12]: '\xe7\x94\xb5'

In [13]: print sobj
电


Now that print is given a str instead of a unicode, it doesn't need to encode anything. In my case my terminal was decoding utf8, and its font contained that particular character, so it was displayed properly on my screen (and hopefully right now in your browser).
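
The same idea can be wrapped in a small function so that the output bytes do not depend on where stdout points. This is a sketch, not a definitive implementation: `print_encoded` is a hypothetical name, UTF-8 is an assumed default, and the demonstration writes to an in-memory stream rather than the real stdout.

```python
import io
import sys


def print_encoded(text, encoding="utf-8", stream=None):
    """Hypothetical helper: encode `text` ourselves and write the raw bytes."""
    if stream is None:
        # Python 3 exposes the byte-level stream as sys.stdout.buffer;
        # on Python 2, sys.stdout itself accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode(encoding) + b"\n")


# Demonstrate with an in-memory byte stream instead of the real stdout.
captured = io.BytesIO()
print_encoded(u"\u7535", stream=captured)
result = captured.getvalue()
```

Because the encoding is chosen explicitly, a redirect to a file receives the same bytes as a terminal would.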
                                                                        
                                                        
            
            
              
                
星月不相逢 (2020-12-21 23:58)

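Set the PYTHONIOENCODING environment variable before running the script, so that Python uses that encoding for stdout even when it is redirected: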

SET PYTHONIOENCODING=cp936
windows_prn_utf8.py > 1.txt
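
The effect of the variable can be checked from Python as well. The sketch below assumes a stand-in for the real script (a one-line child program, not windows_prn_utf8.py itself) and captures its stdout through a pipe, which, like a file redirect, is not a console.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable, the programmatic
# equivalent of running SET PYTHONIOENCODING=cp936 first.
env = dict(os.environ, PYTHONIOENCODING="cp936")

# Stand-in for the real script: a child interpreter that writes one character.
child_code = "import sys; sys.stdout.write(u'\\u7535')"

# stdout is a pipe here, i.e. not a console, much like a redirect to a file.
output = subprocess.check_output([sys.executable, "-c", child_code], env=env)
```

With the variable set, the child writes the cp936 bytes of the character instead of failing on the ASCII default.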

                                                                        
                                                        
            
            
              
                
一向 (2020-12-22 00:01)

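You can encode it to UTF-8 before you write it to the file. Note the u prefix: in Python 2, calling encode on a plain str literal first tries to decode it as ASCII and fails for non-ASCII bytes.

f.write(u"电".encode("utf8"))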
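
A slightly fuller sketch of the same idea follows, with a made-up temporary path and the file opened in binary mode, since the bytes are encoded by hand. Reading the file back is included only to show the round trip.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

text = u"\u7535"  # same character as u"电", written as an escape

# Hypothetical output path, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "1.txt")

# Binary mode: we hand the file bytes we encoded ourselves.
with open(path, "wb") as f:
    f.write(text.encode("utf8"))

# Reading back and decoding with the same encoding recovers the text.
with open(path, "rb") as f:
    round_trip = f.read().decode("utf8")
```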

                                                                        
                                                        
            
            
              
                