Python ElementTree won't convert non-breaking spaces when using UTF-8 for output

执笔经年 2021-02-20 14:45

I'm trying to parse, manipulate, and output HTML using Python's ElementTree:

import sys
from cStringIO  import StringIO
from xml.etree  import ElementTree as E


      
      
        
5 answers

梦谈多话 (original poster) 2021-02-20 15:33

XML only defines &lt;, &gt;, &apos;, &quot; and &amp;. &nbsp; and others come from HTML. So you have a couple of choices.


You can change your source to use numeric entities, like &#160; or &#xA0;, both of which are equivalent to &nbsp;.
You can use a DTD that defines those values.


There is some useful information (it is written about XSLT, but XSLT is written using XML, so the same applies) at the XSLT FAQ.
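
To make the two choices concrete, here is a minimal sketch. It assumes Python 3 (the question's imports suggest Python 2, so details may differ there), and the sample markup and variable names are made up for illustration.

```python
from xml.etree import ElementTree as ET

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Choice 1: numeric character references need no DTD.
dec = ET.fromstring("<p>a&#160;b</p>")
hexa = ET.fromstring("<p>a&#xA0;b</p>")

# The named HTML entity is not predefined in XML, so a bare parse fails.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    named_parsed = True
except ET.ParseError:
    named_parsed = False

# Choice 2: declare the entity in an internal DTD subset.
with_dtd = ET.fromstring(
    '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'
)

# Serialised as UTF-8, the character becomes the two bytes C2 A0.
utf8_bytes = ET.tostring(dec, encoding="utf-8")
```

Either way the parsed text holds the real U+00A0 character, so what appears in the output depends only on the encoding you serialise with.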



The question appears now to include a stack trace; that changes things. Are you sure that the string is in UTF-8? If it resolves to the single byte 0xA0, then it isn't UTF-8 but more likely cp1252 or iso-8859-1.
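
A quick way to check the encoding point, again assuming Python 3; the byte string here is an invented sample, not the data from the question.

```python
raw = b"a\xa0b"  # a lone 0xA0 byte, as in the cp1252 / iso-8859-1 case

# 0xA0 on its own is not a valid UTF-8 sequence, so strict decoding fails.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# Decoding with the single-byte encoding recovers U+00A0 ...
text = raw.decode("iso-8859-1")

# ... which re-encodes to the two-byte UTF-8 form C2 A0.
as_utf8 = text.encode("utf-8")
```

If your input behaves like `raw` above, decode it with the encoding it was actually written in before handing it to ElementTree.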
    
             
                                                        
            
            
              
                