Why does urllib return garbage from some Wikipedia articles?


刺人心 2021-01-14 00:33

>>> import urllib2

>>> good_article = 'http://en.wikipedia.org/wiki/Wikipedia'
>>> bad_article = 'http://en.wikipedia.org/wiki/India'


      
      
        
3 answers

南方客 (original poster) 2021-01-14 00:57

It's not an environment, locale, or encoding problem. The offending stream of bytes is gzip-compressed. The \x1f\x8B at the start is what you get at the start of a gzip stream with the default settings.

It looks as though the server is ignoring the fact that you never called

req2.add_header('Accept-encoding', 'gzip')

You should look at result.headers.getheader('Content-Encoding') and if necessary, decompress it yourself.
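
As a minimal sketch of that check, assuming Python 3 (where `urllib2` no longer exists and the header would be read with something like `response.headers.get('Content-Encoding')`), the decompression step might look like the following. The helper name `maybe_gunzip` is hypothetical, and the "response" is simulated locally with `gzip.compress` rather than fetched from Wikipedia.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the two bytes that open a gzip stream


def maybe_gunzip(body, content_encoding=None):
    """Return body decompressed if the header says gzip, else unchanged.

    Hypothetical helper for illustration; not part of urllib.
    content_encoding is the value of the Content-Encoding response
    header, or None if the server did not send one.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body


# Simulate a server that compresses the page even though it was not asked to.
page = b"<html>India</html>"
compressed = gzip.compress(page)

print(compressed[:2] == GZIP_MAGIC)               # the "garbage" prefix
print(maybe_gunzip(compressed, "gzip") == page)   # decompressed on request
print(maybe_gunzip(page) == page)                 # plain bodies pass through
```

The sketch trusts the header rather than sniffing the magic bytes, so a deliberately downloaded `.gz` file served without `Content-Encoding: gzip` is left untouched.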
    
             
                                                        
            
            
              
                