Extract all <script> tags in an HTML page and append to the bottom of the document

無奈伤痛 2020-12-20 04:37
Could someone tell me how I can extract and remove all the <script> tags in an HTML page and append them to the bottom of the document?

1 answer

心在旅途 (original poster) 2020-12-20 05:22

This answer is simple and may miss many nuances. However, it should give you an idea of how to go about it. I am sure it can be improved, and you should be able to do that quickly with the help of the documentation.

Reference doc: http://www.crummy.com/software/BeautifulSoup/documentation.html

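from bs4 import BeautifulSoup

# Sample document (markup reconstructed from the output below;
# the script body is a placeholder)
doc = ['<html><head><title>Page title</title><script>var a = 1;</script></head>',
       '<body><p>This is paragraph <b>one</b>.</p>',
       '<p>This is paragraph <b>two</b>.</p>',
       '</body></html>']
soup = BeautifulSoup(''.join(doc), 'html.parser')

# find_all returns a list, so moving tags while looping is safe
for tag in soup.find_all('script'):
    # Use extract to remove the tag from the tree
    tag.extract()
    # Re-insert it as the last child of <body>
    soup.body.append(tag)

print(soup.prettify())


Output:

<html>
 <head>
  <title>
   Page title
  </title>
 </head>
 <body>
  <p>
   This is paragraph
   <b>
    one
   </b>
   .
  </p>
  <p>
   This is paragraph
   <b>
    two
   </b>
   .
  </p>
  <script>
   var a = 1;
  </script>
 </body>
</html>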
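
As one possible refinement of the loop above, the same extract-and-append idea can be wrapped in a small helper. This is only a sketch: the function name `move_scripts_to_bottom`, the sample HTML, and the fallback to the document root when there is no <body> are all assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup


def move_scripts_to_bottom(html):
    """Return html with every <script> tag moved to the end of <body>."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: if the document has no <body>, append to the root instead.
    target = soup.body if soup.body is not None else soup
    # find_all returns a static list, so moving tags while looping is safe.
    for tag in soup.find_all("script"):
        # extract() detaches the tag and returns it;
        # append() re-attaches it as the last child of the target.
        target.append(tag.extract())
    return str(soup)


# Hypothetical sample input with one script in <head> and one in <body>.
html = (
    "<html><head><script>var a = 1;</script><title>Page title</title></head>"
    "<body><script>var b = 2;</script><p>This is paragraph <b>one</b>.</p>"
    "</body></html>"
)
result = move_scripts_to_bottom(html)
print(result)
```

Because the scripts are moved in document order, their relative order is preserved at the bottom of the body, which usually matters when one script depends on another.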