Extract all <script> tags in an HTML page and append to the bottom of the document

無奈伤痛 2020-12-20 04:37

Could someone tell me how I can extract and remove all the <script> tags in an HTML page and append them to the bottom of the document?

1 answer
  • 2020-12-20 05:22

    This answer is simple and may miss some nuances. However, it should give you an idea of how to go about it, and you should be able to refine it quickly with the help of the documentation.

    Reference doc: http://www.crummy.com/software/BeautifulSoup/documentation.html

    from bs4 import BeautifulSoup

    # Sample page with a <script> placed before <head>
    doc = ['<html><script type="text/javascript">document.write("Hello World!")',
           '</script><head><title>Page title</title></head>',
           '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.</p>',
           '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>',
           '</body></html>']
    # Name the parser explicitly; bs4 otherwise guesses one and emits a warning
    soup = BeautifulSoup(''.join(doc), 'html.parser')

    for tag in soup.find_all('script'):
        # extract() removes the tag from its current position in the tree
        tag.extract()
        # append() re-attaches it as the last child of <body>
        soup.body.append(tag)

    print(soup.prettify())
    

    Output:

    <html>
     <head>
      <title>
       Page title
      </title>
     </head>
     <body>
      <p id="firstpara" align="center">
       This is paragraph
       <b>
        one
       </b>
       .
      </p>
      <p id="secondpara" align="blah">
       This is paragraph
       <b>
        two
       </b>
       .
      </p>
      <script type="text/javascript">
       document.write("Hello World!")
      </script>
     </body>
    </html>
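
As one possible refinement, the loop above could be wrapped in a reusable function. This is only a sketch: the name `move_scripts_to_body_end`, the sample `page` string, and the fallback to the document root when there is no `<body>` are assumptions made for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup


def move_scripts_to_body_end(html: str) -> str:
    """Return the markup with every <script> tag moved to the end of <body>."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: if the page has no <body>, append to the document root instead.
    target = soup.body if soup.body is not None else soup
    # find_all() returns a snapshot list, so moving tags while looping is safe.
    for tag in soup.find_all("script"):
        # append() detaches the tag from its old parent before re-attaching it.
        target.append(tag)
    return str(soup)


# Hypothetical sample page: one script in <head>, one at the top of <body>.
page = (
    '<html><head><script src="a.js"></script><title>T</title></head>'
    "<body><script>var x = 1;</script><p>Hello</p></body></html>"
)
result = move_scripts_to_body_end(page)
print(result)
```

Because the tags are appended in document order, the scripts keep their relative order at the end of `<body>`, which matters when a later script depends on an earlier one.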
    