Could someone tell me how I can extract all the <script> tags from an HTML document and move them to the end of the document, right before the closing </body> tag?
This answer is simple and may miss some nuances, but it should give you an idea of how to go about it. I am sure it can be improved, and you should be able to do that quickly with the help of the documentation.
Reference doc: http://www.crummy.com/software/BeautifulSoup/documentation.html
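from bs4 import BeautifulSoup

# Sample document (illustrative markup): one script in the head, one in the body
doc = ['<html><head><title>Page title</title><script src="a.js"></script></head>',
       '<body><p>This is paragraph <b>one</b>.</p>',
       '<script src="b.js"></script><p>This is paragraph <b>two</b>.</p>',
       '</body></html>']
soup = BeautifulSoup(''.join(doc), 'html.parser')
for tag in soup.find_all('script'):
    # Use extract to remove the tag from its current position
    tag.extract()
    # Use a simple insert to put it back as the last child of <body>
    soup.body.insert(len(soup.body.contents), tag)
print(soup.prettify())

Output:
<html>
 <head>
  <title>
   Page title
  </title>
 </head>
 <body>
  <p>
   This is paragraph
   <b>
    one
   </b>
   .
  </p>
  <p>
   This is paragraph
   <b>
    two
   </b>
   .
  </p>
  <script src="a.js">
  </script>
  <script src="b.js">
  </script>
 </body>
</html>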
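
If you want to reuse this, the same idea can be wrapped in a small function. This is only a sketch: the function name move_scripts_to_body_end and the sample HTML are made up for illustration, and it assumes the bs4 package with Python's built-in html.parser. It returns the input unchanged when the document has no body.

```python
from bs4 import BeautifulSoup


def move_scripts_to_body_end(html):
    """Return html with every <script> tag moved to the end of <body>."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        # No <body> to append to, so hand the document back unchanged.
        return html
    # find_all returns a list-like snapshot, so moving tags while looping is safe.
    for tag in soup.find_all("script"):
        # extract() detaches the tag; append() re-attaches it as the last child.
        soup.body.append(tag.extract())
    return str(soup)


# Hypothetical sample input: one script in <head>, one at the top of <body>.
html = ('<html><head><script src="a.js"></script></head>'
        '<body><script>var x = 1;</script><p>Hello</p></body></html>')
result = move_scripts_to_body_end(html)
print(result)
```

The scripts keep their original relative order, because find_all walks the document from top to bottom and each tag is appended in turn.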