I am parsing an HTML document using the lxml library (http://lxml.de/). So far I have figured out how to strip tags from an HTML document: In lxml, how do I remove a tag but retain
You can also use the bs4 (BeautifulSoup) library for this purpose.
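    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_src, "lxml")
    # Remove every <script> and <style> element together with its contents
    for tag in soup.find_all(["script", "style"]):
        tag.extract()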
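For a version you can run as-is, here is a minimal sketch, assuming the goal is to drop `<script>` and `<style>` elements along with everything inside them. The sample `html_src` string and the helper name `strip_script_and_style` are made up for illustration, and the parser defaults to `"html.parser"` only so the sketch runs without lxml installed; pass `"lxml"` to match the answer above.

```python
from bs4 import BeautifulSoup


def strip_script_and_style(html_src, parser="html.parser"):
    """Return a BeautifulSoup tree with <script> and <style> elements removed.

    Hypothetical helper for illustration; not part of bs4 itself.
    """
    soup = BeautifulSoup(html_src, parser)
    # find_all accepts a list of tag names and matches any of them
    for tag in soup.find_all(["script", "style"]):
        # decompose() removes the tag and its contents from the tree
        tag.decompose()
    return soup


# Made-up sample input, just to exercise the helper
html_src = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><p>Hello</p><script>alert('hi');</script></body></html>"
)

cleaned = strip_script_and_style(html_src)
remaining = cleaned.find_all(["script", "style"])
text = cleaned.get_text()
```

Both `extract()` and `decompose()` take the element out of the tree. `extract()` hands the removed tag back to you, while `decompose()` destroys it, so `decompose()` is probably the better fit if you never need the removed tags again.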