Remove all JavaScript tags and style tags from HTML with Python and the lxml module

南笙 2020-12-23 12:11

I am parsing an HTML document using the http://lxml.de/ library. So far I have figured out how to strip tags from an HTML document: In lxml, how do I remove a tag but retain

4 answers
  •  夕颜 (original poster)
    2020-12-23 12:40

    You can also use the bs4 (BeautifulSoup) library for this purpose.

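    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_src, "lxml")  # html_src: the HTML string to clean
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()  # drop the tag and everything inside it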
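
Since the question asks about lxml itself, here is a minimal sketch of the same idea without bs4, assuming lxml is installed. It relies on `lxml.etree.strip_elements`, which removes the named elements together with their contents. The function name `strip_script_and_style` and the sample HTML are made up for illustration and are not from the original post.

```python
from lxml import etree, html


def strip_script_and_style(html_src):
    """Return html_src with every <script> and <style> element removed."""
    root = html.fromstring(html_src)
    # with_tail=False keeps the text that follows each removed element
    etree.strip_elements(root, "script", "style", with_tail=False)
    return html.tostring(root, encoding="unicode")


# Hypothetical sample input, for illustration only
cleaned = strip_script_and_style(
    "<div><script>alert(1)</script><p>Hello</p>"
    "<style>p {color: red}</style> world</div>"
)
print(cleaned)
```

Passing `with_tail=False` matters here: with the default `with_tail=True`, any text that directly follows a removed `<script>` or `<style>` element would be deleted along with it.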
    
