Strip HTML from strings in Python

难免孤独 2020-11-22 02:50
from mechanize import Browser
br = Browser()
br.open('http://somewebpage')
html = br.response().readlines()
for line in html:
  print line

When p

26 Answers
  •  我在风中等你
    2020-11-22 03:14

    Here's my solution for Python 3.

    import html
    import re

    def html_to_txt(html_text):
        # Strip tags first, then unescape entities. Unescaping first would
        # turn escaped text such as "1 &lt; 2 and 3 &gt; 1" into something
        # the tag pattern matches and deletes.
        txt = re.sub(r"<[^>]+>", "", html_text)
        return html.unescape(txt)
    

    I'm not sure it is perfect, but it solved my use case and seems simple.
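
For input where a regular expression is too fragile (for example, attribute values that contain ">"), one alternative is the standard library's html.parser module. The following is only a minimal sketch: the names TextExtractor and strip_tags are made up for illustration, and it keeps the contents of script and style elements as plain text, which may or may not be what you want.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper name)."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &amp; before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.parts.append(data)


def strip_tags(html_text):
    """Return the text content of html_text with all tags removed."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b> &amp; friends</p>"))
```

Because the parser tokenizes the markup instead of pattern-matching it, escaped text such as "1 &lt; 2" is decoded rather than mistaken for a tag.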
