Processing an HTML file using Python

半阙折子戏 2021-01-26 08:10

I want to remove all the tags from an HTML file. For that I used Python's re module. For example, consider the line

Hello World!

.I want to retain
5 answers
  •  天命终不由人
    2021-01-26 09:11

    Use a parser, either lxml or BeautifulSoup:

    import lxml.html

    mystring = "<h1>Hello World!</h1>"  # example input, assumed for illustration
    print(lxml.html.fromstring(mystring).text_content())
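
If installing lxml or BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is a swapped-in alternative, not the method from the answer above; the names TextExtractor and strip_tags are hypothetical helpers made up for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []

    def handle_data(self, data):
        # Called for the text found between tags.
        self.parts.append(data)


def strip_tags(markup):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<h1>Hello <b>World</b>!</h1>"))  # prints: Hello World!
```

Note that this sketch also keeps the contents of script and style elements, since the parser reports them as data too; filtering those out would need extra handling in handle_starttag and handle_endtag.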
    

    Related questions:

    Using regular expressions to parse HTML: why not?

    Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
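
As a small illustration of the problem the linked questions discuss, here is a sketch of how a typical tag-stripping regex can go wrong. The pattern and the sample markup are assumptions chosen for this example, not taken from the question.

```python
import re

# A commonly used pattern: "<", then anything that is not ">", then ">".
TAG_RE = re.compile(r"<[^>]+>")


def naive_strip(markup):
    """Remove everything that looks like a tag according to TAG_RE."""
    return TAG_RE.sub("", markup)


# A ">" inside a quoted attribute value is legal HTML, but the regex
# treats it as the end of the tag and leaves part of the attribute behind.
sample = '<a title="a > b">link</a>'
print(repr(naive_strip(sample)))  # prints: ' b">link'
```

The intended result here is just "link", which is what a real HTML parser would produce.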
