I want to remove all the tags from an HTML file, and I tried Python's re module for that. For example, consider a line where the text Hello World! is wrapped in tags. I want to retain only
Hello World!
Use a parser, either lxml or BeautifulSoup:
    import lxml.html

    # mystring holds the HTML source; text_content() returns its text with the tags removed
    print(lxml.html.fromstring(mystring).text_content())
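If neither lxml nor BeautifulSoup is installed, the same idea can be sketched with the standard library's html.parser module. This is only a minimal illustration, not part of the lxml answer above: the names TextExtractor and strip_tags are made up for this example, and the sample markup is an assumed input.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of a document (hypothetical helper name)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text found between tags
        self.parts.append(data)


def strip_tags(markup):
    """Return the text of `markup` with all tags dropped (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Assumed sample input, chosen to match the question's example text
print(strip_tags("<h1>Hello <b>World</b>!</h1>"))
```

Note that this sketch also keeps the text inside script and style elements, so those would need extra handling if they should be dropped.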
Related questions:
Using regular expressions to parse HTML: why not?
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms