Regular expression to match closing HTML tags

清歌不尽 2020-12-10 08:33

I'm working on a small Python script to clean up HTML documents. It works by accepting a list of tags to KEEP and then parsing through the HTML code, trashing tags that are

4 answers
  • 2020-12-10 08:49

    You may also consider using the HTML parser that is built into Python (the HTMLParser module in Python 2, html.parser in Python 3).

    This will help you home in on the specific area of the HTML document you want to work on, and then use regular expressions on just that part.
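As a rough illustration of the built-in parser applied to the question's task, here is a minimal Python 3 sketch. The names `TagFilter` and `strip_tags` are hypothetical, not from the question or this answer, and the sketch assumes that "trashing" a tag means dropping the tag itself while keeping the text inside it.

```python
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Rebuild the document, dropping every tag whose name is not in `keep`."""

    def __init__(self, keep):
        # convert_charrefs=False so entities such as &amp; are passed
        # through unchanged instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.keep = {name.lower() for name in keep}
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            # Re-emit the start tag exactly as it appeared in the source.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, not as a pair.
        if tag in self.keep:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_tags(html, keep):
    """Return `html` with all tags removed except those named in `keep`."""
    parser = TagFilter(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


cleaned = strip_tags('<div><p class="x">Hi <b>there</b></p><br/></div>', ["p", "br"])
```

Comments and the doctype are silently dropped in this sketch because their handlers are not overridden; add `handle_comment` and `handle_decl` if they need to survive.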

  • 2020-12-10 08:50
    1. Read:

      • RegEx match open tags except XHTML self-contained tags
      • Can you provide some examples of why it is hard to parse XML and HTML with a regex?
    2. Repent.

    3. Use a real HTML parser, like BeautifulSoup.

  • 2020-12-10 09:05

    Don't use regex to parse HTML. It will only give you headaches.

    Use an HTML or XML parser instead. Try BeautifulSoup or lxml.

  • 2020-12-10 09:09
    <TAG\b[^>]*>(.*?)</TAG> 
    

    Matches the opening and closing pair of a specific HTML tag (replace TAG with the tag name).

    <([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
    

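    Will match the opening and closing pair of any HTML tag, provided case-insensitive matching is turned on, since the pattern only lists uppercase letters. The \1 backreference forces the closing tag name to equal the opening one.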

    See here.
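A minimal sketch of how the second pattern could be used from Python, assuming the `re` module with the `IGNORECASE` and `DOTALL` flags. The `CLOSING` pattern is a hypothetical addition for matching a closing tag on its own, which is what the question title asks about; it is not part of the answer above.

```python
import re

# The answer's pattern for any tag pair. IGNORECASE lets [A-Z] match
# lowercase tag names; DOTALL lets (.*?) span line breaks.
PAIR = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)

# Hypothetical pattern (not from the answer) for a closing tag alone.
CLOSING = re.compile(r"</([A-Z][A-Z0-9]*)\s*>", re.IGNORECASE)

html = '<div class="a"><p>one</p>\n<p>two</p></div>'

pair = PAIR.search(html)
pair_name, pair_body = pair.group(1), pair.group(2)

# Names of every closing tag, in document order.
closing_names = CLOSING.findall(html)

# Limitation: the lazy (.*?) stops at the first matching close tag,
# so nested tags of the same name are paired incorrectly.
nested_body = PAIR.search("<div><div>x</div></div>").group(2)
```

Here `nested_body` comes out as `<div>x` rather than the full inner element, which is the kind of failure the other answers have in mind when they recommend a parser.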
