Regular expression to match closing HTML tags

前端未结

I'm working on a small Python script to clean up HTML documents. It works by accepting a list of tags to KEEP and then parsing through the HTML code, trashing tags that are

4 answers

南笙

2020-12-10 08:49

You may also consider using the HTML parser that is built into Python (documentation for Python 2 and Python 3).

This will help you home in on the specific area of the HTML document you would like to work on, and then use regular expressions on just that part.
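As a rough illustration of the built-in parser approach, here is a minimal sketch using Python 3's `html.parser` module. The class name `TagFilter` and the helper `strip_tags` are hypothetical names chosen for this example, not part of the standard library, and the sketch assumes you want to keep the text inside discarded tags. Comments and doctype declarations are dropped because no handler re-emits them.

```python
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Re-emit a document, dropping tags not in `keep` but keeping their text."""

    def __init__(self, keep):
        # convert_charrefs=False so entities like &amp; are passed through as-is
        super().__init__(convert_charrefs=False)
        self.keep = {name.lower() for name in keep}
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            # get_starttag_text() returns the start tag exactly as written
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>
        if tag in self.keep:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_tags(html, keep):
    """Return `html` with every tag outside `keep` removed (hypothetical helper)."""
    parser = TagFilter(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = '<div class="x"><p>Hello <b>world</b> &amp; <i>all</i></p></div>'
print(strip_tags(sample, ["p", "b"]))
# <p>Hello <b>world</b> &amp; all</p>
```

Because the parser reports each closing tag through `handle_endtag`, no closing-tag regex is needed for the filtering step itself.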

长情又很酷

2020-12-10 08:50
1. Read:
  - RegEx match open tags except XHTML self-contained tags
  - Can you provide some examples of why it is hard to parse XML and HTML with a regex?
2. Repent.
3. Use a real HTML parser, like BeautifulSoup.

梦谈多话

2020-12-10 09:05

Don't use regex to parse HTML. It will only give you headaches.

Use an HTML or XML parser instead. Try BeautifulSoup or lxml.
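To make the "headaches" concrete, here is a small sketch of two common failure cases. The pattern below is only an illustrative example of the kind of regex often suggested for matching tag pairs, and the sample strings are made up for this demonstration.

```python
import re

# Illustrative tag-pair pattern: opening tag, lazy body, matching closing tag.
PAIR = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)

# Case 1: nested tags of the same name. The lazy body stops at the FIRST
# </div>, so the match ends inside the outer element.
nested = "<div>outer <div>inner</div> tail</div>"
first = PAIR.search(nested)
print(first.group(0))
# <div>outer <div>inner</div>

# Case 2: a '>' inside an attribute value. [^>]* ends the opening tag early,
# so part of the attribute leaks into the captured body.
attr = PAIR.search('<a title="x > y">link</a>')
print(repr(attr.group(2)))
# ' y">link'
```

A parser tracks nesting and quoting for you, which is why both cases are handled correctly there.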

暖寄归人

2020-12-10 09:09
```
<TAG\b[^>]*>(.*?)</TAG>
```
Matches the opening and closing pair of a specific HTML tag.
```
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
```
Matches the opening and closing pair of any HTML tag. The `\1` backreference requires the closing tag name to be the same as the opening one.
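For reference, here is one way the two patterns above might be used from Python. This is a sketch under a few assumptions: `tag_pair` is a hypothetical helper name, `re.IGNORECASE` is added because the character class in the pattern is upper-case only, and `re.DOTALL` is added so that `(.*?)` can span line breaks.

```python
import re


def tag_pair(name):
    """Compile the first pattern for one specific tag name (hypothetical helper)."""
    tag = re.escape(name)
    return re.compile(rf"<{tag}\b[^>]*>(.*?)</{tag}>", re.IGNORECASE | re.DOTALL)


# The second pattern: any tag, with \1 tying the closing name to the opening one.
ANY_PAIR = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)

html = '<p class="a">one</p>\n<em>two</em>'

print(tag_pair("p").findall(html))
# ['one']

print([(m.group(1), m.group(2)) for m in ANY_PAIR.finditer(html)])
# [('p', 'one'), ('em', 'two')]
```

Both patterns assume flat, well-formed markup; they do not track nesting.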

See here.