Regex to match all HTML tags except <p> and </p>


抹茶落季 2020-11-30 06:31

I need to match and remove all tags using a regular expression in Perl. I have the following:

<\\\\??(?!p).+?>

But this still matche

13 answers

無奈伤痛 (original poster)

2020-11-30 07:15
In my opinion, trying to parse HTML with anything other than an HTML parser is just asking for a world of pain. HTML is a really complex language (which is one of the major reasons XHTML, a much simpler language, was created).

For example, this:
```
<html/
<head/
<title/>/
<p/>
```
is a complete, 100% well-formed, 100% valid HTML document. (Well, it's missing the DOCTYPE declaration, but other than that ...)

It is semantically equivalent to
```
<html>
  <head>
    <title>
      &gt;
    </title>
  </head>
  <body>
    <p>
      &gt;
    </p>
  </body>
</html>
```
But it's nevertheless valid HTML that you're going to have to deal with. You could, of course, devise a regex to parse it, but, as others already suggested, using an actual HTML parser is just sooo much easier.
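
To make the "use a parser" suggestion concrete, here is a minimal sketch of the idea. It is written in Python with the standard-library `html.parser` module rather than in Perl, purely as an illustration; the class name `KeepOnlyP` and the helper `strip_tags_except_p` are made up for this example and are not from the question. The sketch assumes reasonably ordinary markup and simply re-emits text and `<p>`/`</p>` tags while dropping every other tag.

```python
from html.parser import HTMLParser


class KeepOnlyP(HTMLParser):
    """Re-emit text and <p>/</p> tags; silently drop every other tag."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &gt; as written,
        # so the output stays valid markup.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Keep the opening tag exactly as it appeared, attributes included.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <p/>.
        if tag == "p":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("</p>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_tags_except_p(markup):
    """Hypothetical helper: return markup with all tags but <p> removed."""
    parser = KeepOnlyP()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<div><p class="x">a &gt; b <b>bold</b></p><pre>c</pre></div>'
    print(strip_tags_except_p(sample))
```

Because the parser reports tag names rather than raw text, `<pre>` is dropped like any other non-`p` tag, and there is no lookahead to get wrong on closing tags. Comments and declarations are also dropped here, since the sketch does not override their handlers.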