Regex to match all HTML tags except <p> and </p>

抹茶落季 2020-11-30 06:31

I need to match and remove all tags using a regular expression in Perl. I have the following:

<\\\\??(?!p).+?>

But this still matches the closing </p> tag.
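Running the pattern on a small sample shows why it fails and what a regex-only workaround could look like. This sketch uses Python's re module instead of Perl (the (?!...) lookahead syntax is the same in both). It assumes the engine sees the pattern as <\\??(?!p).+?>, which is an optional literal backslash followed by the lookahead. The helper name strip_tags_except_p_regex is hypothetical.

```python
import re

# Assumption: the question's pattern, as the regex engine sees it.
naive = re.compile(r"<\\??(?!p).+?>")

# For "</p>" the optional part matches nothing, so the engine is looking at
# "/" and (?!p) succeeds; ".+?>" then consumes "/p>". The same lookahead also
# skips <pre>, because it only checks the first letter after "<".

# Hypothetical fix: put the optional slash INSIDE the lookahead and require a
# word boundary after "p", so <p>, </p> and <p class="..."> are skipped while
# <pre> and </pre> are still matched. IGNORECASE also covers <P>.
keep_p = re.compile(r"<(?!/?p\b)[^>]*>", re.IGNORECASE)


def strip_tags_except_p_regex(html: str) -> str:
    """Delete every tag except <p ...> and </p>.

    Regex sketch only: it breaks on '>' inside attribute values, comments,
    scripts, and other markup that a real HTML parser would handle.
    """
    return keep_p.sub("", html)


sample = "<div><p>Hello <b>world</b></p><pre>x</pre></div>"
print(naive.findall(sample))
# ['<div>', '<b>', '</b>', '</p>', '</pre>', '</div>']
print(strip_tags_except_p_regex(sample))
# <p>Hello world</p>x
```

Moving the optional slash into the lookahead fixes these particular cases, but a pattern like this still cannot cope with everything HTML allows.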

  •  無奈伤痛
    2020-11-30 07:15

    In my opinion, trying to parse HTML with anything other than an HTML parser is just asking for a world of pain. HTML is a really complex language (which is one of the main reasons XHTML, a much simpler language, was created).

    For example, this:

     /
        

    is a complete, 100% well-formed, 100% valid HTML document. (Well, it's missing the DOCTYPE declaration, but other than that ...)

    It is semantically equivalent to

    
      
        
          >
        
      
      
        

    >

    But it's nevertheless valid HTML that you're going to have to deal with. You could, of course, devise a regex to parse it, but, as others have already suggested, using an actual HTML parser is just so much easier.
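    Following that advice, here is a minimal sketch of the same task done with a parser instead of a regex. It uses Python's standard-library html.parser.HTMLParser. This is an assumption: the question is about Perl, where a CPAN module such as HTML::Parser would play the same role. The class KeepOnlyP and the function strip_tags_except_p are hypothetical names. The sketch keeps text and <p>/</p>, drops every other tag, comment and script/style body, and re-escapes text because the parser decodes entities such as &gt;.

```python
from html import escape
from html.parser import HTMLParser


class KeepOnlyP(HTMLParser):
    """Hypothetical helper: rebuild the input keeping only text and <p>/</p>."""

    SKIP_CONTENT = {"script", "style"}  # drop these elements' text entirely

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <P> arrives as "p".
        if tag in self.SKIP_CONTENT:
            self._skip_depth += 1
        elif tag == "p" and not self._skip_depth:
            self.out.append("<p>")  # attributes are dropped for simplicity

    def handle_endtag(self, tag):
        if tag in self.SKIP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p" and not self._skip_depth:
            self.out.append("</p>")

    def handle_data(self, data):
        # Entities like &gt; were decoded by the parser, so re-escape the text.
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and all other tags fall through to the base-class
    # no-op handlers and are therefore dropped.


def strip_tags_except_p(html: str) -> str:
    parser = KeepOnlyP()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_tags_except_p(
    "<div><p class='a'>1 &gt; 0 <b>bold</b></p><!-- c -->"
    "<script>var x = '<p>';</script></div>"
))
# <p>1 &gt; 0 bold</p>
```

    Here the parser decides where tags begin and end, so a "<p>" inside a script is not mistaken for a real paragraph. html.parser is a simple, lenient parser rather than a browser-grade one, so for messy real-world input a more complete parsing library may still be the better choice.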
