I need to match and remove all tags except <p> and </p> using a regular expression in Perl. I have the following:
<\/?(?!p).+?>
But this still matches the closing </p> tag.
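A quick check makes the failure visible. This is only a sketch: it assumes the first attempt was meant to read <\/?(?!p).+?>, and the sample tags are made up. When the optional slash is skipped, the lookahead sees / rather than p, so the closing p tag slips through.

```perl
use strict;
use warnings;

# Assumed reading of the first attempt; the sample tags are hypothetical.
my $first_try = qr{<\/?(?!p).+?>};

for my $tag ('<b>', '</b>', '<p>', '</p>') {
    # Anchor the pattern so each sample tag is tested as a whole.
    my $hit = $tag =~ /^$first_try$/ ? 'removed' : 'kept';
    print "$tag $hit\n";
}
```

This prints "removed" for <b>, </b> and </p>, and "kept" only for <p>.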
I came up with this:
<(?!\/?p(?=>|\s.*>))\/?.*?>
Written out with the x modifier and comments (using m{...} so that a / inside a comment does not end the pattern):
m{
    <               # match the opening angle bracket
    (?!             # negative lookahead (asserts no match, consumes nothing)
        \/?         # 0 or 1 /
        p           # the letter p
        (?=         # positive lookahead (asserts a match, consumes nothing)
            >       # > : no attributes
            |       # or
            \s      # whitespace
            .*      # anything up to
            >       # closing angle bracket : with attributes
        )           # close positive lookahead
    )               # close negative lookahead
                    # if we have got this far then the tag is not
                    # a p tag or closing p tag,
                    # with or without attributes
    \/?             # optional closing-tag symbol (/)
    .*?             # and anything up to
    >               # the first closing angle bracket
}x
This leaves <p> tags (with or without attributes) and closing </p> tags alone, but still matches <pre> and similar tags, with or without attributes.
It doesn't strip out attributes, but my source data doesn't contain any. I may add that later, but this will suffice for now.
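The behaviour described above can be checked with a small substitution. This is a minimal sketch: the sample string is hypothetical, not taken from the original data, and it sits on one line because . does not match a newline by default.

```perl
use strict;
use warnings;

# Hypothetical sample input, not from the original source data.
my $html = '<div><p class="x">Keep <b>this</b> text</p><pre>code</pre></div>';

# Copy the string, then delete every tag except opening and closing p tags.
(my $clean = $html) =~ s{<(?!\/?p(?=>|\s.*>))\/?.*?>}{}g;

print "$clean\n";   # prints: <p class="x">Keep this text</p>code
```

The p tag keeps its attribute, while div, b and pre are removed along with their closing tags.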