Question
When reading an email's HTML body, I often end up with lots of HTML tags that I no longer want.
How can I remove from a string, in JavaScript, all HTML tags like <anything ...> or </anything>, except for the cases <x ...>, </x> and <x ... />, where x is one of:
a
br
b
img
I thought about something like:
s.replace(/<[^a].*>/g, '');
but I'm not sure how to do it.
Example:
<div id="hello">Hello</div><a href="test">Youhou</a>
should become
Hello<a href="test">Youhou</a>
Note: I'm looking for a few-lines-of-code solution that works 90% of the time (the bodies come from my own emails, so they contain nothing malicious), not a full solution that requires a third-party tool or library.
Answer 1:
Try replacing
<\/?(?!(a|br|b|img)\b)\w+[^>]*>
with nothing.
<\/?
Match the opening <, optionally followed by a /.
(?!(a|br|b|img)\b)
Negative look-ahead ensuring we don't match a, br, b or img tags.
\w+[^>]*>
Match the rest of the tag.
Here at regex101.
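As a minimal sketch of how the pattern above might be applied in JavaScript: the pattern is the one from this answer, while the `g` and `i` flags and the helper name `stripTags` are assumptions added for illustration.

```javascript
// Pattern from the answer. The "g" flag replaces every tag, and the "i" flag
// (an addition) lets upper-case tags such as <DIV> or <BR> be handled too.
const TAG_PATTERN = /<\/?(?!(a|br|b|img)\b)\w+[^>]*>/gi;

// stripTags is a hypothetical helper name, not part of any library.
function stripTags(html) {
  return html.replace(TAG_PATTERN, '');
}

const input = '<div id="hello">Hello</div><a href="test">Youhou</a>';
console.log(stripTags(input));
// Hello<a href="test">Youhou</a>
```

The `\b` after the group is what keeps a tag like <body> from being mistaken for <b>: the look-ahead only rejects a name that ends right after a, br, b or img.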
Answer 2:
This isn't very beautiful but should meet your requirements
html.replace(/<\/?([^\s\/>]+)[^>]*>/gi, function (tag, tagName) {
  // Keep the tag only if its name is in the allowed list.
  return ['a', 'b', 'br', 'img'].indexOf(tagName.toLowerCase()) >= 0 ? tag : '';
});
\/?
Optional slash.
([^\s\/>]+)
Match the tag name.
[^>]*
Attributes, spaces, etc.
Answer 3:
You can pass a function as the second parameter to .replace; its return value decides what each match is replaced with.
str.replace(/<[^a].*>/g, function (s) { /* do something with s */ });
See MDN documentation on replace:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
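A rough sketch of what such a callback could look like for the tags in the question. Note that it swaps in a narrower pattern than `/<[^a].*>/g`: there, `.*` is greedy and can run across several tags on one line, and `[^a]` only excludes a single character. The names `ALLOWED` and `keepAllowedTags` are hypothetical.

```javascript
// Tags to keep, taken from the question.
const ALLOWED = new Set(['a', 'br', 'b', 'img']);

// keepAllowedTags is a hypothetical helper name.
function keepAllowedTags(str) {
  // [^>]* stops at the first ">", so each match covers a single tag.
  // The capture group holds the tag name and arrives as the second argument.
  return str.replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, (tag, name) =>
    // The callback's return value replaces the matched tag.
    ALLOWED.has(name.toLowerCase()) ? tag : ''
  );
}

console.log(keepAllowedTags('<div id="hello">Hello</div><a href="test">Youhou</a>'));
// Hello<a href="test">Youhou</a>
```

Keeping the allowed names in a set rather than in the pattern makes the list easy to change without touching the regular expression.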
Source: https://stackoverflow.com/questions/46466814/remove-all-html-tags-from-a-html-body-except-a-br-b-and-img