Remove all HTML tags from an HTML body except <a>, <br>, <b> and <img>

Posted by 大城市里の小女人 on 2019-12-11 15:09:56

Question


When reading the HTML body of an email, I often find lots of HTML tags that I don't want.

How can I remove from a string, in JavaScript, all HTML tags like:

<anything ...>

or

</anything>

except these few cases <x ...>, </x>, <x ... /> for x being:

  • a
  • br
  • b
  • img

I thought about something like:

s.replace(/<[^a].*>/g, '');

but I'm not sure how to do it.

Example:

<div id="hello">Hello</div><a href="test">Youhou</a>

should become

Hello<a href="test">Youhou</a>

Note: I'm looking for a solution in a few lines of code that works 90% of the time (the email body comes from my own emails, so it contains nothing malicious), not a full solution that requires a third-party tool or library.


Answer 1:


Try replacing

<\/?(?!(a|br|b|img)\b)\w+[^>]*>

with nothing.

<\/? Match the start <, optionally followed by a /

(?!(a|br|b|img)\b) Negative look-ahead ensuring we don't match a, br, b or img tags.

\w+[^>]*> Match the rest of the tag.

See the demo at regex101.
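
As a rough sketch of how the pattern above could be applied in JavaScript: the `g` and `i` flags and the names `DISALLOWED_TAG` and `stripTags` are assumptions added for illustration, not part of the original answer. The `g` flag is needed so that every tag is replaced rather than only the first, and `i` lets uppercase tags such as <BR> through as well.

```javascript
// Matches any opening or closing tag whose name is NOT a, br, b or img.
// The g and i flags are additions for this sketch.
const DISALLOWED_TAG = /<\/?(?!(a|br|b|img)\b)\w+[^>]*>/gi;

// Hypothetical helper: removes every disallowed tag, keeping its text content.
function stripTags(html) {
  return html.replace(DISALLOWED_TAG, '');
}

console.log(stripTags('<div id="hello">Hello</div><a href="test">Youhou</a>'));
// Hello<a href="test">Youhou</a>
```

The \b inside the look-ahead is what stops <body> or <article> from being mistaken for <b> or <a>.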




Answer 2:


This isn't very beautiful but should meet your requirements

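html.replace(/<\/?([^\s\/>]+)[^>]*>/gi, function (tag, tagName) {
    // Keep the tag only if its name is in the allowed list.
    return ['a', 'b', 'br', 'img'].indexOf(tagName.toLowerCase()) >= 0 ? tag : '';
})

\/? matches an optional slash, ([^\s\/>]+) captures the tag name, and [^>]* matches the attributes, spaces, etc.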




Answer 3:


You can pass a function as the second parameter to .replace; it is called for each match, and its return value is used as the replacement.

str.replace(/<[^a].*>/g, function (s) { /* do something with s */ });

See MDN documentation on replace:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
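
A minimal sketch of the callback approach, under some assumptions: the pattern below is not the one from the question (whose `.*` is greedy and would match everything from the first `<` to the last `>` on a line), and `KEEP` and `keepTags` are illustrative names. The replacer function receives the whole match first, then each capture group.

```javascript
// Tag names to keep; every other tag is dropped. (Hypothetical names for this sketch.)
const KEEP = new Set(['a', 'br', 'b', 'img']);

function keepTags(str) {
  // match: the whole tag; name: the first capture group (the tag name).
  return str.replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, function (match, name) {
    return KEEP.has(name.toLowerCase()) ? match : '';
  });
}

console.log(keepTags('<p>Line<br/>two <img src="x.png"> <span>end</span></p>'));
// Line<br/>two <img src="x.png"> end
```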



Source: https://stackoverflow.com/questions/46466814/remove-all-html-tags-from-a-html-body-except-a-br-b-and-img
