Unescape HTML entities containing newline in Javascript?

北慕城南 submitted on 2019-12-22 09:01:46

Question


If you have a string containing HTML entities and want to unescape it, this solution (or variants thereof) is suggested multiple times:

function htmlDecode(input){
  var e = document.createElement('div');
  e.innerHTML = input;
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

htmlDecode("&lt;img src='myimage.jpg'&gt;");
// returns "<img src='myimage.jpg'>"

(See, for example, this answer: https://stackoverflow.com/a/1912522/1199564)

This works fine as long as the string does not contain a newline and the code is not running on Internet Explorer before version 10 (tested on versions 8 and 9).

If the string contains a newline, IE 8 and 9 will replace it with a space character instead of leaving it unchanged (as it is on Chrome, Safari, Firefox and IE 10).

htmlDecode("Hello\nWorld"); 
// returns "Hello World" on IE 8 and 9

Any suggestions for a solution that works with IE before version 10?


Answer 1:


The simplest, though probably not the most efficient, solution is to have htmlDecode() act only on character and entity references:

var s = "foo\n&amp;\nbar";
s = s.replace(/(&[^;]+;)+/g, htmlDecode);

A more efficient option is an optimized rewrite of htmlDecode() that is called only once per input, acts only on character and entity references, and reuses the DOM element object:

function htmlDecode (input)
{
  var e = document.createElement("span");

  var result = input.replace(/(&[^;]+;)+/g, function (match) {
    e.innerHTML = match;
    return e.firstChild.nodeValue;
  });

  return result;
}

/* returns "foo\n&\nbar" */
htmlDecode("foo\n&amp;\nbar");
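To see why this version leaves newlines alone, it helps to look at what the callback actually receives. The sketch below swaps the DOM step for a hypothetical stand-in, fakeDecode (a tiny lookup table that is not part of the answer), so it runs without a browser; the regular expression is the one used above, and the sample strings are made up.

```javascript
// Hypothetical stand-in for the DOM-based decoding step, so this sketch
// runs without a browser. It only knows three entities.
var TABLE = { "&amp;": "&", "&lt;": "<", "&gt;": ">" };
function fakeDecode(match) {
  // A match may hold several adjacent references, so decode them one by one.
  return match.replace(/&[^;]+;/g, function (ref) {
    return TABLE.hasOwnProperty(ref) ? TABLE[ref] : ref;
  });
}

var seen = []; // every substring that is handed to the decoder
function decodeAndRecord(input) {
  return input.replace(/(&[^;]+;)+/g, function (match) {
    seen.push(match);
    return fakeDecode(match);
  });
}

var result = decodeAndRecord("foo\n&amp;\nbar");
// result is "foo\n&\nbar" and seen is ["&amp;"]:
// the newlines are never passed to the decoder.

// Caveat: [^;] also matches "\n", so a stray "&" followed later by ";"
// pulls the newline into the match.
var stray = "Tom & Jerry\nsay hi;".match(/(&[^;]+;)+/g);
// stray is ["& Jerry\nsay hi;"]
```

If the input may contain unescaped ampersands, excluding whitespace from the character class is one way to keep newlines out of the match; that is a suggestion here, not something the answer states.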

Wladimir Palant has pointed out an XSS issue with this function: the value of some (HTML5) event listener attributes, such as onerror, is executed if you assign HTML containing elements with those attributes to the innerHTML property. So you should not use this function on arbitrary input containing actual HTML, only on HTML that is already escaped. Otherwise, adapt the regular expression accordingly; for example, use /(&[^;<>]+;)+/ instead to prevent &…; sequences in which the … part contains tags from being matched.
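The difference between the two patterns can be checked without a DOM. In this sketch the sample inputs are made up for illustration; only the two regular expressions come from the answer.

```javascript
var loose = /(&[^;]+;)+/g;     // pattern used in htmlDecode() above
var strict = /(&[^;<>]+;)+/g;  // variant that refuses < and > inside &…;

var hostile = "&<img src=x onerror=alert(1)>;"; // markup wrapped in &…;
var harmless = "a &lt; b";                       // ordinary escaped text

// The loose pattern matches the whole hostile string, markup included,
// so htmlDecode() would assign it to innerHTML.
var looseHit = hostile.match(loose);

// The strict pattern finds nothing to decode in the hostile string.
var strictHit = hostile.match(strict);

// Ordinary references are still matched by the strict pattern.
var stillMatched = harmless.match(strict);
```

Here looseHit holds the full hostile string, strictHit is null, and stillMatched holds "&lt;", so the stricter pattern keeps tag-bearing sequences away from innerHTML while ordinary references are still decoded.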

For arbitrary HTML, please see his alternative approach, but note that it is not as compatible as this one.



Source: https://stackoverflow.com/questions/12584889/unescape-html-entities-containing-newline-in-javascript
