Converting sanitised HTML back to displayable HTML

北荒 2020-12-05 18:41

I'm getting HTML data from a database which has been sanitised.

Basically what I'm getting is something like this:

<div class="someclass"&

5 Answers
  • 2020-12-05 18:46

    The example from CMS, while good, does not take into account that, for example, "script" elements get parsed into the div and are then not returned at all.

    So I wrote the following simple extension to String.prototype:

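    if (!String.prototype.unescapeHTML) {
        String.prototype.unescapeHTML = function() {
            // Only these entities are decoded; anything else is left as-is.
            var entityMap = {
                "&amp;": "&",
                "&lt;": "<",
                "&gt;": ">",
                '&quot;': '"',
                '&#39;': "'",
                '&#x2F;': "/"
            };

            return this.replace(/&[#\w]+;/g, function (s) {
                // Fall back to the original text for unknown entities such as
                // &nbsp; (otherwise they would become the string "undefined").
                return entityMap.hasOwnProperty(s) ? entityMap[s] : s;
            });
        };
    }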
    

    This keeps "scripts" in the text instead of dropping them.

    Example

    I will make things bad &lt;b&gt;because evil&lt;/b&gt;
    
    &lt;script language="JavaScript"&gt;console.log('EVIL CODE');&lt;/script&gt;
    

    The CMS-style approach drops the "script" part, but unescapeHTML keeps it.
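The entity map above special-cases &#39; and &#x2F;. If you would rather not extend String.prototype and want any decimal or hexadecimal numeric character reference decoded too, a standalone function can do it in one regex pass. This is a minimal sketch, not the answer's code: the name decodeEntities and the small NAMED_ENTITIES table (which adds &apos;) are assumptions.

```javascript
// Hypothetical helper: decodes a few named entities plus any numeric
// character reference (decimal &#39; or hex &#x2F;). Unknown entities such
// as &nbsp; are returned unchanged.
const NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
};

function decodeEntities(str) {
  // A single pass over the input, so "&amp;lt;" becomes "&lt;", not "<".
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Keep references that fall outside the Unicode code point range.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities("&lt;script&gt;console.log(&#39;EVIL CODE&#39;);&lt;/script&gt;"));
// <script>console.log('EVIL CODE');</script>
```

Like unescapeHTML, this only produces a string; whether that string is safe to insert into a page is a separate question.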

  • 2020-12-05 18:49

    This could help in a snap:

    String.prototype.deentitize = function() {
        var ret = this.replace(/&gt;/g, '>');
        ret = ret.replace(/&lt;/g, '<');
        ret = ret.replace(/&quot;/g, '"');
        ret = ret.replace(/&apos;/g, "'");
        // &amp; must be decoded last so already-escaped entities
        // are not decoded twice.
        ret = ret.replace(/&amp;/g, '&');
        return ret;
    };
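To see why the position of the &amp; replacement matters, here is a sketch that runs the same chained replacements in two orders on a stored fragment whose original HTML was <code>&lt;</code> (a page showing a literal less-than sign). The helper names decodeAmpLast and decodeAmpFirst are hypothetical.

```javascript
// Chained replacements with &amp; decoded last, as in deentitize.
function decodeAmpLast(str) {
  return str
    .replace(/&gt;/g, ">")
    .replace(/&lt;/g, "<")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Same replacements, but &amp; first: its output is scanned again by the
// later replacements, so some text is decoded twice.
function decodeAmpFirst(str) {
  return str
    .replace(/&amp;/g, "&")
    .replace(/&gt;/g, ">")
    .replace(/&lt;/g, "<")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'");
}

const stored = "&lt;code&gt;&amp;lt;&lt;/code&gt;";
console.log(decodeAmpLast(stored));  // <code>&lt;</code>
console.log(decodeAmpFirst(stored)); // <code><</code>
```

Only the first result reproduces the original markup; the second leaves a bare "<" inside the element.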
    
  • 2020-12-05 18:53

    I'm not sure why you would want to do this with JavaScript, unless it's server-side JS, but in any case you could simply replace &gt; and &lt; with their literal equivalents using the string's replace function.

    However, this may cause problems if those characters also appear as ordinary text, for example if you wrote an HTML tutorial. That is why in cases like this you may want to store the unsanitized HTML in your database instead, because converting it back correctly can be tricky.
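To make the tutorial problem concrete, here is a sketch assuming a hypothetical escapeHTML sanitiser (escaping &, <, > and ") wrote the database rows. Replacing only &lt; and &gt; leaves the tutorial's own entities double-escaped, while also decoding &quot; and &amp; (with &amp; last) round-trips back to the original. All three function names are illustrative.

```javascript
// Hypothetical sanitiser standing in for whatever wrote the database rows.
function escapeHTML(html) {
  return html
    .replace(/&/g, "&amp;") // runs first so the entities added below stay intact
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The approach suggested above: only &lt; and &gt; are converted back.
function replaceAnglesOnly(str) {
  return str.replace(/&lt;/g, "<").replace(/&gt;/g, ">");
}

// Also undo &quot; and &amp;, decoding &amp; last.
function unescapeAll(str) {
  return replaceAnglesOnly(str).replace(/&quot;/g, '"').replace(/&amp;/g, "&");
}

const original = "<p>Type &lt;b&gt; for bold</p>";
const stored = escapeHTML(original);
console.log(replaceAnglesOnly(stored)); // <p>Type &amp;lt;b&amp;gt; for bold</p>
console.log(unescapeAll(stored) === original); // true
```

The angles-only result would render as "Type &lt;b&gt; for bold" instead of "Type <b> for bold".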

  • 2020-12-05 19:02

    You could create an element, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the insertion.

    function htmlDecode(input){
      var e = document.createElement('div');
      e.innerHTML = input;
      // An empty input produces no child nodes, so guard against that.
      return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
    }
    
    htmlDecode('&lt;div class="someclass"&gt;&lt;blockquote&gt; &lt;p&gt;&quot; ' +
               'something&quot;&nbsp;here.&lt;/p&gt;Q&lt;/blockquote&gt;')
    
    // returns :
    // "<div class="someclass"><blockquote> <p>"something" here.</p>Q</blockquote>"
    

    Note that this method should work with all HTML character entities, because the browser's own HTML parser does the decoding.

  • 2020-12-05 19:08

    Lodash provides _.unescape for this: https://lodash.com/docs/4.17.10#unescape

    _.unescape('fred, barney, &amp; pebbles');
    // => 'fred, barney, & pebbles'
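Note that _.unescape only converts &amp;, &lt;, &gt;, &quot; and &#39;; other entities such as &nbsp; are left untouched. If adding Lodash is not an option, a dependency-free sketch covering the same five entities might look like this (unescapeLikeLodash is a hypothetical name, and the code is not taken from Lodash):

```javascript
// Hypothetical stand-in for _.unescape, covering the same five entities.
const UNESCAPE_MAP = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
};

// Expects a string. A single regex pass, so "&amp;lt;" yields "&lt;", not "<".
function unescapeLikeLodash(str) {
  return str.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => UNESCAPE_MAP[entity]);
}

console.log(unescapeLikeLodash("fred, barney, &amp; pebbles")); // fred, barney, & pebbles
```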
    