What are some use cases and is it deprecated? As I found out at http://groups.google.com/group/envjs/browse_thread/thread/6c22d0f959666009/c389fc11537f2a97 that it\'s \"non-
Just a cleaner answer besides @Esailija and @Greg answers: This function will create another document outside the tree of current document, and clean all scripts, styles and images from the new document:
function insertDocument (myHTML) {
var newHTMLDocument = document.implementation.createHTMLDocument().body;
newHTMLDocument.innerHTML = myHTML;
[].forEach.call(newHTMLDocument.querySelectorAll("script, style, img"), function(el) {el.remove(); });
documentsList.push(newHTMLDocument);
return $(newHTMLDocument.innerHTML);
}
This one is fantastic for making ajax requests and scraping the content will be faster :)
Yes. You can use this to load untrusted third-party content and strip it of dangerous tags and attributes before including it into your own document. There is some great research incorporating this trick, described at http://blog.kotowicz.net/2011/10/sad-state-of-dom-security-or-how-we-all.html.
The technique documented by Esailija above is insufficient, however. You also need to strip out most attributes. An attacker could set an onerror or onmouseover element to malicious JS. The style attribute can be used to include CSS that runs malicious JS. Iframe and other embed tags can also be abused. View source at https://html5sec.org/xssme/xssme2 to see a version of this technique.
I always use this because it doesn't make requests to images, execute scripts or affect styling:
function cleanHTML( html ) {
var root = document.implementation.createHTMLDocument().body;
root.innerHTML = html;
//Manipulate the DOM here
$(root).find("script, style, img").remove(); //jQuery is not relevant, I just didn't want to write exhausting boilerplate code just to make a point
return root.innerHTML;
}
cleanHTML( '<div>hello</div><img src="google"><script>alert("hello");</script><style type="text/css">body {display: none !important;}</style>' );
//returns "<div>hello</div>" with the page unaffected