Using documentFragment to parse HTML without sending HTTP requests

Question

I'd like to parse a string and build a DOM tree out of it. I decided to use the documentFragment API, and this is what I have so far:

var htmlString = "Some really really complicated html string that only can be parsed by a real browser!";
var fragment = document.createDocumentFragment(); // takes no arguments
var tempDiv = document.createElement('div');
fragment.appendChild(tempDiv);
tempDiv.innerHTML = htmlString;
console.log(tempDiv);

The problem is that this script causes my browser (Chrome specifically) to send actual HTTP requests. Take this as an example:

var htmlString = '<img src="somewhere/odd/on/the/internet" alt="alt?" />';
var fragment = document.createDocumentFragment(); // takes no arguments
var tempDiv = document.createElement('div');
fragment.appendChild(tempDiv);
tempDiv.innerHTML = htmlString;
console.log(tempDiv);

This makes the browser send an HTTP request for the image's src URL.

Are there any workarounds for this, or is there a better way to parse an HTML string?

Answer 1:

Well, the div is created by the live document, so the browser fetches the image as soon as its src is set, even though the fragment is never attached to the page.

You can look into using DOMParser, which parses the string into a separate document:

var htmlString ='<img src="somewhere/odd/on/the/internet" alt="alt?" />';
var parser = new DOMParser();
var doc = parser.parseFromString(htmlString, "text/html");

The MDN documentation page for DOMParser includes code to support browsers that do not support it natively.
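As a rough sketch of how the parsed document might be used, the snippet below parses the string with DOMParser and collects the raw src attribute of each image. The helper name listImageSources and its Parser parameter are illustrative assumptions, not part of any API; the parameter exists only so the function can be tried outside a browser. A document produced by DOMParser is not attached to a window, so, as far as I know, its images are not fetched.

```javascript
// Hypothetical helper: parse an HTML string into a separate document and
// return the src attribute of every <img> exactly as it was written.
// Parser defaults to the browser's DOMParser (assumption: browser environment).
function listImageSources(htmlString, Parser = DOMParser) {
  var doc = new Parser().parseFromString(htmlString, 'text/html');
  var imgs = doc.querySelectorAll('img');
  var sources = [];
  for (var i = 0; i < imgs.length; i++) {
    // getAttribute returns the literal attribute value, not a resolved URL.
    sources.push(imgs[i].getAttribute('src'));
  }
  return sources;
}
```

In a browser, calling listImageSources with the img string from the question should return an array containing 'somewhere/odd/on/the/internet'.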

Answer 2:

I found the answer to my question in another answer here on Stack Overflow. That answer consists of a piece of code that parses HTML using native browser functionality, but in a semi-sandboxed environment that doesn't send HTTP requests. I hope it helps others as well.

Answer 3:

I took a modified approach to the accepted answer's linked answer, as I don't like the idea of creating an iframe, processing the string through a BUNCH of regular expressions, and then putting that into the DOM.

I needed to preprocess some HTML coming in from an ajax request (this particular HTML has images with relative paths, and the page making the ajax request is not in the same directory as the HTML) and make the path to resources an absolute path instead.

My code looks something like this:

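// Rename src to data-src so the browser does not fetch the images on insertion.
var dataSrcStr = data.replace(/src=/g, 'data-src=');
var myContainer = document.getElementById('mycontainer');
myContainer.innerHTML = dataSrcStr;
var imgs = myContainer.querySelectorAll('img');
for (var i = 0, ii = imgs.length; i < ii; i++) {
  // Read the attribute by name; imgs[i].data-src would parse as a subtraction.
  imgs[i].src = 'prepended/path/to/img/' + imgs[i].getAttribute('data-src');
  imgs[i].removeAttribute('data-src');
}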

Obviously if there's some clear text with src= in it, you'll be replacing that, but it won't be the case for my content, as I control it as well.

This offers me a quicker solution than the linked answer or using a DOMParser, while still adding elements to the DOM to be able to access the elements programmatically.
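If the prefix is not a fixed string, the relative paths could instead be resolved against the URL of the HTML that was fetched. The following is only a sketch of that idea using the standard URL constructor; the function names toAbsoluteUrl and absolutizeImages, and the example.com base URL, are placeholders I made up for illustration.

```javascript
// Hypothetical helper: resolve a relative resource path against the URL of
// the document the HTML came from.
function toAbsoluteUrl(relativePath, baseUrl) {
  return new URL(relativePath, baseUrl).href;
}

// Hypothetical helper: for every <img> that carries a data-src attribute,
// set src to the resolved absolute URL and drop the temporary attribute.
function absolutizeImages(container, baseUrl) {
  var imgs = container.querySelectorAll('img');
  for (var i = 0; i < imgs.length; i++) {
    var relative = imgs[i].getAttribute('data-src');
    if (relative === null) continue; // nothing to rewrite on this image
    imgs[i].setAttribute('src', toAbsoluteUrl(relative, baseUrl));
    imgs[i].removeAttribute('data-src');
  }
}

// Example with a placeholder base URL:
// toAbsoluteUrl('images/pic.png', 'https://example.com/content/page.html')
// gives 'https://example.com/content/images/pic.png'
```

Resolving with URL also copes with paths such as ../img/a.png, which simple string concatenation would not normalize.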

Answer 4:

Try this. It works for complex HTML too. Anything your browser can display, this can parse.

var htmlString = "...";
var newDoc = document.implementation.createHTMLDocument('newDoc');
newDoc.documentElement.innerHTML = htmlString;
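A small wrapper around this approach might look like the sketch below. The name parseInertHtml and the ownerDocument parameter are my own assumptions (the parameter only makes the function usable outside a browser), and it assigns to body.innerHTML rather than documentElement.innerHTML, which is a variation suited to HTML snippets rather than whole pages. A document created this way has no window of its own, so images inside it should not be requested, though that is worth confirming in the browsers you target.

```javascript
// Hypothetical helper: parse an HTML snippet inside a separate document
// created by createHTMLDocument, and return that document for querying.
// ownerDocument defaults to the page's document (assumption: browser environment).
function parseInertHtml(htmlString, ownerDocument = document) {
  var inertDoc = ownerDocument.implementation.createHTMLDocument('');
  // The snippet is parsed by the browser, but into the detached document.
  inertDoc.body.innerHTML = htmlString;
  return inertDoc;
}
```

The returned document can then be inspected with the usual DOM methods, for example parseInertHtml(htmlString).querySelectorAll('img').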

Source: https://stackoverflow.com/questions/12747350/using-documentfragment-to-parse-html-without-sending-http-requests

Tags

javascript

html

dom

html-parsing