I am retrieving some HTML strings from my database and I would like to parse these strings into my DOMDocument. The problem is that the DOMDocument gives warnings at specia
Here's another approach. We wanted to avoid possibly slow network requests (or any network requests at all triggered by user input):
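<?php
$document = new \DOMDocument();
$document->loadHTML('<html><body></body></html>');
// Example input containing an HTML entity.
$html = '<p>test&nbsp;</p>';
$fragment = $document->createDocumentFragment();
// Internal DTD subset. Only &nbsp; is shown here; the remaining Latin-1 entity declarations go alongside it.
$html = '<!DOCTYPE div [
<!ENTITY nbsp "&#160;">
]>
<div>'.$html.'</div>';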
$newdom = new \DOMDocument();
$newdom->loadXML($html, LIBXML_HTML_NOIMPLIED | LIBXML_NOCDATA | LIBXML_NOENT | LIBXML_NONET | LIBXML_NOBLANKS);
foreach ($newdom->documentElement->childNodes as $childnode) {
    $fragment->appendChild($fragment->ownerDocument->importNode($childnode, TRUE));
}
$document->getElementsByTagName('body')[0]->appendChild($fragment);
echo $document->saveHTML();
Here we include the relevant part of the DTD, specifically the Latin-1 entity definitions, as an internal DTD subset inside the DOCTYPE declaration. The HTML content is then wrapped in a single document element, so that a sequence of sibling elements can be parsed as one XML document. Finally, the parsed nodes are imported into the target DOM and appended to it.
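To see what the internal subset buys you, here is a minimal sketch that parses the same snippet with and without an entity declaration. The sample input and the variable names are assumptions made for illustration; only `&nbsp;` is declared.

```php
<?php
// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Sample input, assumed for illustration.
$snippet = '<p>test&nbsp;</p>';

// Without a declaration the XML parser does not know &nbsp; and rejects the input.
$plain = new \DOMDocument();
$okPlain = $plain->loadXML('<div>' . $snippet . '</div>');
$errors = libxml_get_errors();
libxml_clear_errors();

// With the entity declared in the internal subset, parsing succeeds
// and LIBXML_NOENT substitutes the entity with its character.
$withDtd = new \DOMDocument();
$okDtd = $withDtd->loadXML(
    '<!DOCTYPE div [<!ENTITY nbsp "&#160;">]><div>' . $snippet . '</div>',
    LIBXML_NOENT | LIBXML_NONET
);

var_dump($okPlain, count($errors) > 0, $okDtd);
```

The collected `$errors` entries are `LibXMLError` objects, so their `message` property can be logged if you need to know which entity was missing.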
Our actual implementation uses file_get_contents to load the DTD containing all entity definitions from a local file.
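A rough sketch of that variant follows. The function name `parseHtmlSnippet`, the temporary file, and the two entity declarations are hypothetical stand-ins; a real setup would point at a local file holding the full entity list.

```php
<?php
// Hypothetical local file with entity declarations. It is written to a temp
// location here only so that the sketch is self-contained.
$dtdPath = tempnam(sys_get_temp_dir(), 'ent');
file_put_contents(
    $dtdPath,
    '<!ENTITY nbsp "&#160;">' . "\n" . '<!ENTITY eacute "&#233;">' . "\n"
);

// Hypothetical helper: wraps the snippet in a <div> and prepends the
// entity declarations read from the local file as an internal subset.
function parseHtmlSnippet(string $html, string $dtdPath): \DOMDocument
{
    $entities = file_get_contents($dtdPath);
    if ($entities === false) {
        throw new \RuntimeException("Cannot read entity definitions from $dtdPath");
    }
    $xml = "<!DOCTYPE div [\n" . $entities . "]>\n<div>" . $html . '</div>';
    $dom = new \DOMDocument();
    // LIBXML_NONET keeps the parser from making network requests.
    if (!$dom->loadXML($xml, LIBXML_NOENT | LIBXML_NONET)) {
        throw new \RuntimeException('Snippet is not well-formed XML');
    }
    return $dom;
}

$dom = parseHtmlSnippet('<p>caf&eacute;&nbsp;bar</p>', $dtdPath);
$text = $dom->documentElement->textContent;
unlink($dtdPath);
```

Because the snippet goes through the XML parser, it has to be well-formed XML: an unclosed `<br>`, for example, would make `loadXML` fail.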