I am retrieving some HTML strings from my database and I would like to parse these strings into my DOMDocument. The problem is that the DOMDocument gives warnings at specia
That's a tricky one because it's actually multiple issues in one.
As Tomalak points out, there is no &nbsp; in XML. So you did the right thing by specifying a DOMImplementation, because in XHTML there is. But for DOM to know that the document is XHTML, you have to load and validate against the DTD. The DTD is located at
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
but because there are millions of requests to that page daily, the W3C decided to block access to it unless a User-Agent is sent with the request. To supply a User-Agent, you have to create a custom stream context.
In code:
// make sure DOM passes a User Agent when it fetches the DTD
libxml_set_streams_context(
    stream_context_create(
        array(
            'http' => array(
                'user_agent' => 'PHP libxml agent',
            )
        )
    )
);
// specify the implementation
$imp = new DOMImplementation;
// create a DTD (here: for XHTML)
$dtd = $imp->createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.0 Transitional//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
);
// then create a DOMDocument with the configured DTD
$dom = $imp->createDocument(NULL, "html", $dtd);
$dom->encoding = 'UTF-8';
$dom->validate();
$fragment = $dom->createDocumentFragment();
$fragment->appendXML('
    <head><title>XHTML test</title></head>
    <body><p>Some text with a &nbsp; entity</p></body>
');
$dom->documentElement->appendChild($fragment);
$dom->formatOutput = TRUE;
echo $dom->saveXml();
This still takes some time to complete (don't ask me why), but in the end you'll get (reformatted for SO):
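<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
    <head>
        <title>XHTML test</title>
    </head>
    <body>
        <p>Some text with a &nbsp; entity</p>
    </body>
</html>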
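To see the underlying entity problem in isolation, here is a minimal sketch that is separate from the code above. It assumes the entity in question is a named HTML entity such as &nbsp;, and the two sample strings are made up for illustration. It parses the same paragraph once with the named entity and once with the equivalent numeric character reference, with no DTD loaded in either case.

```php
<?php
// Collect libxml errors in a buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical sample strings, not taken from the question.
$named   = '<p>Some text with a &nbsp; entity</p>';
$numeric = '<p>Some text with a &#160; entity</p>';

// Plain XML predefines only &amp; &lt; &gt; &quot; and &apos;,
// so &nbsp; is undeclared here because no DTD defines it.
$doc = new DOMDocument();
$namedOk = $doc->loadXML($named);
$errors = libxml_get_errors();
libxml_clear_errors();

// A numeric character reference needs no DTD at all.
$doc2 = new DOMDocument();
$numericOk = $doc2->loadXML($numeric);

var_dump($namedOk);
var_dump($numericOk);

// Print whatever libxml reported for the named-entity attempt.
foreach ($errors as $error) {
    echo trim($error->message), PHP_EOL;
}
```

If the strings coming from the database can be changed to use numeric references, the DTD fetch may not be needed at all. Whether that is practical depends on how the strings are produced.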
Also see DOMDocument::validate() problem