DOMDocument::loadXML vs. HTML Entities

Submitted by 扶醉桌前 on 2019-12-08 21:31:30

Question


I currently have a problem reading in XHTML: the XML parser doesn't recognise HTML character entities, so this:

<?php
$text = <<<EOF
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Entities are Causing Me Problems</title>
  </head>
  <body>
    <p>Copyright &copy; 2010 Some Bloke</p>
  </body>
</html>
EOF;

$imp = new DOMImplementation ();
$html5 = $imp->createDocumentType ('html', '', '');
$doc = $imp->createDocument ('http://www.w3.org/1999/xhtml', 'html', $html5);

$doc->loadXML ($text);

header ('Content-Type: application/xhtml+xml; charset=utf-8');
echo $doc->saveXML ();

Results in:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: Entity 'copy' not defined in Entity, line: 8 in testing.php on line 19

How can I fix this while allowing myself to serve pages as XHTML5?


Answer 1:


XHTML5 does not have a DTD, so you may not use the old-school HTML named entities in it: there is no document type definition to tell the parser what the named entities are for this language. (The exceptions are the predefined XML entities &lt;, &amp;, &quot; and &gt;, plus &apos;, though you generally don't want to use that one.)

Instead use a numeric character reference (&#169;) or, better, just a plain unencoded © character (in UTF-8; remember to include the <meta> element to signify the character set to non-XML parsers).
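If the markup comes from a source you cannot edit by hand, the same substitution can be done in code before calling loadXML. The following is a minimal sketch, not part of the original answer: replaceNamedEntities is a hypothetical helper name, and it assumes the input is UTF-8 and that PHP's HTML5 entity table (ENT_HTML5) covers the names in use. It is a plain text substitution, so it would also rewrite entity-like text inside comments or CDATA sections.

```php
<?php
// Hypothetical helper (not from the original post): rewrites HTML named
// entities as literal UTF-8 characters so a DTD-less XML parser accepts them.
function replaceNamedEntities(string $xml): string
{
    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function (array $m): string {
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                // Unknown name: leave it alone so the parser still reports it.
                return $m[0];
            }
            // Re-escape so &lt;, &amp;, &gt;, &quot; and &apos; stay escaped.
            return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $xml
    );
}

// Simplified stand-in for the document in the question.
$text = '<p xmlns="http://www.w3.org/1999/xhtml">Copyright &copy; 2010 Some Bloke &amp; Co.</p>';

$doc = new DOMDocument();
$doc->loadXML(replaceNamedEntities($text));
echo $doc->saveXML();
```

Decoding and then re-escaping each match keeps the five predefined XML entities intact while turning everything else into plain characters, which is the form the answer recommends.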




Answer 2:


Try using DOMDocument::loadHTML() instead. It doesn't choke on imperfect markup.
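A minimal sketch of that suggestion, under some assumptions: the markup is a simplified, ASCII-only variant of the question's document with the XML declaration and namespace removed, and libxml_use_internal_errors is used only to keep parser diagnostics out of the output. Whether the resulting tree is suitable for re-serialising as XHTML5 is a separate question that this sketch does not address.

```php
<?php
// Simplified variant of the question's document (assumption: no XML
// declaration, no xmlns attribute), parsed with the HTML parser.
$html = <<<EOF
<!DOCTYPE html>
<html>
  <head>
    <title>Entities are Causing Me Problems</title>
  </head>
  <body>
    <p>Copyright &copy; 2010 Some Bloke</p>
  </body>
</html>
EOF;

$doc = new DOMDocument();
// Collect parser diagnostics internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// The HTML parser knows the named entity, so the text holds the real character.
$paragraph = $doc->getElementsByTagName('p')->item(0);
echo $paragraph->textContent, "\n";
```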




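Answer 3:


Hi, try wrapping the content in a CDATA section. Note that the parser then treats everything inside the section as literal text, so the <p> tags and &copy; are not interpreted as markup.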

$text = <<<EOF
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Entities are Causing Me Problems</title>
  </head>
  <body>
    <![CDATA[<p>Copyright &copy; 2010 Some Bloke</p>]]>
  </body>
</html>
EOF;



Answer 4:


You shouldn't use loadXML and saveXML, nor add the <?xml declaration at the top of an HTML document.

Instead, use loadHTML and saveHTML, and add a doctype such as:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">




Source: https://stackoverflow.com/questions/2262051/domdocumentloadxml-vs-html-entities