DOMDocument removes HTML tags in JavaScript string

心不动则不痛 submitted on 2019-12-10 23:36:14

Question


I've been developing PHP applications for quite a while now, but this one really has me stumped. I'm loading complete HTML pages using DOMDocument. These pages are external and may contain JavaScript, which is beyond my control.

On some pages, basic HTML formatting inside JavaScript strings was not rendered the way it was supposed to be. I've written an example that demonstrates the problem.

<?php
$html = new DOMDocument();

libxml_use_internal_errors(true);

$strPage = '<html>
<head>
<title>Demo</title>
<script type="text/javascript">
var strJS = "<b>This is bold.</b><br /><br />This should not be bold. Where did my closing tag go to?";
</script>
</head>
<body>
<script type="text/javascript">
document.write(strJS);
</script>
</body>
</html>';

$html->loadHTML($strPage);
echo $html->saveHTML();
exit;
?>

Am I missing something?

Edit: I've changed the demo. Changing loadHTML to loadXML no longer works with it, and the output of the demo passes W3C validation. Wrapping the JavaScript in a CDATA block doesn't seem to have any effect either.


Answer 1:


I don't know why (I tried to find out), but it works if you load the HTML using loadXML instead of loadHTML:

$html = new DOMDocument();

libxml_use_internal_errors(true);

$strPage = "<html><head>";
$strPage .= "<script type=\"text/javascript\">";
$strPage .= "var strJS = \"<b>This is bold.</b><br /><br />This should not be bold. Where did my closing tag go to?\";";
$strPage .= "</script>";
$strPage .= "<body>";
$strPage .= "<script type=\"text/javascript\">";
$strPage .= "document.write(strJS);";
$strPage .= "</script>";
$strPage .= "</body>";
$strPage .= "</head></html>";

$html->loadXML($strPage);

echo $html->saveHTML();

Note that this HTML is actually invalid, though: everything, including the body, is inside the head.
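
Since the pages are external and may not be well-formed XML, a commonly suggested alternative is to keep loadHTML and escape "</" as "<\/" inside script bodies before parsing. The sketch below assumes the closing tags are lost because the HTML parser treats "</" inside a script element as markup; whether that happens may depend on the installed libxml2 version. The helper name escapeClosingTagsInScripts is hypothetical and not part of DOMDocument, and the regex is a naive way to locate script blocks.

```php
<?php
// Hypothetical helper (not part of DOMDocument): rewrites "</" as "<\/" inside
// <script> bodies so the HTML parser does not see end tags there.
function escapeClosingTagsInScripts(string $html): string
{
    $result = preg_replace_callback(
        '~(<script\b[^>]*>)(.*?)(</script\s*>)~is',
        static function (array $m): string {
            // $m[1] = opening tag, $m[2] = script body, $m[3] = closing tag
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$strPage = '<html>
<head>
<title>Demo</title>
<script type="text/javascript">
var strJS = "<b>This is bold.</b><br /><br />This should not be bold.";
</script>
</head>
<body>
<script type="text/javascript">
document.write(strJS);
</script>
</body>
</html>';

libxml_use_internal_errors(true);

$html = new DOMDocument();
$html->loadHTML(escapeClosingTagsInScripts($strPage));
echo $html->saveHTML();
```

In a JavaScript string literal "<\/b>" evaluates to the same text as "</b>", so document.write should still produce the same markup. The replacement touches every "</" in a script body, not only those inside strings, so it is worth checking the result on scripts that use "</" in other ways.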



Source: https://stackoverflow.com/questions/24575136/domdocument-removes-html-tags-in-javascript-string
