PHP DOMDocument->loadXML with XML containing ampersand/less/greater?

僤鯓⒐⒋嵵緔 submitted on 2019-12-08 14:19:30

If you have a literal < inside text in an XML document, it is not well-formed XML. Either encode it as &lt; or enclose the text in a <![CDATA[ ... ]]> section.
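If you control the code that builds the XML string, the encoding option can be sketched roughly as follows. The element name and the sample text are only assumptions for illustration; htmlspecialchars() with ENT_XML1 is one common way to escape character data, not the only one.

```php
<?php
// Hypothetical raw text that must end up as character data inside an element.
$text = 'x & y < 3';

// ENT_XML1 limits the output to XML's predefined entities (&amp; &lt; &gt; ...).
$escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$xml = '<blah>' . $escaped . '</blah>';

$dom = new DOMDocument();
$ok = $dom->loadXML($xml);

echo $xml, "\n";                                // <blah>x &amp; y &lt; 3</blah>
echo $dom->documentElement->textContent, "\n";  // x & y < 3
```

The parser decodes the entities again, so code reading textContent sees the original text.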

If that's not possible (because you're not the one producing this "XML"), I'd suggest trying an HTML parsing library (I haven't used any, but they exist), because they're less strict than XML parsers.

But I'd really try to get valid XML before trying anything else!

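I often use @ in front of calls to load() on DOMDocument, mainly because you can never be absolutely sure that what you load is what you expected.

Using @ suppresses the warnings raised for malformed input. The call still returns false when parsing fails, so check the return value.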

@$dom->loadXml($myXml);
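An alternative to @ that keeps the error details is libxml's internal error handling. The following is only a sketch: $myXml here is a hypothetical malformed sample, and how you report the collected messages is up to you.

```php
<?php
// Hypothetical malformed input: unescaped & and < inside text.
$myXml = '<blah>x & y < 3</blah>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadXML($myXml);   // false when parsing fails

$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();
libxml_use_internal_errors($previous);  // restore the earlier setting

if (!$loaded) {
    echo "Could not parse XML:\n", implode("\n", $messages), "\n";
}
```

Unlike @, this silences only libxml's own diagnostics and lets you log them or show them to whoever supplied the input.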

I can use str_replace to encode every &, but if I do the same with < or > I also encode the ones that belong to valid XML tags.

As a strictly temporary fixup measure you can replace the ones that aren't part of what looks like a tag or entity reference, e.g.:

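// Escape < unless it looks like the start of a tag (/ keeps closing tags intact).
$str = preg_replace('~<(?![a-zA-Z_/!?])~', '&lt;', $str);
// Escape & unless it already starts a named or numeric entity reference.
$str = preg_replace('~&(?!([a-zA-Z]+|#[0-9]+|#x[0-9a-fA-F]+);)~', '&amp;', $str);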

However this isn't watertight and in the longer term you need to fix whatever is generating this bogus markup, or shout at the person who needs to fix it until they get a clue. Grossly-non-well-formed XML like this is simply not XML by definition.
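To see what the two replacements do, here is a small worked example. It assumes the patterns are wrapped in ~ delimiters (preg_replace needs delimiters) and that / is added to the first lookahead so closing tags are left alone; the sample string is made up.

```php
<?php
// Hypothetical input mixing real tags with stray & and < in the text.
$str = '<blah>x & y < 3 &amp; z</blah>';

// Escape < unless it looks like the start of a tag, comment, PI or closing tag.
$str = preg_replace('~<(?![a-zA-Z_/!?])~', '&lt;', $str);
// Escape & unless it already starts a named or numeric entity reference.
$str = preg_replace('~&(?!([a-zA-Z]+|#[0-9]+|#x[0-9a-fA-F]+);)~', '&amp;', $str);

$dom = new DOMDocument();
$loaded = $dom->loadXML($str);

echo $str, "\n";                                // <blah>x &amp; y &lt; 3 &amp; z</blah>
echo $dom->documentElement->textContent, "\n";  // x & y < 3 & z
```

The existing &amp; is left untouched, so nothing gets double-encoded. Text such as a<b, where the < is directly followed by a letter, still looks like a tag to the pattern and will break the parse, which is one reason this is only a stopgap.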

Put all your text inside CDATA sections?

<!-- Old -->
<blah>
    x & y < 3
</blah>

<!-- New -->
<blah><![CDATA[
    x & y < 3
]]></blah>
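If the XML is generated from PHP, the DOM API can write the CDATA section for you. A minimal sketch, assuming the same blah element and sample text as above:

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$blah = $dom->createElement('blah');
$dom->appendChild($blah);

// The raw text goes in untouched; no entity escaping is needed inside CDATA.
$blah->appendChild($dom->createCDATASection('x & y < 3'));

$xml = $dom->saveXML($blah);
echo $xml, "\n";   // <blah><![CDATA[x & y < 3]]></blah>

// Round trip: a consumer reading the text sees the original characters.
$check = new DOMDocument();
$loaded = $check->loadXML($xml);
echo $check->documentElement->textContent, "\n";   // x & y < 3
```

Keep in mind that a CDATA section ends at the first ]]>, so text containing that sequence cannot go into a single section as-is.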