Problem with simpleXML and entity not being defined

左心房为你撑大大i posted on 2019-12-18 09:26:24

Question


I'm trying to parse an XML file, but when loading it SimpleXML prints the following warning:

Warning: simplexml_load_file() [function.simplexml-load-file]: gpr_545.xml:55: parser error : Entity 'Oslash' not defined in import.php on line 35

This is that line:

<forenames>B&Oslash;IE</forenames><x> </x>

As it is a warning, I might ignore it, but I'd like to understand what is happening.


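Answer 1:


The HTML named entity for a Latin-1 character (here &Oslash;, which stands for Ø) is what has broken the XML parser. XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so any other name must be declared in a DTD before it can be used. If you're in control of the data, you need to escape the character using an XML-style numeric character reference instead (Ø just happens to be &#216;).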
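A minimal sketch of that fix, assuming the XML is available as a string. The `<person>` wrapper element is made up for illustration, and only the one entity from the question is replaced:

```php
<?php
// Sample document (hypothetical wrapper element around the line from the question).
$xml_string = '<person><forenames>B&Oslash;IE</forenames><x> </x></person>';

// Swap the HTML named entity for the equivalent XML numeric character reference.
$xml_string = str_replace('&Oslash;', '&#216;', $xml_string);

// The parser now accepts the document; SimpleXML returns text as UTF-8.
$xml = simplexml_load_string($xml_string);
$forenames = (string) $xml->forenames;

echo $forenames, "\n";
```

For a real file you would read it with `file_get_contents()` first and pass the cleaned string to `simplexml_load_string()` instead of calling `simplexml_load_file()` directly.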




Answer 2:


HTML entities like &Oslash; are not the same as XML entities. Here's a table for replacing HTML entities with XML entities.

As far as I can tell from one of your comments on another post, you're having trouble with the entity &sol;. I don't know if this is even a valid HTML entity; my Firefox won't show the character and only outputs the entity name. But I found another table listing most entities and their character reference numbers. Try adding them to your replacement table and you should be safe. The numeric reference for &sol; is &#47; (the character /), by the way.
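Instead of maintaining a replacement table by hand, the lookup can be delegated to PHP's own entity table. The following is only a sketch under some assumptions: `named_entities_to_numeric` is a hypothetical helper name, the `ENT_HTML5` table is assumed to cover the entities in your data, and the `iconv` extension is assumed to be available (as it already is in Answer 4's code):

```php
<?php
// Hypothetical helper: rewrite HTML named entities as XML numeric character references.
function named_entities_to_numeric(string $xml_string): string
{
    // The five entities XML predefines must be left alone.
    $xml_builtin = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xml_builtin): string {
            if (in_array($m[1], $xml_builtin, true)) {
                return $m[0];
            }
            // Let PHP look the name up in its HTML5 entity table.
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0]; // unknown name: leave it for manual inspection
            }
            // Emit one numeric reference per code point (UCS-4BE is 4 bytes each).
            $refs = '';
            foreach (str_split(iconv('UTF-8', 'UCS-4BE', $decoded), 4) as $bytes) {
                $refs .= '&#' . hexdec(bin2hex($bytes)) . ';';
            }
            return $refs;
        },
        $xml_string
    );
}

echo named_entities_to_numeric('<forenames>B&Oslash;IE</forenames>'), "\n";
```

Names that PHP does not recognise are passed through unchanged, so the parser will still warn about them and you can add those few cases by hand.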




Answer 3:


I think this is an encoding problem. PHP, or SimpleXML in this particular case, does not like the Danish Ø you've got in that forenames tag. You could try encoding the whole file in UTF-8 and replacing the escaped version in the tag with the literal character. Afterwards you can read a file that is free of escaped characters into SimpleXML.




Answer 4:


I just had a very similar problem and solved it in the following way. The main idea was to load the file into a string, replace every "&" so that a bad entity becomes something like "[[entity]]Oslash;", and carry out the reverse replacement before displaying an XML node.

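// Load the file and hide every "&" so the parser never sees an undefined entity.
function readXML($filename){
    $xml_string = file_get_contents($filename);
    $xml_string = str_replace("&", "[[entity]]", $xml_string);
    return simplexml_load_string($xml_string);
}
// Restore the "&" characters in a node's text, then convert it to the page encoding.
function xml2str($xml){
    $str = str_replace("[[entity]]", "&", (string)$xml);
    $str = iconv("UTF-8", "WINDOWS-1251", $str);
    return $str;
}
$filename = "gpr_545.xml"; // example value: the file named in the question's warning
$xml = readXML($filename);
echo xml2str($xml->forenames);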

I use iconv("UTF-8", "WINDOWS-1251", $str) because my page is encoded as WINDOWS-1251.




Answer 5:


Try using this line:

<forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x>

and read this about CDATA
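Note that inside a CDATA section the parser treats &Oslash; as literal text rather than as an entity, so the warning goes away but the value you read back still contains the entity name. A small sketch of what that looks like, assuming a made-up `<person>` wrapper element and using `html_entity_decode()` as one possible way to turn the name into the actual character afterwards:

```php
<?php
// Sample document (hypothetical wrapper element around the suggested line).
$xml_string = '<person><forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x></person>';

$xml = simplexml_load_string($xml_string);

// The CDATA content comes back verbatim, entity name included.
$raw = (string) $xml->forenames;

// Decode the HTML entity in a separate step if the real character is needed.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $raw, "\n", $decoded, "\n";
```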



Source: https://stackoverflow.com/questions/1426852/problem-with-simplexml-and-entity-not-being-defined