How to parse XML with unescaped ampersand

Submitted by 五迷三道 on 2019-12-31 07:00:10

Question


I have to read a large (about 200 MB) XML file, and I am using XMLReader with PHP. The file has a URL node with an unescaped ampersand in it, and parsing always stops at the first URL node. I'm using the windows-1250 encoding, the same one specified in the XML declaration of the file.

I am getting the error: parser error : EntityRef: expecting ';' in

Is it possible to parse XML with an & in the value of a node?

Thank you for any tips. I can share the code if you need it.


Answer 1:


Is it possible to parse XML with an & in the value of a node?

No. An unescaped & means the file is not well-formed XML, so strictly speaking it does not qualify as an XML file at all. No XML parser can accept it; a parser that did would not be an XML parser.
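
As a small illustration of the point above, the following Python sketch checks whether a string is well-formed by handing it to the standard library parser. The helper name `is_well_formed` and the sample URLs are assumptions made for this example; the question itself uses PHP's XMLReader, which reports the same kind of failure.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample data: the same URL with an escaped and a bare ampersand.
escaped = "<url>http://example.com/?a=1&amp;b=2</url>"
bare = "<url>http://example.com/?a=1&b=2</url>"

print(is_well_formed(escaped))  # the escaped form is accepted
print(is_well_formed(bare))     # the bare "&" makes the parser give up
```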

However, you can pre-process the data before you pass it to an XML parser and fix the issue (& -> &amp;) yourself.
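
One possible way to do that pre-processing is a regular expression that escapes every & which does not already start a reference. This is a minimal sketch in Python, not the answerer's own code: the names `BARE_AMP`, `escape_bare_ampersands`, and `escape_lines` are hypothetical, and it assumes the file uses only the five predefined entities and numeric character references, with no custom DTD entities and no CDATA sections (a & inside CDATA is already legal and would be wrongly escaped).

```python
import re
import xml.etree.ElementTree as ET

# An "&" that is NOT followed by a predefined entity name or a numeric
# character reference terminated by ";". Custom DTD entities are not
# recognized here and would be escaped as well (assumption of this sketch).
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace every bare '&' with '&amp;', leaving existing references alone."""
    return BARE_AMP.sub("&amp;", text)


def escape_lines(lines):
    """Lazily fix an iterable of lines, e.g. an open file, one line at a time."""
    for line in lines:
        yield escape_bare_ampersands(line)


# Hypothetical sample node with an unescaped ampersand in a URL.
broken = "<item><url>http://example.com/?a=1&b=2</url></item>"
fixed = escape_bare_ampersands(broken)

root = ET.fromstring(fixed)
print(root.find("url").text)  # prints http://example.com/?a=1&b=2
```

For a file as large as the one in the question, `escape_lines` can be fed an open file handle (opened with the file's declared encoding, windows-1250 in the question) and its output written to a second file, so the whole document never has to sit in memory. Working line by line is safe here because a reference such as `&amp;` cannot contain a line break. The same pattern should also work with PHP's `preg_replace`, since PCRE supports negative lookahead.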




Answer 2:


@hakre is correct. For this XML to be parsed, you would have to pre-process the data first. The reason is that in XML a literal "&" is only allowed as the start of an entity or character reference. The same applies to "<", which always opens a tag, so the following node doesn't make sense to a parser:

<object>This object is < than the other object</object>

The parser thinks that the "<" in the middle of the text is opening a new tag, but no valid tag follows, so parsing fails. Instead, you would need to write the following:

<object>This object is &lt; than the other object</object>

Other predefined entities include &gt;, &amp;, &quot;, and &apos;.
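
When you control the code that generates the XML, the escaping can be left to a library instead of being typed by hand. The sketch below uses `xml.sax.saxutils.escape` from the Python standard library, which replaces &, < and > in text content; the sample string is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing all three characters that escape() handles.
raw = "Tom & Jerry < 3 > 2"

escaped = escape(raw)
print(escaped)  # prints Tom &amp; Jerry &lt; 3 &gt; 2

# Wrapped in an element, the escaped text parses and round-trips to the original.
node = ET.fromstring("<object>" + escaped + "</object>")
print(node.text == raw)  # prints True
```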



Source: https://stackoverflow.com/questions/15142371/how-to-parse-xml-with-unescaped-ampersand
