Special characters in XML files - processing with the DOM API

Submitted by 和自甴很熟 on 2019-12-05 08:32:44

The reason is simple: The XML file really contains an "&" character.

It is just represented differently (i.e. it is "escaped"), because a bare "&" on its own makes an XML file not well-formed, as you've seen. Read the relevant section in the XML 1.0 spec: "2.4 Character Data and Markup". It's just a few lines, but it explains the issue quite well.

XML is a representation of data (!). Don't think of it as a text file. Example:

You want to store the string "17 < 20" in an XML file. Written literally, you can't, since the "<" is reserved as the opening tag bracket. So this would not be well-formed, and a parser will reject it:

<xml>17 < 20</xml>

Solution: You escape the special/reserved character, purely to keep the file well-formed:

<xml>17 &lt; 20</xml>

For all practical purposes the above snippet contains the following data (in JSON representation this time):

{
  "xml": "17 < 20"
}
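
To see this in practice, here is a minimal sketch using Python's standard xml.dom.minidom module (the choice of Python and the helper name text_of are assumptions for illustration; the same holds for any DOM implementation). It parses both snippets from above: the escaped one yields the plain string, while the unescaped one is rejected by the parser.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def text_of(xml_source):
    """Parse an XML string and return the text content of its root element.

    Hypothetical helper for this example only.
    """
    doc = minidom.parseString(xml_source)
    root = doc.documentElement
    return "".join(
        node.data for node in root.childNodes if node.nodeType == node.TEXT_NODE
    )


# The escaped form parses, and the DOM hands back the real "<" character.
escaped_text = text_of("<xml>17 &lt; 20</xml>")
print(escaped_text)

# The unescaped form is not well-formed, so the parser raises an error.
try:
    text_of("<xml>17 < 20</xml>")
    raw_parses = True
except ExpatError:
    raw_parses = False
print(raw_parses)
```

The point is that "&lt;" only exists in the file on disk. Once the parser has run, your code only ever sees "<".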

This is why you see the real "&" in your post-processing. It had been escaped in just the same way, but its meaning stayed the same all the time.

The above example also explains why the "&" must be treated specially: It is itself part of the XML escaping mechanism. It marks the start of an escape sequence, like in "&lt;". Therefore it must be escaped itself (with "&amp;", like you've done).
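
Because "&" starts every escape sequence, it also has to be escaped first when you escape text by hand. The following sketch illustrates that (the function names escape_text and escape_wrong_order are made up for this example); it compares a hand-written version against xml.sax.saxutils.escape from the Python standard library, which handles "&", "<" and ">".

```python
from xml.sax.saxutils import escape


def escape_text(s):
    """Escape text for use as XML character data; "&" must be handled first."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_wrong_order(s):
    """Deliberately wrong: escaping "<" first, then "&", double-escapes."""
    return s.replace("<", "&lt;").replace("&", "&amp;")


sample = "17 < 20 & more"

print(escape_text(sample))         # same result as the library function
print(escape(sample))
print(escape_wrong_order("17 < 20"))  # the "&" of "&lt;" got escaped again
```

In real code you would normally not escape by hand at all, but let the library or the DOM serializer do it.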

Any conforming XML parser will implicitly translate entity references such as &amp;, &lt; and &gt; into the corresponding characters as part of parsing the file. A DOM serializer does the reverse when it writes the document back out.
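
A short round-trip sketch with xml.dom.minidom shows both directions. The sample strings are invented for illustration, and other DOM implementations may differ in serialization details, but not in the principle.

```python
from xml.dom import minidom

# Parsing: the entity reference in the source becomes a real "&" in the DOM.
doc = minidom.parseString("<xml>Tom &amp; Jerry</xml>")
root = doc.documentElement
parsed_text = root.firstChild.data
print(parsed_text)

# Modifying: assign plain, unescaped text to the text node.
root.firstChild.data = "R & D < 5"

# Serializing: the DOM escapes the special characters again on output.
serialized = root.toxml()
print(serialized)
```

So when working through the DOM API you read and write real characters, and the escaping is handled for you at the file boundary.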
