Loading an XML document fails with the special character »

Submitted by 蓝咒 on 2019-12-02 02:39:07

&raquo; is an HTML named entity and is not supported in XML. Out of the box, XML only supports &amp;, &apos;, &quot;, &gt; and &lt;.

Use the corresponding numeric character reference &#187; (or hexadecimal &#xBB;) instead.
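To see the difference in practice, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The `try_parse` helper and the `<title>` sample strings are made up for illustration; other parsers report the error differently but should reject the named entity in the same way when no DTD declares it.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the root element's text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# &raquo; is not declared anywhere, so the parser rejects the document.
named = try_parse("<title>News &raquo; Today</title>")

# Numeric character references need no declaration and parse fine.
decimal = try_parse("<title>News &#187; Today</title>")
hexadecimal = try_parse("<title>News &#xBB; Today</title>")

print(named, decimal, hexadecimal)
```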

+1 what Frédéric said. You can also serve » as a raw unescaped character, presumably encoded in UTF-8.
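A small sketch of the raw-character option, again with Python's standard library and hypothetical sample data. It assumes the bytes really are UTF-8, either implicitly (no XML declaration) or with an explicit declaration.

```python
import xml.etree.ElementTree as ET

# The raw character, serialized as UTF-8 bytes with no XML declaration.
# XML parsers assume UTF-8 when nothing else is declared.
plain_bytes = "<title>News » Today</title>".encode("utf-8")
text_plain = ET.fromstring(plain_bytes).text

# The same content with an explicit encoding declaration.
declared_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<title>News » Today</title>"
).encode("utf-8")
text_declared = ET.fromstring(declared_bytes).text

print(text_plain, text_declared)
```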

If it's someone else's RSS feed, you need to kick them to stop producing malformed XML; no XML parser will read this.

In a <description> element, the HTML content should normally be XML-escaped. So if the description of the item is This is a <em>really</em> interesting article, it should appear in the XML as:

<description>This is a &lt;em>really&lt;/em> interesting article</description>

Consequently, an HTML-encoded » character should have come out as

&amp;raquo;

If it was included directly from an HTML source without being escaped, that's a more serious XML-injection problem.
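The escaping described above can be sketched with Python's `xml.sax.saxutils.escape`. The variable names and the sample fragment are illustrative only. Note that `escape` also turns `>` into `&gt;`, whereas the example above leaves `>` as-is; both forms are well-formed XML.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# HTML content destined for an RSS 2.0 <description> element.
html_fragment = "This is a <em>really</em> interesting article &raquo;"

# XML-escape the HTML: & becomes &amp;, < becomes &lt;, > becomes &gt;.
escaped = escape(html_fragment)
xml_doc = "<description>" + escaped + "</description>"

# An XML parser undoes exactly one level of escaping, giving back the HTML.
round_tripped = ET.fromstring(xml_doc).text

# The consumer then decodes HTML entities such as &raquo; as a separate step.
decoded = html.unescape(round_tripped)

print(escaped)
print(decoded)
```

In real code it is usually safer to build the element with an XML library and assign the HTML string to its text, so that the serializer applies this escaping for you.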

(This is assuming RSS 2.0. In the various earlier versions of RSS, whether the <description> contained HTML or plain text varied from spec to spec and was sometimes completely unspecified. For old RSS versions it's not really reliable to use HTML content at all.)
