Are supplementary characters allowed in XML names?

Submitted by て烟熏妆下的殇ゞ on 2021-01-27 12:48:07

Question


According to the specification, the characters [#x10000-#xEFFFF] are legal in XML names. However, the W3C validator says that this XML is not well-formed:

<?xml version="1.0"?>
<𐐀>value</𐐀>

(the name of the element is the Unicode character #x10400). Some browsers, like Firefox, also complain about it (Chrome displays the XML, IE shows a blank page). Is this an error in the tools, or is the XML really not well-formed?


Answer 1:


Is this an error in the tools, or is the XML really not well-formed?

It's well-formed according to the latest specification, which is XML 1.0 Fifth Edition. But it was not well-formed under the previous edition, which was current until 2008.

The original XML 1.0 spec (from 1998) locked down the set of name characters to the characters that were defined as letters in the Unicode standard of the time. That didn't include 𐐀, which only came in with Unicode 3.1 a few years later.

XML 1.1 was much looser about what characters it would accept in names (largely for this reason, to allow characters from future Unicode versions), and this is a Good Thing. However, XML 1.1 has never really caught on, so the Editors decided to backport the newer, more permissive name-character rules from there to 1.0. This was controversial and all in all probably not a Good Thing.

This means you can use 𐐀 in names in XML 1.0 documents, which will work with the subset of parsers that have been updated for Fifth Edition (or that never implemented the strict rules in the first place), or you can use it in XML 1.1 documents, which will work with a different set of parsers that support XML 1.1.

Or, more realistically, you can avoid those sort-of-well-formed-depending-on-the-parser characters altogether, and feel a little sad.
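
Since acceptance depends on the parser, the simplest check is to try the one you actually plan to use. Below is a minimal sketch that probes Python's standard xml.etree.ElementTree (which is built on expat). The helper name `accepts` is made up for this illustration, and the result it prints depends on the parser version installed, so no particular outcome is claimed here.

```python
import xml.etree.ElementTree as ET

# The document from the question; the element name is U+10400.
DOC = '<?xml version="1.0"?>\n<\U00010400>value</\U00010400>'


def accepts(document: str) -> bool:
    """Return True if ElementTree parses the document without error."""
    try:
        # Pass UTF-8 bytes so the parser does its own decoding.
        ET.fromstring(document.encode("utf-8"))
        return True
    except ET.ParseError:
        # Raised when the parser considers the input not well-formed.
        return False


print("parser accepts U+10400 as an element name:", accepts(DOC))
```

Running the same probe against every parser in your toolchain shows whether the name is safe to use in practice.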




Answer 2:


Yes, supplementary characters are allowed in XML names.

Your XML is well-formed because the element name uses characters allowed by the Name production in the W3C XML Recommendation.
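
To make that concrete, here is a small sketch that checks a character against the NameStartChar ranges listed in XML 1.0 Fifth Edition. The function name `is_name_start_char` is just for this illustration. It only mirrors the grammar production and says nothing about what any particular parser implements.

```python
# NameStartChar code point ranges from XML 1.0 Fifth Edition.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),  # supplementary planes
]


def is_name_start_char(ch: str) -> bool:
    """Return True if the single character may start an XML name."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


print(is_name_start_char("\U00010400"))  # True: inside [#x10000-#xEFFFF]
print(is_name_start_char("1"))           # False: digits cannot start a name
```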

However:

  • Online validators that receive the file from you over HTTP must handle the character encoding carefully. It appears that by the time the W3C Markup Validation Service gets your XML, your character has been lost in an encoding shuffle:

    Warning Missing "charset" attribute for "text/xml" document.

    The HTTP Content-Type header (text/xml) sent by your web browser (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36) did not contain a "charset" parameter, but the Content-Type was one of the XML text/* sub-types.

    The relevant specification (RFC 3023) specifies a strong default of "us-ascii" for such documents so we will use this value regardless of any encoding you may have indicated elsewhere.

    If you would like to use a different encoding, you should arrange to have your browser send this new encoding information.

    Try an offline XML parser. My Xerces-J-based validator, for example, correctly identifies your XML as being well-formed.

  • Be aware that not all characters allowed by Name are allowed in NCNames. So, although well-formed, XML using such characters cannot be valid according to an XSD where such names are not allowed.
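
The encoding problem in the first bullet can be reproduced offline. The sketch below (the helper `decodes_as` is a name made up for this illustration) encodes the document as UTF-8 and then tries to read the bytes back as us-ascii, the default the validator says it falls back to. It only shows why the character cannot survive that fallback and does not model the validator itself.

```python
# The document from the question; the element name is U+10400.
DOC = '<?xml version="1.0"?>\n<\U00010400>value</\U00010400>'

utf8_bytes = DOC.encode("utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(utf8_bytes, "utf-8"))     # True
print(decodes_as(utf8_bytes, "us-ascii"))  # False: non-ASCII bytes present
print(len("\U00010400".encode("utf-8")))   # 4 bytes for one character
```

Sending an explicit charset with the upload, or validating with a local parser, avoids the fallback.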



Source: https://stackoverflow.com/questions/38919409/are-supplementary-characters-allowed-in-xml-names
