XML Invalid characters when creating CData node from UnicodeString

迷失自我 2021-01-07 01:51

IDE: Embarcadero XE5 c++ builder.

I'm trying to dump UnicodeStrings in XML CData sections.

Small extract of such a string:

3 Answers
  •  情歌与酒
    2021-01-07 02:31

    If you read Section 2.7 of the XML specification, it describes the format of a CDATA section:

    CDATA Sections
    
    [18]    CDSect    ::=    CDStart CData CDEnd  
    [19]    CDStart    ::=    '<![CDATA['  
    [20]    CData    ::=    (Char* - (Char* ']]>' Char*))  
    [21]    CDEnd    ::=    ']]>' 
    

    Char is defined in Section 2.2:

    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */ 
    

    If you look at your raw data, it contains over a dozen character values that fall outside those ranges (specifically #x0, #x1, #x2, #x4, #x5, #x6, #x8, #xB, #xE, #x18, #x19, #x1A, and #x1C). That is why you are getting errors about illegal characters: the data really does contain characters that XML does not allow.
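
    As a rough illustration, the Char production above can be turned into a small check that reports where the first illegal character sits. This is only a sketch in portable C++11: it assumes the text is available as UTF-16 in a std::u16string (rather than a UnicodeString), and the names isXmlChar and findIllegalXmlChar are made up for this example.

```cpp
#include <cstddef>
#include <string>

// True if the code point matches the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
inline bool isXmlChar(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: returns the index of the first UTF-16 code unit that
// is not part of a legal XML character, or npos if the whole string is legal.
inline std::size_t findIllegalXmlChar(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t unit = s[i];
        const bool isHighSurrogate = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHighSurrogate && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // A well-formed surrogate pair always decodes to a code point in
            // [#x10000-#x10FFFF], which Char allows, so skip both units.
            ++i;
            continue;
        }
        // Everything else is a single code unit; a lone surrogate falls in
        // the excluded D800-DFFF block and is rejected here.
        if (!isXmlChar(unit)) {
            return i;
        }
    }
    return std::u16string::npos;
}
```

    Running something like this over the string before creating the CDATA node would show which positions the parser is objecting to.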

    A CDATA section does not give you permission to put arbitrary binary data into an XML document. A CDATA section is meant to be used when text content contains characters that are normally reserved for XML markup, so that they do not have to be escaped or encoded as entities. The only way to put binary data into an XML document is to encode it in an XML-compatible (typically 7-bit ASCII) format, such as Base64 (but there are other formats available that you can use, such as yEnc).
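
    For illustration, a minimal Base64 encoder might look like the following. It is a sketch in portable C++11, not tied to any C++Builder API: the function name base64Encode is invented here, and it assumes the raw bytes have already been copied into a std::vector<std::uint8_t>.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes raw bytes with the standard Base64 alphabet
// and '=' padding. The output uses only A-Z, a-z, 0-9, '+', '/' and '=',
// all of which are legal XML characters.
inline std::string base64Encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters each.
    for (; i + 2 < data.size(); i += 3) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8) |
                                static_cast<std::uint32_t>(data[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }

    // Handle the 1 or 2 trailing bytes, padding with '='.
    const std::size_t rest = data.size() - i;
    if (rest == 1) {
        const std::uint32_t n = static_cast<std::uint32_t>(data[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t n = (static_cast<std::uint32_t>(data[i]) << 16) |
                                (static_cast<std::uint32_t>(data[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

    Because the encoded text can never contain ']]>' or a control character, it can go into a CDATA section (or an ordinary text node) as-is; the reader then has to Base64-decode it to get the original bytes back.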
