XML Invalid characters when creating CData node from UnicodeString

迷失自我 2021-01-07 01:51

IDE: Embarcadero C++Builder XE5.

I'm trying to dump UnicodeStrings into XML CData sections.

Small extract of such a string:

3 answers
  •  Happy的楠姐
    2021-01-07 02:07

    For my situation I created a function to trim a string to just the set of valid XML Characters.

    Code (Delphi syntax):

    //Code released into public domain. No attribution required.
    function TrimToXmlText(const xmlText: string): string;
    var
       i, o: Integer;
    begin
       {
          http://www.w3.org/TR/xml/#NT-Char

          Regardless of entity encoding, the only valid characters allowed are:

             Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

          I.e. any Unicode character, excluding most control characters below #x20,
          the surrogate blocks, FFFE, and FFFF.
          This means that a string such as

             'Line one'#31#10'Line two'

          is invalid (because of the #31 aka 0x1F).

          This means we need to manually strip them out; because the xml library certainly won't do it for us.
       }

       //Reserve room for the worst case, where every character is kept
       SetLength(Result, Length(xmlText));

       o := 0;
       for i := 1 to Length(xmlText) do
       begin
          //Note: UTF-16 surrogate code units ($D800..$DFFF) are dropped as well,
          //so characters above U+FFFF are removed by this version.
          case Ord(xmlText[i]) of
          $9, $A, $D,
          $20..$D7FF,
          $E000..$FFFD:
             begin
                o := o+1;
                Result[o] := xmlText[i];
             end;
          end;
       end;

       //Shrink to the number of characters actually copied
       SetLength(Result, o);
    end;
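
Since the question is about C++Builder, here is a rough C++ sketch of the same filtering idea. It is not part of the original answer: it uses `std::u16string` as a stand-in for a UTF-16 string, and the function name is simply reused for illustration. Unlike the Pascal version, it assumes you want to keep well-formed surrogate pairs, since those encode the `[#x10000-#x10FFFF]` range that the XML `Char` production allows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption, not from the original answer):
// returns a copy of `text` containing only characters allowed by the
// XML 1.0 Char production. `text` is assumed to be UTF-16.
std::u16string TrimToXmlText(const std::u16string& text)
{
    std::u16string result;
    result.reserve(text.size());

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t c = text[i];

        if (c == 0x9 || c == 0xA || c == 0xD ||
            (c >= 0x20 && c <= 0xD7FF) ||
            (c >= 0xE000 && c <= 0xFFFD)) {
            // Valid character from the Basic Multilingual Plane.
            result.push_back(c);
        } else if (c >= 0xD800 && c <= 0xDBFF &&
                   i + 1 < text.size() &&
                   text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            // Well-formed surrogate pair: encodes U+10000..U+10FFFF.
            result.push_back(c);
            result.push_back(text[++i]);
        }
        // Anything else (other control characters, lone surrogates,
        // FFFE, FFFF) is silently dropped.
    }
    return result;
}
```

With a C++Builder `UnicodeString` the same loop should carry over, assuming its usual 1-based indexing and `Length()` method; check that against your RTL version before relying on it.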
    
