IDE: Embarcadero C++Builder XE5.
I'm trying to dump UnicodeStrings into XML CDATA sections.
Small extract of such a string:
For my situation I created a function to trim a string to just the set of valid XML Characters.
Code (Delphi):
//Code released into public domain. No attribution required.
function TrimToXmlText(const xmlText: string): string;
var
  i, o: Integer;
begin
  {
    http://www.w3.org/TR/xml/#NT-Char
    Regardless of entity encoding, the only valid characters allowed are:
      Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    I.e. any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
    This means that a string such as
      "Line one"#31#10"Line two"
    is invalid (because of the #31 aka 0x1F).
    We need to strip such characters out manually, because the XML library won't do it for us.
    Note: this loop works on UTF-16 code units and drops every surrogate ($D800..$DFFF),
    so characters in [#x10000-#x10FFFF] are removed as well.
  }
  // Allocate for the worst case (nothing removed); shrink at the end.
  SetLength(Result, Length(xmlText));
  o := 0;
  for i := 1 to Length(xmlText) do
  begin
    case Ord(xmlText[i]) of
      $9, $A, $D,
      $20..$D7FF,
      $E000..$FFFD:
        begin
          // Valid XML character: copy it to the next output slot.
          o := o + 1;
          Result[o] := xmlText[i];
        end;
    end;
  end;
  SetLength(Result, o);
end;
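
Since the question is about C++Builder, here is a rough C++ sketch of the same filter. It is only an illustration under some assumptions: it uses std::u16string as a stand-in for UnicodeString (both hold UTF-16 code units), the function name simply mirrors the one above, and, unlike the loop above, it keeps well-formed surrogate pairs so that characters in [#x10000-#x10FFFF] survive.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper; the name mirrors the Delphi function above.
// Keeps only UTF-16 code units that form characters allowed by
// http://www.w3.org/TR/xml/#NT-Char.
std::u16string TrimToXmlText(const std::u16string& text)
{
    std::u16string result;
    result.reserve(text.size()); // worst case: nothing is removed

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t c = text[i];

        // Characters that are valid on their own (single code unit).
        const bool isPlainValid =
            c == 0x9 || c == 0xA || c == 0xD ||
            (c >= 0x20 && c <= 0xD7FF) ||
            (c >= 0xE000 && c <= 0xFFFD);

        if (isPlainValid) {
            result.push_back(c);
            continue;
        }

        // A high surrogate followed by a low surrogate encodes a character
        // in [#x10000-#x10FFFF], which XML allows: copy both units.
        const bool isHigh = c >= 0xD800 && c <= 0xDBFF;
        if (isHigh && i + 1 < text.size()) {
            const char16_t next = text[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                result.push_back(c);
                result.push_back(next);
                ++i; // skip the low surrogate that was just copied
            }
        }
        // Everything else (control characters, lone surrogates,
        // 0xFFFE and 0xFFFF) is dropped.
    }
    return result;
}
```

Converting between UnicodeString and std::u16string is left out here; the filtering logic itself should carry over to whatever UTF-16 string type is in use.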