Question
I like to use this piece of code when I want to reverse a string (when I am not using std::string or other built-in functions in C). As a beginner, when I first thought of this, I had the ASCII table in mind. I think it can work well with Unicode too. I assumed it works because the difference between the character values (ASCII etc.) is fixed.
Are there any character encodings in which this code may not work?
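#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[11];
    int len, i;
    strcpy(a, "Particl");
    printf("%s\n", a);
    len = (int)strlen(a);
    /* swap a[i] with its mirror element using add/subtract, no temporary */
    for (i = 0; i < len / 2; i++)
    {
        a[i] += a[len - 1 - i];
        a[len - 1 - i] = a[i] - a[len - 1 - i];
        a[i] -= a[len - 1 - i];
    }
    printf("%s\n", a);
    return 0;
}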
Update:
This link is informative in association with this question.
Answer 1:
This will not work with any encoding in which some (not necessarily all) codepoints require more than one char unit to represent, because you are reversing byte-by-byte instead of codepoint-by-codepoint. For the usual 8-bit char this includes all encodings that can represent all of Unicode.
For example: in UTF-16BE, the string "hello" maps to the byte sequence 00 68 00 65 00 6c 00 6c 00 6f. Your algorithm applied to this byte sequence will produce the sequence 6f 00 6c 00 6c 00 65 00 68 00, which is the UTF-16BE encoding of the string "漀氀氀攀栀".
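The UTF-16BE case can be reproduced with a small sketch. The helper names `reverse_bytes` and `utf16be_unit` are made up for illustration, and a plain temporary-variable swap stands in for the add/subtract swap from the question, since both move the same bytes to the same places.

```c
#include <stddef.h>

/* Reverse a buffer one byte at a time, ignoring any multi-byte structure.
   (Hypothetical helper; same effect on the bytes as the loop in the question.) */
void reverse_bytes(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        unsigned char tmp = buf[i];
        buf[i] = buf[n - 1 - i];
        buf[n - 1 - i] = tmp;
    }
}

/* Read the k-th 16-bit code unit from a big-endian byte buffer.
   (Hypothetical helper; the caller must ensure 2*k + 1 is within bounds.) */
unsigned int utf16be_unit(const unsigned char *buf, size_t k)
{
    return ((unsigned int)buf[2 * k] << 8) | buf[2 * k + 1];
}
```

Applied to the ten bytes 00 68 00 65 00 6c 00 6c 00 6f, `reverse_bytes` leaves 6f 00 6c 00 6c 00 65 00 68 00 in the buffer, so `utf16be_unit(buf, 0)` reads 0x6F00 rather than 0x006F: the high and low byte of every code unit have traded places.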
It gets worse -- doing a codepoint-by-codepoint reversal of a Unicode string still won't produce the correct results in all cases, because Unicode has many codepoints that act on their surroundings rather than standing alone as characters. As a trivial example, codepoint-reversing the string "Spın̈al Tap", which contains U+0308 COMBINING DIAERESIS, will produce "paT länıpS" -- see how the diaeresis has migrated from the N to the A? The consequences of codepoint-by-codepoint reversal on a string containing bidirectional overrides or conjoining jamo would be even more dire.
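To make the codepoint-by-codepoint case concrete, here is a minimal sketch of such a reversal for UTF-8. It assumes the input is valid, NUL-terminated UTF-8, and the function names are hypothetical. It first reverses the bytes inside each multi-byte sequence, then reverses the whole buffer, so every sequence comes out in its original byte order. It deliberately knows nothing about combining marks, which is exactly the limitation described above.

```c
#include <stddef.h>
#include <string.h>

/* Reverse the bytes in buf[lo..hi) in place. */
static void reverse_range(unsigned char *buf, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        unsigned char tmp = buf[lo];
        buf[lo++] = buf[--hi];
        buf[hi] = tmp;
    }
}

/* Byte length of the UTF-8 sequence introduced by lead byte b.
   Assumes valid UTF-8; any unexpected byte is treated as length 1. */
static size_t utf8_seq_len(unsigned char b)
{
    if ((b & 0x80) == 0x00) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Reverse a NUL-terminated UTF-8 string code point by code point. */
void utf8_reverse_codepoints(char *s)
{
    unsigned char *buf = (unsigned char *)s;
    size_t n = strlen(s);

    /* Step 1: reverse the bytes inside each multi-byte sequence. */
    for (size_t i = 0; i < n; ) {
        size_t len = utf8_seq_len(buf[i]);
        if (len > n - i)
            len = n - i;            /* guard against a truncated tail */
        reverse_range(buf, i, i + len);
        i += len;
    }

    /* Step 2: reverse the whole buffer; each sequence is forward again. */
    reverse_range(buf, 0, n);
}
```

Every code point survives intact, but in the "Spın̈al Tap" example the bytes CC 88 (U+0308) end up directly after the "a" instead of after the "n", so the diaeresis attaches to the wrong letter. Avoiding that would require reversing by grapheme cluster rather than by code point, which this sketch does not attempt.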
Source: https://stackoverflow.com/questions/16547194/character-encoding-independent-character-swap