问题
I have an NSString with a value of Jose (an accent on the e). I try to convert it to a C string as follows:
char str [[myAccentStr length] + 1];
[myAccentStr getCString:str maxLength:[myAccentStr length] + 1 encoding:NSUTF32StringEncoding];
but str ends up being an empty string. What gives? I tried UTF8 and UTF16 too. It gets passed to another function later on and when that funcsion calls lstrlen on it, the size comes out as zero.
回答1:
The docs for NSString getCString:maxLength:encoding says:
You can use canBeConvertedToEncoding: to check whether a string can be losslessly converted to encoding. If it can’t, you can use dataUsingEncoding:allowLossyConversion: to get a C-string representation using encoding, allowing some loss of information (note that the data returned by dataUsingEncoding:allowLossyConversion: is not a strict C-string since it does not have a NULL terminator).
Using the NSString method dataUsingEncoding:allowLossyConversion: does the trick. Here's a code example:
NSString *myAccentStr = @"José";
char str[[myAccentStr length] + 1];
// NSString * to C String (char*)
NSData *strData = [myAccentStr dataUsingEncoding:NSMacOSRomanStringEncoding
allowLossyConversion:YES];
memcpy(str, [strData bytes], [strData length] + 1);
str[[myAccentStr length]] = '\0';
NSLog(@"str (from NSString* to c string): %s", str);
// C String (char*) to NSString *
NSString *newAccentStr = [NSString stringWithCString:str
encoding:NSMacOSRomanStringEncoding];
NSLog(@"newAccentStr (from c string to NSString*): %@", newAccentStr);
The output from that NSLog is:
str (from NSString* to c string): José
newAccentStr (from c string to NSString*): José
So far I've only seen this work properly when using the NSMacOSRomanStringEncoding.
Edit
Changing this to a community wiki. Please feel free to edit.
hooleyhoop had some great points, so I thought I would try to make code that is as verbose as possible. If I'm missing anything, someone please chime in.
Also - Not sure why [NSString canBeConvertedToEncoding:] is returning YES even though the [NSString getCString:maxLength:encoding:] function definitely isn't working right (as seen by the output).
Here's some code to help in analyzing what works / what doesn't:
// Define Block variable to tests out different encodings
void (^tryGetCStringUsingEncoding)(NSString*, NSStringEncoding) = ^(NSString* originalNSString, NSStringEncoding encoding) {
NSLog(@"Trying to convert \"%@\" using encoding: 0x%X", originalNSString, encoding);
BOOL canEncode = [originalNSString canBeConvertedToEncoding:encoding];
if (!canEncode)
{
NSLog(@" Can not encode \"%@\" using encoding %X", originalNSString, encoding);
}
else
{
// Try encoding using NSString getCString:maxLength:encoding:
NSUInteger cStrLength = [originalNSString lengthOfBytesUsingEncoding:encoding];
char cstr[cStrLength];
[originalNSString getCString:cstr maxLength:cStrLength encoding:encoding];
NSLog(@" Converted(1): \"%s\" (expected length: %u)",
cstr, cStrLength);
// Try encoding using NSString dataUsingEncoding:allowLossyConversion:
NSData *strData = [originalNSString dataUsingEncoding:encoding allowLossyConversion:YES];
char cstr2[[strData length] + 1];
memcpy(cstr2, [strData bytes], [strData length] + 1);
cstr2[[strData length]] = '\0';
NSLog(@" Converted(2): \"%s\" (expected length: %u)",
cstr2, [strData length]);
}
};
NSString *myAccentStr = @"José";
// Try out whatever encoding you want
tryGetCStringUsingEncoding(myAccentStr, NSUTF8StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF16StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF32StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSMacOSRomanStringEncoding);
Results:
> Trying to convert "José" using encoding: 0x4
> Converted(1): "" (expected length: 5)
> Converted(2): "José" (expected length: 5)
> Trying to convert "José" using encoding: 0xA
> Converted(1): "" (expected length: 8)
> Converted(2): "ˇ˛J" (expected length: 10)
> Trying to convert "José" using encoding: 0x8C000100
> Converted(1): "" (expected length: 16)
> Converted(2): "ˇ˛" (expected length: 20)
> Trying to convert "José" using encoding: 0x1E
> Converted(1): "-" (expected length: 4)
> Converted(2): "José" (expected length: 4)
回答2:
[aString length] returns the number of characters. In your case this is 4.
You can convert your string to a c string accurately using, for example, NSUTF8StringEncoding, NSUTF16StringEncoding, NSUTF32StringEncoding. The length in bytes would be 5, 8, 16 respectively.
NSString *myAccentStr = @"José";
NSUInteger l1 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger l2 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF16StringEncoding];
NSUInteger l3 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF32StringEncoding];
NSLog(@"%ld %ld %ld", (long)l1, (long)l2, (long)l3);
> 5, 8, 16
For conversion purposes you should use -maximumLengthOfBytesUsingEncoding instead of -lengthOfBytesUsingEncoding
Always check that the conversion is valid with -canBeConvertedToEncoding
There are good reasons to use NSString
来源:https://stackoverflow.com/questions/7354627/converting-an-nsstring-with-accented-characters-to-a-cstring