I have an NSString with a value of Jose (an accent on the e). I try to convert it to a C string as follows:
char str [[myAccentStr length] + 1];
[myAccentStr getCString:str maxLength:[myAccentStr length] + 1 encoding:NSUTF32StringEncoding];
but str ends up being an empty string. What gives? I tried UTF8 and UTF16 too. It gets passed to another function later on, and when that function calls lstrlen on it, the size comes out as zero.
The docs for -[NSString getCString:maxLength:encoding:] say:
You can use canBeConvertedToEncoding: to check whether a string can be losslessly converted to encoding. If it can’t, you can use dataUsingEncoding:allowLossyConversion: to get a C-string representation using encoding, allowing some loss of information (note that the data returned by dataUsingEncoding:allowLossyConversion: is not a strict C-string since it does not have a NULL terminator).
Using the NSString method dataUsingEncoding:allowLossyConversion: does the trick. Here's a code example:
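NSString *myAccentStr = @"José";
// NSString * to C String (char*)
NSData *strData = [myAccentStr dataUsingEncoding:NSMacOSRomanStringEncoding
                          allowLossyConversion:YES];
// Size the buffer from the encoded byte count, not the character count
char str[[strData length] + 1];
// NSData carries no NUL terminator: copy exactly [strData length] bytes, then add one
memcpy(str, [strData bytes], [strData length]);
str[[strData length]] = '\0';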
NSLog(@"str (from NSString* to c string): %s", str);
// C String (char*) to NSString *
NSString *newAccentStr = [NSString stringWithCString:str
encoding:NSMacOSRomanStringEncoding];
NSLog(@"newAccentStr (from c string to NSString*): %@", newAccentStr);
The output from those NSLog calls is:
str (from NSString* to c string): José
newAccentStr (from c string to NSString*): José
So far I've only seen this work properly when using the NSMacOSRomanStringEncoding.
Edit
Changing this to a community wiki. Please feel free to edit.
hooleyhoop had some great points, so I thought I would try to make code that is as verbose as possible. If I'm missing anything, someone please chime in.
Also, I'm not sure why -[NSString canBeConvertedToEncoding:] returns YES even though -[NSString getCString:maxLength:encoding:] clearly isn't working right (as seen in the output).
Here's some code to help in analyzing what works / what doesn't:
// Define a block variable to test different encodings
void (^tryGetCStringUsingEncoding)(NSString*, NSStringEncoding) = ^(NSString* originalNSString, NSStringEncoding encoding) {
    // NSStringEncoding is an NSUInteger, so cast it before formatting
    NSLog(@"Trying to convert \"%@\" using encoding: 0x%lX", originalNSString, (unsigned long)encoding);
    BOOL canEncode = [originalNSString canBeConvertedToEncoding:encoding];
    if (!canEncode)
    {
        NSLog(@" Can not encode \"%@\" using encoding %lX", originalNSString, (unsigned long)encoding);
    }
    else
    {
        // Try encoding using NSString getCString:maxLength:encoding:
        // Note: maxLength includes the NUL terminator, so a buffer of exactly
        // cStrLength bytes is one byte too small. That is the likely reason
        // Converted(1) comes out wrong in the results.
        NSUInteger cStrLength = [originalNSString lengthOfBytesUsingEncoding:encoding];
        char cstr[cStrLength];
        [originalNSString getCString:cstr maxLength:cStrLength encoding:encoding];
        NSLog(@" Converted(1): \"%s\" (expected length: %lu)",
              cstr, (unsigned long)cStrLength);
        // Try encoding using NSString dataUsingEncoding:allowLossyConversion:
        NSData *strData = [originalNSString dataUsingEncoding:encoding allowLossyConversion:YES];
        char cstr2[[strData length] + 1];
        // NSData has no NUL terminator: copy exactly [strData length] bytes, then add one
        memcpy(cstr2, [strData bytes], [strData length]);
        cstr2[[strData length]] = '\0';
        NSLog(@" Converted(2): \"%s\" (expected length: %lu)",
              cstr2, (unsigned long)[strData length]);
    }
};
NSString *myAccentStr = @"José";
// Try out whatever encoding you want
tryGetCStringUsingEncoding(myAccentStr, NSUTF8StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF16StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF32StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSMacOSRomanStringEncoding);
Results:
> Trying to convert "José" using encoding: 0x4
> Converted(1): "" (expected length: 5)
> Converted(2): "José" (expected length: 5)
> Trying to convert "José" using encoding: 0xA
> Converted(1): "" (expected length: 8)
> Converted(2): "ˇ˛J" (expected length: 10)
> Trying to convert "José" using encoding: 0x8C000100
> Converted(1): "" (expected length: 16)
> Converted(2): "ˇ˛" (expected length: 20)
> Trying to convert "José" using encoding: 0x1E
> Converted(1): "-" (expected length: 4)
> Converted(2): "José" (expected length: 4)
[aString length] returns the number of characters (strictly, UTF-16 code units), not the number of bytes. In your case this is 4.
You can convert your string to a C string accurately using, for example, NSUTF8StringEncoding, NSUTF16StringEncoding, or NSUTF32StringEncoding. The length in bytes would be 5, 8, and 16 respectively, not counting the NUL terminator.
NSString *myAccentStr = @"José";
NSUInteger l1 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger l2 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF16StringEncoding];
NSUInteger l3 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF32StringEncoding];
NSLog(@"%ld %ld %ld", (long)l1, (long)l2, (long)l3);
> 5 8 16
For conversion purposes you should size the buffer with -maximumLengthOfBytesUsingEncoding: instead of -lengthOfBytesUsingEncoding:, and leave room for the NUL terminator.
Always check that the conversion is valid with -canBeConvertedToEncoding:.
There are good reasons to use NSString
Source: https://stackoverflow.com/questions/7354627/converting-an-nsstring-with-accented-characters-to-a-cstring