Converting an NSString with accented characters to a C string

Submitted by 一曲冷凌霜 on 2019-12-09 18:27:25

Question


I have an NSString with a value of José (with an accent on the e). I try to convert it to a C string as follows:

char str [[myAccentStr length] + 1];
[myAccentStr getCString:str maxLength:[myAccentStr length] + 1 encoding:NSUTF32StringEncoding];

but str ends up being an empty string. What gives? I tried UTF8 and UTF16 too. It gets passed to another function later on, and when that function calls lstrlen on it, the size comes out as zero.


Answer 1:


The docs for NSString getCString:maxLength:encoding: say:

You can use canBeConvertedToEncoding: to check whether a string can be losslessly converted to encoding. If it can’t, you can use dataUsingEncoding:allowLossyConversion: to get a C-string representation using encoding, allowing some loss of information (note that the data returned by dataUsingEncoding:allowLossyConversion: is not a strict C-string since it does not have a NULL terminator).

Using the NSString method dataUsingEncoding:allowLossyConversion: does the trick. Here's a code example:

NSString *myAccentStr = @"José";

// NSString * to C String (char*)
NSData *strData = [myAccentStr dataUsingEncoding:NSMacOSRomanStringEncoding 
                                allowLossyConversion:YES];
// Size the buffer from the encoded byte count, not from the character count
char str[[strData length] + 1];
// NSData is not NUL-terminated, so copy exactly [strData length] bytes
memcpy(str, [strData bytes], [strData length]);
str[[strData length]] = '\0';
NSLog(@"str (from NSString* to c string): %s", str);

// C String (char*) to NSString *   
NSString *newAccentStr = [NSString stringWithCString:str 
                                            encoding:NSMacOSRomanStringEncoding];
NSLog(@"newAccentStr (from c string to NSString*):  %@", newAccentStr);

The output from that NSLog is:

str (from NSString* to c string): José

newAccentStr (from c string to NSString*): José

So far I've only seen this work properly when using the NSMacOSRomanStringEncoding.
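The copy step is where this pattern most easily goes wrong, because the bytes from dataUsingEncoding:allowLossyConversion: carry no terminator. Here is a minimal sketch of that step in plain C rather than Objective-C. The helper name copy_as_c_string is hypothetical, not part of any Apple API, and the caller is assumed to know the byte count already (as [strData length] would provide).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `len` raw bytes (for example the contents of an
   NSData) into `dst` and appends the terminating NUL that the raw bytes lack.
   Returns 1 on success, 0 if `dst_size` cannot hold `len` bytes plus the NUL. */
int copy_as_c_string(const unsigned char *bytes, size_t len,
                     char *dst, size_t dst_size)
{
    if (dst == NULL || dst_size == 0 || len > dst_size - 1) {
        return 0;               /* no room for the data plus the terminator */
    }
    if (len > 0) {
        memcpy(dst, bytes, len); /* copy exactly len bytes, never len + 1 */
    }
    dst[len] = '\0';
    return 1;
}
```

In Mac OS Roman, "José" is the four bytes 'J', 'o', 's', 0x8E, so a 5-byte destination is enough and a 4-byte one is rejected.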


Edit

Changing this to a community wiki. Please feel free to edit.

hooleyhoop had some great points, so I thought I would try to make code that is as verbose as possible. If I'm missing anything, someone please chime in.

Also, I'm not sure why [NSString canBeConvertedToEncoding:] returns YES even though [NSString getCString:maxLength:encoding:] definitely isn't working right (as seen in the output).

Here's some code to help in analyzing what works / what doesn't:

// Define a block variable to test out different encodings
void (^tryGetCStringUsingEncoding)(NSString*, NSStringEncoding) = ^(NSString* originalNSString, NSStringEncoding encoding) {
    NSLog(@"Trying to convert \"%@\" using encoding: 0x%lX", originalNSString, (unsigned long)encoding);
    BOOL canEncode = [originalNSString canBeConvertedToEncoding:encoding];
    if (!canEncode)
    {
        NSLog(@"    Can not encode \"%@\" using encoding %lX", originalNSString, (unsigned long)encoding);
    }
    else
    {
        // Try encoding using NSString getCString:maxLength:encoding:
        // Note: maxLength has to include room for the NUL terminator, and
        // cStrLength alone does not, so this call can be expected to fail.
        NSUInteger cStrLength = [originalNSString lengthOfBytesUsingEncoding:encoding];
        char cstr[cStrLength];
        [originalNSString getCString:cstr maxLength:cStrLength encoding:encoding];
        NSLog(@"    Converted(1): \"%s\"  (expected length: %lu)",
              cstr, (unsigned long)cStrLength);

        // Try encoding using NSString dataUsingEncoding:allowLossyConversion:
        NSData *strData = [originalNSString dataUsingEncoding:encoding allowLossyConversion:YES];
        char cstr2[[strData length] + 1];
        // NSData is not NUL-terminated: copy exactly [strData length] bytes, then terminate
        memcpy(cstr2, [strData bytes], [strData length]);
        cstr2[[strData length]] = '\0';
        NSLog(@"    Converted(2): \"%s\"  (expected length: %lu)",
              cstr2, (unsigned long)[strData length]);
    }
};

NSString *myAccentStr = @"José";

// Try out whatever encoding you want
tryGetCStringUsingEncoding(myAccentStr, NSUTF8StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF16StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF32StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSMacOSRomanStringEncoding);

Results:

> Trying to convert "José" using encoding: 0x4
>     Converted(1): ""  (expected length: 5)
>     Converted(2): "José"  (expected length: 5)
> Trying to convert "José" using encoding: 0xA
>     Converted(1): ""  (expected length: 8)
>     Converted(2): "ˇ˛J"  (expected length: 10)
> Trying to convert "José" using encoding: 0x8C000100
>     Converted(1): ""  (expected length: 16)
>     Converted(2): "ˇ˛"  (expected length: 20)
> Trying to convert "José" using encoding: 0x1E
>     Converted(1): "-"  (expected length: 4)
>     Converted(2): "José"  (expected length: 4)
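The truncated UTF-16 and UTF-32 results are what you would expect from any function that treats the bytes as a NUL-terminated string: those encodings contain zero bytes in the middle of ordinary text. The sketch below is plain C and spells out the bytes of "José" by hand. The little-endian layout with a leading byte-order mark is an assumption, though it is consistent with the 10- and 20-byte lengths and with the "ˇ˛" (0xFF 0xFE shown as Mac OS Roman) in the output above. The function name visible_c_length is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* "José" in UTF-8: no zero bytes until the terminator. */
const unsigned char jose_utf8[] = {
    0x4A, 0x6F, 0x73, 0xC3, 0xA9, 0x00
};

/* "José" in UTF-16, little-endian, with BOM (assumed layout). */
const unsigned char jose_utf16[] = {
    0xFF, 0xFE, 0x4A, 0x00, 0x6F, 0x00, 0x73, 0x00, 0xE9, 0x00,
    0x00, 0x00
};

/* "José" in UTF-32, little-endian, with BOM (assumed layout). */
const unsigned char jose_utf32[] = {
    0xFF, 0xFE, 0x00, 0x00, 0x4A, 0x00, 0x00, 0x00, 0x6F, 0x00, 0x00, 0x00,
    0x73, 0x00, 0x00, 0x00, 0xE9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* What strlen (or lstrlen, or printf("%s")) sees: everything up to the
   first zero byte, regardless of how long the encoded text really is. */
size_t visible_c_length(const unsigned char *bytes)
{
    return strlen((const char *)bytes);
}
```

Under these assumptions the visible lengths are 5, 3, and 2 bytes, which lines up with "José", "ˇ˛J", and "ˇ˛" in the Converted(2) lines. For an API that takes a char * and stops at the first zero byte, UTF-8 or a single-byte encoding is the practical choice.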



Answer 2:


[aString length] returns the number of characters (strictly, UTF-16 code units), not the number of bytes in any particular encoding. In your case this is 4.

You can convert your string to a C string accurately using, for example, NSUTF8StringEncoding, NSUTF16StringEncoding, or NSUTF32StringEncoding. The length in bytes would be 5, 8, and 16 respectively.

NSString *myAccentStr = @"José";
NSUInteger l1 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger l2 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF16StringEncoding];
NSUInteger l3 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF32StringEncoding];
NSLog(@"%ld, %ld, %ld", (long)l1, (long)l2, (long)l3);

> 5, 8, 16
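To see where 5, 8, and 16 come from, here is a small plain-C sketch that counts the bytes each encoding needs per Unicode code point. The function names are invented for this illustration, and the counts leave out any byte-order mark, which appears to be why the figures are 8 and 16 rather than the 10 and 20 reported for the NSData route.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one code point in UTF-8. */
size_t utf8_bytes(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Bytes needed in UTF-16: one 2-byte unit, or a surrogate pair. */
size_t utf16_bytes(uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

/* UTF-32 always uses 4 bytes per code point. */
size_t utf32_bytes(uint32_t cp)
{
    (void)cp;
    return 4;
}

/* Sums the per-code-point cost over a whole string of code points. */
size_t total_bytes(const uint32_t *cps, size_t count,
                   size_t (*per_code_point)(uint32_t))
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += per_code_point(cps[i]);
    }
    return total;
}
```

For "José" (U+004A, U+006F, U+0073, U+00E9) only the é costs two bytes in UTF-8, giving 1 + 1 + 1 + 2 = 5, while UTF-16 and UTF-32 give 4 × 2 = 8 and 4 × 4 = 16.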

For conversion purposes you should size the buffer with -maximumLengthOfBytesUsingEncoding: rather than -lengthOfBytesUsingEncoding:, and leave room for the terminating NUL.

Always check that the conversion is valid with -canBeConvertedToEncoding:.

There are good reasons to use NSString.
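The buffer-size point is easy to get wrong by one byte, so here is a rough plain-C sketch of a converter that, like getCString:maxLength:encoding:, counts the terminating NUL against the buffer size and reports failure instead of overflowing. It is not NSString's implementation. The name encode_utf8_c_string and its signature are made up for illustration, and it takes code points as input instead of an NSString.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter: writes `count` code points to `out` as UTF-8 and
   NUL-terminates the result. `out_size` is the whole buffer, terminator
   included. Returns 1 on success, 0 if a code point is invalid or the
   buffer is too small (in which case `out` may hold partial data). */
int encode_utf8_c_string(const uint32_t *cps, size_t count,
                         char *out, size_t out_size)
{
    size_t used = 0;

    if (out == NULL || out_size == 0) {
        return 0;
    }
    for (size_t i = 0; i < count; i++) {
        uint32_t cp = cps[i];
        size_t need;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return 0;                       /* not a valid scalar value */
        }
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (need > out_size - 1 - used) {
            return 0;                       /* always keep one byte for NUL */
        }
        if (need == 1) {
            out[used++] = (char)cp;
        } else if (need == 2) {
            out[used++] = (char)(0xC0 | (cp >> 6));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[used++] = (char)(0xE0 | (cp >> 12));
            out[used++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[used++] = (char)(0xF0 | (cp >> 18));
            out[used++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[used++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[used] = '\0';
    return 1;
}
```

With "José", a 5-byte buffer (the 4-character length plus 1, as in the question) is rejected because the UTF-8 form needs 5 bytes plus the terminator, while a 6-byte buffer succeeds. Checking the return value is what turns a silent empty string into a visible error.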



Source: https://stackoverflow.com/questions/7354627/converting-an-nsstring-with-accented-characters-to-a-cstring
