How to convert a unichar value to an NSString in Objective-C?

无人及你  2020-11-29 00:32

I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xCE91).

5 Answers
  • 2020-11-29 01:09
    unichar greekAlpha = 0x0391;
    NSString* s = [NSString stringWithCharacters:&greekAlpha length:1];
    

    And now you can incorporate that NSString into another in any way you like. Do note, however, that it is now legal to type a Greek alpha directly into an NSString literal.
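    The same stringWithCharacters:length: approach also extends to code points outside the BMP, if you first encode them as a UTF-16 surrogate pair and pass length:2. A minimal C sketch of that encoding (the helper name utf16_encode is mine, not from the answer):

    ```c
    #include <stdint.h>

    /* Encode one Unicode scalar value as UTF-16 code units.
       Returns the number of units written (1 or 2).
       Sketch only: assumes cp is a valid scalar value, not a lone surrogate. */
    static int utf16_encode(uint32_t cp, uint16_t out[2]) {
        if (cp < 0x10000) {              /* BMP: a single unichar, as in the answer */
            out[0] = (uint16_t)cp;
            return 1;
        }
        cp -= 0x10000;                   /* supplementary plane: surrogate pair */
        out[0] = 0xD800 | (cp >> 10);    /* high surrogate */
        out[1] = 0xDC00 | (cp & 0x3FF);  /* low surrogate  */
        return 2;
    }
    ```

    The resulting buffer and length can then be handed to stringWithCharacters:length: unchanged.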

  • 2020-11-29 01:19

    The above answer is great but doesn't account for UTF-8 sequences longer than two bytes, e.g. the ellipsis symbol (0xE2, 0x80, 0xA6). Here's a tweak to the code:

    unsigned int utf8char = 0xE280A6;  /* up to three UTF-8 bytes packed into one int */
    char chars[4];

    if (utf8char > 65535) {            /* three-byte sequence, e.g. the ellipsis */
        chars[0] = (utf8char >> 16) & 255;
        chars[1] = (utf8char >> 8) & 255;
        chars[2] = utf8char & 255;
        chars[3] = 0x00;
    } else if (utf8char > 127) {       /* two-byte sequence */
        chars[0] = (utf8char >> 8) & 255;
        chars[1] = utf8char & 255;
        chars[2] = 0x00;
    } else {                           /* plain ASCII */
        chars[0] = utf8char;
        chars[1] = 0x00;
    }
    NSString *string = [[[NSString alloc] initWithUTF8String:chars] autorelease];
    

    Note the different string initialisation method which doesn't require a length parameter.
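    The byte-splitting branches above are plain C and can be checked on their own; a small sketch (the helper name split_packed_utf8 is mine):

    ```c
    /* Split an int holding up to three packed UTF-8 bytes into a
       NUL-terminated byte string, mirroring the answer's branches. */
    static void split_packed_utf8(unsigned int packed, char chars[4]) {
        if (packed > 65535) {            /* three bytes, e.g. 0xE280A6 */
            chars[0] = (packed >> 16) & 255;
            chars[1] = (packed >> 8) & 255;
            chars[2] = packed & 255;
            chars[3] = 0x00;
        } else if (packed > 127) {       /* two bytes, e.g. 0xCE91 */
            chars[0] = (packed >> 8) & 255;
            chars[1] = packed & 255;
            chars[2] = 0x00;
        } else {                         /* plain ASCII */
            chars[0] = packed;
            chars[1] = 0x00;
        }
    }
    ```

    Feeding the result to initWithUTF8String: works because the bytes come out in the same order UTF-8 stores them.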

  • 2020-11-29 01:20

    Since 0xCE91 is in UTF-8 format and %C expects UTF-16, a simple solution like the one above won't work. For stringWithFormat:@"%C" to work you need to pass 0x391, which is the UTF-16 code unit for that character.

    In order to create a string from the UTF-8 encoded unichar you need to first split the value into its octets and then use initWithBytes:length:encoding:.

    unichar utf8char = 0xce91;  /* two UTF-8 bytes (0xCE, 0x91) packed into one unichar */
    char chars[2];
    int len = 1;

    if (utf8char > 127) {
        chars[0] = (utf8char >> 8) & 0xFF;  /* high byte: 0xCE */
        chars[1] = utf8char & 0xFF;         /* low byte:  0x91 */
        len = 2;
    } else {
        chars[0] = utf8char;
    }

    NSString *string = [[NSString alloc] initWithBytes:chars
                                                length:len
                                              encoding:NSUTF8StringEncoding];
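    The claim that the packed bytes 0xCE, 0x91 really are U+0391 can be checked by decoding the two-byte UTF-8 sequence back to a code point; a minimal C sketch (the helper name utf8_decode2 is mine, and it does no validation of the continuation bits):

    ```c
    #include <stdint.h>

    /* Decode a two-byte UTF-8 sequence to its code point:
       lead byte 110xxxxx carries 5 bits, continuation 10xxxxxx carries 6. */
    static uint32_t utf8_decode2(unsigned char b0, unsigned char b1) {
        return ((uint32_t)(b0 & 0x1F) << 6) | (b1 & 0x3F);
    }
    ```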
    
  • 2020-11-29 01:23

    Code like unichar foo = 'Α'; is the moral equivalent of unichar foo = 'abc';.

    The problem is that 'Α' doesn't map to a single byte in the "execution character set" (I'm assuming UTF-8) which is "implementation-defined" in C99 §6.4.4.4 10:

    The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

    One way is to make 'ab' equal to 'a'<<8|'b'. Some Mac/iOS system headers rely on this for things like OSType/FourCharCode/FourCC; the only example in iOS that comes to mind is the CoreVideo pixel formats. This is, however, unportable.

    If you really want a unichar literal, you can try L'Α' (technically it's a wchar_t literal, but on OS X and iOS, wchar_t is typically UTF-16, so it will work for characters inside the BMP). However, it's far simpler to just use @"Α" (which works as long as you set the source character encoding correctly) or @"\u0391" (which has worked since at least the iOS 3 SDK).
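    The FourCC-style packing this answer mentions can be written portably with explicit shifts instead of a multi-character constant; a small sketch (the fourcc helper is mine):

    ```c
    #include <stdint.h>

    /* Pack four characters big-endian, the OSType/FourCharCode convention.
       Explicit shifts are well defined; reading 'abcd' as an integer
       constant directly is implementation-defined (C99 6.4.4.4p10). */
    static uint32_t fourcc(char a, char b, char c, char d) {
        return ((uint32_t)(unsigned char)a << 24) |
               ((uint32_t)(unsigned char)b << 16) |
               ((uint32_t)(unsigned char)c << 8)  |
                (uint32_t)(unsigned char)d;
    }
    ```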

  • 2020-11-29 01:33

    Here is an algorithm for UTF-8 encoding a single code point. Note that the lead-byte bits must be OR'd in after masking; a mask like (0x1F | 0xC0) never sets them:

    uint32_t cp = utf8char;  /* must be wider than 16 bits for the 4-byte case */
    unsigned char chars[4];

    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        chars[0] = cp;
        chars[1] = 0x00;
        chars[2] = 0x00;
        chars[3] = 0x00;
    } else if (cp < 0x0800) {              /* 2 bytes: 110xxxxx 10xxxxxx */
        chars[0] = 0xC0 | (cp >> 6);
        chars[1] = 0x80 | (cp & 0x3F);
        chars[2] = 0x00;
        chars[3] = 0x00;
    } else if (cp < 0x010000) {            /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        chars[0] = 0xE0 | (cp >> 12);
        chars[1] = 0x80 | ((cp >> 6) & 0x3F);
        chars[2] = 0x80 | (cp & 0x3F);
        chars[3] = 0x00;
    } else if (cp < 0x110000) {            /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        chars[0] = 0xF0 | (cp >> 18);
        chars[1] = 0x80 | ((cp >> 12) & 0x3F);
        chars[2] = 0x80 | ((cp >> 6) & 0x3F);
        chars[3] = 0x80 | (cp & 0x3F);
    }
    