String from NSInputStream is not valid utf8. How to convert to utf8 more 'lossy'

吃可爱长大的小学妹 submitted on 2021-02-11 06:31:52

Question


I have an app that reads data from a server. Now and then, the data is not valid UTF-8. When I convert the byte array to a UTF-8 string, the result is nil, so the byte array must contain at least one invalid byte sequence. Is there a way to convert the byte array to UTF-8 'lossily', filtering out only the invalid characters?

Any ideas?

My code looks like this:

- (void)stream:(NSStream *)theStream handleEvent:(NSStreamEvent)streamEvent {

switch (streamEvent){
    case NSStreamEventHasBytesAvailable:
    {
        uint8_t buffer[1024];
        NSInteger len;
        NSMutableData * inputData = [NSMutableData data];
        // Collect every available chunk before decoding
        while ([directoryStream hasBytesAvailable]){
            len = [directoryStream read:buffer maxLength:sizeof(buffer)];
            if (len > 0) {
                [inputData appendBytes:(const void *)buffer length:len];
            } else {
                break; // 0 = end of stream, negative = read error
            }
        }
        // initWithData:encoding: returns nil if inputData is not valid UTF-8
        NSString *directoryString = [[NSString alloc] initWithData:inputData encoding:NSUTF8StringEncoding];
        NSLog(@"directoryString: %@", directoryString);
    }

    ...

Is there a way to do this conversion in a more 'lossy' way?

As you can see, I first append the chunks of data to an NSMutableData and convert to UTF-8 only once everything has been read. This prevents multi-byte UTF-8 characters from being split across chunks, which would produce even more invalid (nil) strings.
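To illustrate the splitting problem: a chunk boundary can fall in the middle of a multi-byte character, so the tail of a chunk may hold the start of a character whose remaining bytes have not arrived yet. The sketch below is plain C (it compiles unchanged in an Objective-C file). `utf8_incomplete_tail` is a hypothetical helper name, not part of any Apple API. It reports how many trailing bytes should be held back until more data arrives.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns how many bytes at the end of buf are the
   start of a multi-byte UTF-8 sequence that has not fully arrived yet.
   Returns 0 if the buffer ends on a character boundary (or on garbage). */
size_t utf8_incomplete_tail(const uint8_t *buf, size_t len) {
    /* A UTF-8 sequence is at most 4 bytes, so look back at most 4 bytes. */
    size_t max_back = len < 4 ? len : 4;
    for (size_t back = 1; back <= max_back; back++) {
        uint8_t b = buf[len - back];
        if ((b & 0xC0) == 0x80) continue;      /* continuation byte: keep looking */
        size_t need;
        if      ((b & 0x80) == 0x00) need = 1; /* ASCII */
        else if ((b & 0xE0) == 0xC0) need = 2;
        else if ((b & 0xF0) == 0xE0) need = 3;
        else if ((b & 0xF8) == 0xF0) need = 4;
        else return 0;                         /* invalid lead byte: nothing to wait for */
        /* Incomplete only if the lead byte promises more bytes than we have. */
        return (need > back) ? back : 0;
    }
    return 0;                                  /* only continuation bytes seen */
}
```

With something like this, a reader could decode `len - utf8_incomplete_tail(buf, len)` bytes now and prepend the held-back bytes to the next chunk. Appending everything to one NSMutableData first, as above, avoids the problem in a simpler way when the whole payload fits in memory.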


Answer 1:


It works! By combining the code snippet from Larme and the comment about the size of UTF-8 characters I managed to create a 'lossy' NSData to UTF-8 NSString conversion method.

+ (NSString *) data2UTF8String:(NSData *) data {

    // First try to do the 'standard' UTF-8 conversion 
    NSString * bufferStr = [[NSString alloc] initWithData:data
                                                 encoding:NSUTF8StringEncoding];

    // if it fails, do the 'lossy' UTF8 conversion
    if (!bufferStr) {
        const Byte * buffer = [data bytes];

        NSMutableString * filteredString = [[NSMutableString alloc] init];

        NSUInteger i = 0;
        NSUInteger dataLength = [data length];
        while (i < dataLength) {

            // Sequence length implied by the lead byte. Continuation bytes
            // (10xxxxxx) and invalid lead bytes keep length 1 and fail to
            // decode below, so they are dropped. UTF-8 has no 5- or 6-byte forms.
            NSUInteger expectedLength = 1;

            if      ((buffer[i] & 0b10000000) == 0b00000000) expectedLength = 1;
            else if ((buffer[i] & 0b11100000) == 0b11000000) expectedLength = 2;
            else if ((buffer[i] & 0b11110000) == 0b11100000) expectedLength = 3;
            else if ((buffer[i] & 0b11111000) == 0b11110000) expectedLength = 4;

            NSUInteger length = MIN(expectedLength, dataLength - i);

            // Decode by explicit length: these bytes are not NUL-terminated,
            // so stringWithUTF8String: would read past the end of the sequence.
            NSString * possibleString = [[NSString alloc] initWithBytes:&buffer[i]
                                                                 length:length
                                                               encoding:NSUTF8StringEncoding];
            if (possibleString) {
                [filteredString appendString:possibleString];
                i += length;
            } else {
                // Drop only the offending byte and resynchronize on the next one
                i += 1;
            }
        }
        bufferStr = filteredString;
    }

    return bufferStr;
}
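The same filtering idea can also be sketched at the byte level, without creating one NSString per character. The following is a minimal sketch in plain C (usable as-is from Objective-C). The names `utf8_valid_len` and `utf8_filter_lossy` are made up for this illustration. It assumes that "lossy" means dropping every byte that is not part of a well-formed sequence, and it also rejects overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length (1-4) of a well-formed UTF-8 sequence
   starting at s, or 0 if the bytes at s do not form one. */
static size_t utf8_valid_len(const uint8_t *s, size_t avail) {
    if (avail == 0) return 0;
    uint8_t b0 = s[0];
    size_t n;
    uint32_t cp, min;
    if (b0 < 0x80) return 1;                                   /* ASCII */
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;                      /* stray continuation or invalid lead byte */
    if (avail < n) return 0;            /* truncated sequence */
    for (size_t k = 1; k < n; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        cp = (cp << 6) | (s[k] & 0x3F);
    }
    if (cp < min) return 0;                        /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;    /* UTF-16 surrogate */
    if (cp > 0x10FFFF) return 0;                   /* beyond Unicode range */
    return n;
}

/* Copies only well-formed sequences from in to out and returns the number
   of bytes written. out must have room for at least in_len bytes. */
size_t utf8_filter_lossy(const uint8_t *in, size_t in_len, uint8_t *out) {
    size_t i = 0, o = 0;
    while (i < in_len) {
        size_t n = utf8_valid_len(in + i, in_len - i);
        if (n == 0) { i++; continue; }  /* drop one bad byte, then resynchronize */
        for (size_t k = 0; k < n; k++) out[o++] = in[i + k];
        i += n;
    }
    return o;
}
```

The filtered buffer should then decode cleanly in a single call, for example with `initWithBytes:length:encoding:` and `NSUTF8StringEncoding`. Advancing by one byte after a failure (rather than by the expected sequence length) keeps a valid character that directly follows a broken lead byte.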

If you have any comments, please let me know. Thanks Larme!



Source: https://stackoverflow.com/questions/30372870/string-from-nsinputstream-is-not-valid-utf8-how-to-convert-to-utf8-more-lossy
