HTML character decoding in Objective-C / Cocoa Touch

我寻月下人不归 2020-11-22 10:24

First of all, I found this: Objective C HTML escape/unescape, but it doesn't work for me.

My encoded characters (which come from an RSS feed, btw) look like this: &a

13 Answers
  •  再見小時候
    2020-11-22 10:55

    The one by Daniel is basically very nice, and I fixed a few issues there:

    1. Disabled NSScanner's skipped characters (otherwise whitespace between two consecutive entities would be dropped):

      [scanner setCharactersToBeSkipped:nil];

    2. Fixed the parsing of isolated '&' symbols (I am not sure what the 'correct' output is for this; I just compared it against Firefox):

    e.g.

        &#ABC DF & B'  & C' Items (288)
    

    Here is the modified code:

    - (NSString *)stringByDecodingXMLEntities {
        NSUInteger myLength = [self length];
        NSUInteger ampIndex = [self rangeOfString:@"&" options:NSLiteralSearch].location;
    
        // Short-circuit if there are no ampersands.
        if (ampIndex == NSNotFound) {
            return self;
        }
        // Make result string with some extra capacity.
        NSMutableString *result = [NSMutableString stringWithCapacity:(myLength * 1.25)];
    
        // First iteration doesn't need to scan to & since we did that already, but for code simplicity's sake we'll do it again with the scanner.
        NSScanner *scanner = [NSScanner scannerWithString:self];
    
        [scanner setCharactersToBeSkipped:nil];
    
        NSCharacterSet *boundaryCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@" \t\n\r;"];
    
        do {
            // Scan up to the next entity or the end of the string.
            NSString *nonEntityString;
            if ([scanner scanUpToString:@"&" intoString:&nonEntityString]) {
                [result appendString:nonEntityString];
            }
            if ([scanner isAtEnd]) {
                goto finish;
            }
            // Scan either a HTML or numeric character entity reference.
            if ([scanner scanString:@"&amp;" intoString:NULL])
                [result appendString:@"&"];
            else if ([scanner scanString:@"&apos;" intoString:NULL])
                [result appendString:@"'"];
            else if ([scanner scanString:@"&quot;" intoString:NULL])
                [result appendString:@"\""];
            else if ([scanner scanString:@"&lt;" intoString:NULL])
                [result appendString:@"<"];
            else if ([scanner scanString:@"&gt;" intoString:NULL])
                [result appendString:@">"];
            else if ([scanner scanString:@"&#" intoString:NULL]) {
                BOOL gotNumber;
                unsigned charCode;
                NSString *xForHex = @"";
    
                // Is it hex or decimal?
                if ([scanner scanString:@"x" intoString:&xForHex]) {
                    gotNumber = [scanner scanHexInt:&charCode];
                }
                else {
                    gotNumber = [scanner scanInt:(int*)&charCode];
                }
    
                if (gotNumber) {
                    [result appendFormat:@"%C", (unichar)charCode];
    
                    [scanner scanString:@";" intoString:NULL];
                }
                else {
                    NSString *unknownEntity = @"";
    
                    [scanner scanUpToCharactersFromSet:boundaryCharacterSet intoString:&unknownEntity];
    
    
                    [result appendFormat:@"&#%@%@", xForHex, unknownEntity];
    
                    //[scanner scanUpToString:@";" intoString:&unknownEntity];
                    //[result appendFormat:@"&#%@%@;", xForHex, unknownEntity];
                    NSLog(@"Expected numeric character entity but got &#%@%@;", xForHex, unknownEntity);
    
                }
    
            }
            else {
                NSString *amp;
    
                [scanner scanString:@"&" intoString:&amp];  // an isolated & symbol
                [result appendString:amp];
    
                /*
                NSString *unknownEntity = @"";
                [scanner scanUpToString:@";" intoString:&unknownEntity];
                NSString *semicolon = @"";
                [scanner scanString:@";" intoString:&semicolon];
                [result appendFormat:@"%@%@", unknownEntity, semicolon];
                NSLog(@"Unsupported XML character entity %@%@", unknownEntity, semicolon);
                 */
            }
    
        }
        while (![scanner isAtEnd]);
    
    finish:
        return result;
    }
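
To see why the first fix matters, here is a minimal sketch that is not part of the answer's code. `ScanReplacingAmpersands` is a hypothetical helper name chosen for illustration. It pushes a string through an NSScanner and writes `|` wherever it consumes an `&`, so any characters the scanner silently drops show up as missing in the result. By default NSScanner skips whitespace and newlines at the start of each scan operation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer): copies `input` through an
// NSScanner, writing '|' for every '&', so dropped characters become visible.
static NSString *ScanReplacingAmpersands(NSString *input, BOOL disableSkipping) {
    NSScanner *scanner = [NSScanner scannerWithString:input];
    if (disableSkipping) {
        // Same call as in the answer: do not skip whitespace or newlines.
        [scanner setCharactersToBeSkipped:nil];
    }
    NSMutableString *output = [NSMutableString string];
    while (![scanner isAtEnd]) {
        NSString *piece = nil;
        // Copy everything up to the next '&' (or to the end of the string).
        if ([scanner scanUpToString:@"&" intoString:&piece]) {
            [output appendString:piece];
        }
        // Consume the '&' itself, if the scanner stopped at one.
        if ([scanner scanString:@"&" intoString:NULL]) {
            [output appendString:@"|"];
        }
    }
    return output;
}
```

With the default skip set, `ScanReplacingAmpersands(@"a& b", NO)` should give `a|b`, because the space after the `&` is swallowed. With skipping disabled, `ScanReplacingAmpersands(@"a& b", YES)` should give `a| b`.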
    
