NSString - Convert to pure alphabet only (i.e. remove accents+punctuation)

暖寄归人 2020-12-02 15:49

I'm trying to compare names while ignoring punctuation, spaces, accents, etc. At the moment I am doing the following:

-(NSString*) prepareString:(NSString*)a {
         


        
13 Answers
  •  难免孤独
    2020-12-02 16:15

    One important clarification on the answer of BillyTheKid18756 (Luiz corrected it, but this was not obvious from the explanation of the code):

    Do NOT use stringWithCString:encoding: as the second step when removing accents. It can append unwanted characters to the end of your string, because the bytes of an NSData are not NUL-terminated, which stringWithCString:encoding: expects. If you do use it, append a NUL byte to your NSData first, as Luiz did in his code.
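
To illustrate that second option, here is a minimal sketch of appending the terminator before calling stringWithCString:encoding:. It is not Luiz's actual code, and the helper name StringFromASCIIData is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answers): converts ASCII bytes
// held in an NSData to an NSString through the C-string API.
static NSString *StringFromASCIIData(NSData *data) {
    // Copy the bytes and append one NUL byte so the buffer is a valid C string.
    NSMutableData *terminated = [NSMutableData dataWithData:data];
    const char nul = '\0';
    [terminated appendBytes:&nul length:1];

    // The buffer is now NUL-terminated, so the C-string reader stops at the
    // terminator instead of running past the end of the data.
    return [NSString stringWithCString:(const char *)[terminated bytes]
                              encoding:NSASCIIStringEncoding];
}
```

The extra copy is the price of staying with the C-string API, which is why going through initWithData:encoding: is the simpler route.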

    I think a simpler answer is to replace:

    NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
    

    with:

    NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
    

    Starting from BillyTheKid18756's code, here is the complete corrected version:

    // The input text
    NSString *text = @"BûvérÈ!@$&%^&(*^(_()-*/48";
    
    // Defining what characters to accept
    NSMutableCharacterSet *acceptedCharacters = [[[NSMutableCharacterSet alloc] init] autorelease];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
    [acceptedCharacters addCharactersInString:@" _-.!"];
    
    // Turn accented letters into plain ASCII letters (optional).
    // The lossy conversion may remove or alter characters that ASCII cannot represent.
    NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    // Corrected back-conversion from NSData to NSString
    NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
    
    // Removing unaccepted characters
    NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:@""];
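
For the original question (comparing names), the same steps could be narrowed to letters only and wrapped in a helper. This is only a sketch: it assumes ARC (so the autorelease calls are dropped), and the helper name LettersOnly is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answers): keeps only letters,
// after a lossy conversion of the input to ASCII. Assumes ARC.
static NSString *LettersOnly(NSString *text) {
    if (text == nil) {
        return nil;
    }

    // Lossy ASCII conversion: characters may be removed or altered here.
    NSData *asciiData = [text dataUsingEncoding:NSASCIIStringEncoding
                           allowLossyConversion:YES];
    NSString *ascii = [[NSString alloc] initWithData:asciiData
                                            encoding:NSASCIIStringEncoding];

    // Split on every character that is not a letter, then join the pieces.
    NSCharacterSet *rejected = [[NSCharacterSet letterCharacterSet] invertedSet];
    return [[ascii componentsSeparatedByCharactersInSet:rejected]
            componentsJoinedByString:@""];
}
```

Two names could then be compared with [LettersOnly(a) isEqualToString:LettersOnly(b)], or with caseInsensitiveCompare: if case should be ignored as well.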
    
