NSString - Convert to pure alphabet only (i.e. remove accents+punctuation)

暖寄归人 2020-12-02 15:49

I'm trying to compare names without any punctuation, spaces, accents, etc. At the moment I am doing the following:

-(NSString*) prepareString:(NSString*)a {
         


        
13 Answers
  • 2020-12-02 16:15

    One important clarification to BillyTheKid18756's answer (Luiz corrected the problem, but the explanation of his code did not make it obvious):

    Do NOT use stringWithCString as a second step to remove accents: it can add unwanted characters at the end of your string, because the NSData is not NULL-terminated (as stringWithCString expects). Alternatively, use it but append an additional NULL byte to your NSData, as Luiz did in his code.

    I think a simpler answer is to replace:

    NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
    

    with:

    NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
    

    Applying that fix to BillyTheKid18756's code gives the complete corrected version:

    // The input text
    NSString *text = @"BûvérÈ!@$&%^&(*^(_()-*/48";
    
    // Defining what characters to accept
    NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
    [acceptedCharacters addCharactersInString:@" _-.!"];
    
    // Turn accented letters into normal letters (optional)
    NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    // Corrected back-conversion from NSData to NSString (omit the autorelease call when compiling with ARC)
    NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
    
    // Removing unaccepted characters
    NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:@""];
    
  • 2020-12-02 16:15

    Combining the answers from Luiz and Peter and adding a few lines gives the complete example below.

    The code does the following:

    1. Creates a set of accepted characters
    2. Turns accented letters into normal letters
    3. Removes characters not in the set

    Objective-C

    // The input text
    NSString *text = @"BûvérÈ!@$&%^&(*^(_()-*/48";
    
    // Create set of accepted characters
    NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
    [acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
    [acceptedCharacters addCharactersInString:@" _-.!"];
    
    // Turn accented letters into normal letters (optional)
    NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
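    // Caution: [sanitizedData bytes] is not NULL-terminated, so stringWithCString: may read past the end of the buffer;
    // [[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] is the safer back-conversion
    NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];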
    
    // Remove characters not in the set
    NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:@""];
    

    Swift (2.2) example

    let text = "BûvérÈ!@$&%^&(*^(_()-*/48"
    
    // Create set of accepted characters
    let acceptedCharacters = NSMutableCharacterSet()
    acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.letterCharacterSet())
    acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.decimalDigitCharacterSet())
    acceptedCharacters.addCharactersInString(" _-.!")
    
    // Turn accented letters into normal letters (optional)
    let sanitizedData = text.dataUsingEncoding(NSASCIIStringEncoding, allowLossyConversion: true)
    let sanitizedText = String(data: sanitizedData!, encoding: NSASCIIStringEncoding)
    
    // Remove characters not in the set
    let components = sanitizedText!.componentsSeparatedByCharactersInSet(acceptedCharacters.invertedSet)
    let output = components.joinWithSeparator("")
    

    Output

    The output for both examples would be: BuverE!_-48

  • 2020-12-02 16:18
    NSString* finish = [[start componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:@""];
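
    Note that letterCharacterSet also contains accented letters, so the line above keeps characters such as "é" and "È" unchanged. If the goal is plain unaccented letters, one option is to fold the diacritics away first. The following is only a sketch under that assumption: the helper name LettersOnlyWithoutAccents is hypothetical, and it relies on Foundation's stringByFoldingWithOptions:locale: rather than on anything from the answers in this thread.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not from the thread): fold accents, then keep letters only.
    static NSString *LettersOnlyWithoutAccents(NSString *input) {
        // Strip diacritics, e.g. "é" -> "e" and "È" -> "E" (nil locale = system locale).
        NSString *folded = [input stringByFoldingWithOptions:NSDiacriticInsensitiveSearch
                                                      locale:nil];
        // Same filter as the one-liner above: drop everything that is not a letter.
        NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
        return [[folded componentsSeparatedByCharactersInSet:nonLetters]
                componentsJoinedByString:@""];
    }
    ```

    Calling LettersOnlyWithoutAccents(@"Bûvér È!-48") should then give "BuverE"; digits are dropped as well because they are not letters.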
    
  • 2020-12-02 16:18

    I wanted to filter out everything except letters and numbers, so I adapted Lorean's implementation of a category on NSString to work a little differently. In this example, you specify a string containing only the characters you want to keep, and everything else is filtered out:

    @interface NSString (PraxCategories)
    + (NSString *)lettersAndNumbers;
    - (NSString*)stringByKeepingOnlyLettersAndNumbers;
    - (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string;
    @end
    
    
    @implementation NSString (PraxCategories)
    
    + (NSString *)lettersAndNumbers { return @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; }
    
    - (NSString*)stringByKeepingOnlyLettersAndNumbers {
        return [self stringByKeepingOnlyCharactersInString:[NSString lettersAndNumbers]];
    }
    
    - (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string {
        NSCharacterSet *characterSet = [NSCharacterSet characterSetWithCharactersInString:string];
        NSMutableString * mutableString = @"".mutableCopy;
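        for (NSUInteger i = 0; i < [self length]; i++){
            // Use unichar, not char: truncating a UTF-16 unit to 8 bits could make a non-ASCII character look like an accepted one
            unichar character = [self characterAtIndex:i];
            if([characterSet characterIsMember:character]) [mutableString appendFormat:@"%C", character];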
        }
        return mutableString.copy;
    }
    
    @end
    

    Once you've made your Categories, using them is trivial, and you can use them on any NSString:

    NSString *string = someStringValueThatYouWantToFilter;
    
    string = [string stringByKeepingOnlyLettersAndNumbers];
    

    Or, for example, if you wanted to get rid of everything except vowels:

    string = [string stringByKeepingOnlyCharactersInString:@"aeiouAEIOU"];
    

    If you're still learning Objective-C and aren't using categories yet, I encourage you to try them out. They're the best place to put things like this, because they add the new functionality to every object of the class you extend.

    Categories simplify and encapsulate the code you're adding, making it easy to reuse on all of your projects. It's a great feature of Objective-C!

  • 2020-12-02 16:26

    Consider using the RegexKit framework. You could do something like:

    NSString *searchString      = @"This is neat.";
    NSString *regexString       = @"[\\W]"; // backslash doubled: it must survive the Objective-C string literal
    NSString *replaceWithString = @"";
    NSString *replacedString    = [searchString stringByReplacingOccurrencesOfRegex:regexString withString:replaceWithString];
    
    NSLog (@"%@", replacedString);
    //... Thisisneat
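
    If adding a third-party framework is not an option, Foundation's built-in regular-expression search option can probably do the same job. This is a sketch of that alternative, not RegexKit's API: the helper name StripNonWordCharacters is hypothetical, and the code assumes an OS version where NSRegularExpressionSearch is available.

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: remove every non-word character using Foundation only.
    static NSString *StripNonWordCharacters(NSString *input) {
        // NSRegularExpressionSearch makes the search string an ICU regular expression.
        // The backslash is doubled because it sits inside an Objective-C string literal.
        return [input stringByReplacingOccurrencesOfString:@"\\W"
                                                withString:@""
                                                   options:NSRegularExpressionSearch
                                                     range:NSMakeRange(0, [input length])];
    }
    ```

    StripNonWordCharacters(@"This is neat.") should give "Thisisneat". Keep in mind that \W leaves digits, underscores and accented letters in place, so this alone does not remove accents.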
    
  • 2020-12-02 16:27

    I just bumped into this; it may be too late, but here is what worked for me:

    // text is the input string, and this just removes accents from the letters
    
    // lossy encoding turns accented letters into normal letters
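    // (wrapped in NSMutableData because dataUsingEncoding: returns an immutable NSData)
    NSMutableData *sanitizedData = [NSMutableData dataWithData:
        [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES]];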
    
    // increase length by 1 adds a 0 byte (increaseLengthBy 
    // guarantees to fill the new space with 0s), effectively turning 
    // sanitizedData into a c-string
    [sanitizedData increaseLengthBy:1];
    
    // now we just create a string with the c-string in sanitizedData
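    // (stringWithCString: without an encoding is deprecated, so the encoding is passed explicitly)
    NSString *final = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];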
    