For a diacritics-agnostic full-text search feature, I use the following code to convert accented characters like é or Ö into their lowercase non-accented forms e and o:
[[inputString stringByFoldingWithOptions:
NSCaseInsensitiveSearch
| NSDiacriticInsensitiveSearch
| NSWidthInsensitiveSearch
locale:[NSLocale currentLocale]] lowercaseString];
This works. However, I found no way to convert special characters whose base form consists of multiple characters, like the French œ (as in "sœur") or the German ß (as in "Fluß"). I would like to convert them into oe and ss respectively. I found no flag for this in stringByFoldingWithOptions and did not find anything on the web.
EDIT: ß is actually handled correctly by the above code. It converts to ss.
Here are the solutions, from worst to best.
Solution 1 works only for æ and ß and fails for everything else (œ, ĳ, ﬀ, ﬁ, ﬂ, ﬃ, ﬄ, ﬅ, ﬆ, ...):
NSString *result = [[[NSString alloc] initWithData:[inputString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];
Solution 2 works for most ligatures and fails only for æ, œ and ĳ. I tried every available NSLocale, so the locale is not the issue here:
NSString *result = [inputString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch locale:[NSLocale currentLocale]];
Solution 3 works for most ligatures and fails only for œ:
NSString *result = [[[NSString alloc] initWithData:[[inputString precomposedStringWithCompatibilityMapping] dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];
This means œ always needs to be handled manually, so the best approach is to combine either solution 2 or solution 3 with a manual string replacement.
Solution 2bis:
inputString = [inputString stringByReplacingOccurrencesOfString:@"æ" withString:@"ae" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
inputString = [inputString stringByReplacingOccurrencesOfString:@"œ" withString:@"oe" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
inputString = [inputString stringByReplacingOccurrencesOfString:@"ĳ" withString:@"ij" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
NSString *result = [inputString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch locale:[NSLocale currentLocale]];
Solution 3bis:
inputString = [inputString stringByReplacingOccurrencesOfString:@"Œ" withString:@"OE"];
inputString = [inputString stringByReplacingOccurrencesOfString:@"œ" withString:@"oe"];
NSString *result = [[[NSString alloc] initWithData:[[inputString precomposedStringWithCompatibilityMapping] dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];
Since I might be missing some replacements in solution 2bis, and NSLocale is unpredictable, the best solution is 3bis. It also lets you preserve case if you need to.
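For reuse, solution 3bis can be wrapped in a small helper. The following is only a sketch under a few assumptions: it is compiled with ARC (so there is no autorelease call), the function name FoldToASCII is made up for illustration, and the replacement table contains only the two characters that the answer above reports as unhandled.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper; the name and the table layout are illustrative,
// not part of the original answer. Assumes ARC.
static NSString *FoldToASCII(NSString *input) {
    // Characters that have to be expanded by hand before the conversion.
    // Extend this table if other characters turn out to slip through.
    NSDictionary<NSString *, NSString *> *manual = @{
        @"Œ": @"OE",
        @"œ": @"oe",
    };
    NSString *expanded = input;
    for (NSString *key in manual) {
        // Literal, case-sensitive replacement, so the original case is kept.
        expanded = [expanded stringByReplacingOccurrencesOfString:key
                                                       withString:manual[key]];
    }
    // Compatibility mapping (NFKC) splits compatibility ligatures such as ﬁ.
    NSString *mapped = [expanded precomposedStringWithCompatibilityMapping];
    // Lossy ASCII conversion, as in solution 3.
    NSData *ascii = [mapped dataUsingEncoding:NSASCIIStringEncoding
                         allowLossyConversion:YES];
    return [[NSString alloc] initWithData:ascii encoding:NSASCIIStringEncoding];
}
```

For a case-insensitive search index, call lowercaseString on the result, for example [FoldToASCII(inputString) lowercaseString].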
Take a look at CFStringTransform() and its kCFStringTransformToLatin option. I think that may do what you're looking for.
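A minimal sketch of that suggestion, assuming ARC and a made-up function name (TransliterateToLatin). It chains kCFStringTransformToLatin with kCFStringTransformStripCombiningMarks to remove accents. I have not verified that either transform expands œ into oe, so test it against your own data. CFStringTransform is also documented to accept ICU transform IDs, which may be worth trying if the predefined constants are not enough.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper; the name is illustrative. Assumes ARC.
static NSString *TransliterateToLatin(NSString *input) {
    NSMutableString *buffer = [input mutableCopy];
    // NSMutableString is toll-free bridged to CFMutableStringRef.
    CFMutableStringRef cf = (__bridge CFMutableStringRef)buffer;
    // A NULL range applies the transform to the whole string.
    CFStringTransform(cf, NULL, kCFStringTransformToLatin, false);
    // Remove accents and other combining marks.
    CFStringTransform(cf, NULL, kCFStringTransformStripCombiningMarks, false);
    return [buffer lowercaseString];
}
```

If œ survives this helper on your target OS version, the manual replacement from solution 3bis can still be applied beforehand.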
Source: https://stackoverflow.com/questions/10080613/is-there-a-way-to-use-nsstring-stringbyfoldingwithoptions-to-unfold-the-single-f