Is there a way to use NSString stringByFoldingWithOptions to unfold the single French 'œ' character into 'oe'?

爱⌒轻易说出口 submitted on 2019-11-28 05:10:47

Question


For a diacritic-insensitive full-text search feature, I use the following code to convert accented characters such as é or Ö into their lowercase unaccented forms, e and o:

[[inputString stringByFoldingWithOptions:NSCaseInsensitiveSearch
                                        | NSDiacriticInsensitiveSearch
                                        | NSWidthInsensitiveSearch
                                  locale:[NSLocale currentLocale]] lowercaseString];

This works. However, I found no way to convert special characters whose base form consists of several letters, such as the French œ (as in "sœur") or the German ß (as in "Fluß"), into oe and ss respectively. There seems to be no flag for stringByFoldingWithOptions that does this, and I found nothing on the web.

EDIT

ß is actually handled correctly by the code above: it is converted to ss.


Answer 1:


Solutions, from worst to best:

Solution 1 works only for æ and ß and fails for everything else (œ, ĳ, ...):

NSString *result = [[[NSString alloc] initWithData:[inputString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];

Solution 2 works for most ligatures and fails only for æ, œ and ĳ. I've tried every available NSLocale, so the locale is not the issue here:

NSString *result = [inputString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch locale:[NSLocale currentLocale]];

Solution 3 works for most ligatures and fails only for œ:

NSString *result = [[[NSString alloc] initWithData:[[inputString precomposedStringWithCompatibilityMapping] dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];

This means œ always has to be handled manually, so the best solution is to combine solution 2 or 3 with a manual string replacement.

Solution 2bis:

inputString = [inputString stringByReplacingOccurrencesOfString:@"æ" withString:@"ae" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
inputString = [inputString stringByReplacingOccurrencesOfString:@"œ" withString:@"oe" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
inputString = [inputString stringByReplacingOccurrencesOfString:@"ĳ" withString:@"ij" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
NSString *result = [inputString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch locale:[NSLocale currentLocale]];

Solution 3bis:

inputString = [inputString stringByReplacingOccurrencesOfString:@"Œ" withString:@"OE"];
inputString = [inputString stringByReplacingOccurrencesOfString:@"œ" withString:@"oe"];
NSString *result = [[[NSString alloc] initWithData:[[inputString precomposedStringWithCompatibilityMapping] dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];

Since solution 2bis may miss some replacements and NSLocale behavior is unpredictable, the best solution is 3bis. It also lets you preserve case if you need to.
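In a project built with ARC, the explicit -autorelease calls above will not compile, so solution 3bis can be wrapped in a small helper instead. This is a sketch assuming ARC, not the answerer's own code, and the function name ASCIIFoldKeepingCase is hypothetical. The usage lines illustrate the case-preserving behavior: Œ becomes OE rather than oe.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: "Solution 3bis" rewritten for ARC (no -autorelease).
// It does not fold case, so capitals stay capitals.
static NSString *ASCIIFoldKeepingCase(NSString *input) {
    // œ/Œ are not decomposed by compatibility mapping, so replace them by hand.
    NSString *s = [input stringByReplacingOccurrencesOfString:@"Œ" withString:@"OE"];
    s = [s stringByReplacingOccurrencesOfString:@"œ" withString:@"oe"];
    // NFKC: splits compatibility ligatures such as "ﬁ" into "fi".
    s = [s precomposedStringWithCompatibilityMapping];
    // Lossy ASCII conversion handles whatever is still outside ASCII.
    NSData *ascii = [s dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    return [[NSString alloc] initWithData:ascii encoding:NSASCIIStringEncoding];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", ASCIIFoldKeepingCase(@"Sœur"));
        NSLog(@"%@", ASCIIFoldKeepingCase(@"Œuvre"));
    }
    return 0;
}
```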




Answer 2:


Take a look at CFStringTransform() and its kCFStringTransformToLatin option. I think that may do what you're looking for.
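A minimal sketch of that approach, assuming ARC. kCFStringTransformToLatin on its own transliterates non-Latin scripts into Latin, so it would most likely leave œ untouched because œ is already Latin. The sketch therefore follows it with ICU's "Latin-ASCII" transform ID, which CFStringTransform accepts in addition to its predefined constants and which maps letters such as œ, æ and ß to ASCII. The helper name FoldForSearch is hypothetical, and the final folding step simply reuses the question's own code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper; assumes ARC (hence the __bridge cast).
static NSString *FoldForSearch(NSString *input) {
    NSMutableString *s = [input mutableCopy];
    CFMutableStringRef ref = (__bridge CFMutableStringRef)s;
    // Step 1: the transform suggested above (non-Latin scripts -> Latin).
    CFStringTransform(ref, NULL, kCFStringTransformToLatin, false);
    // Step 2 (assumption: ICU's "Latin-ASCII" ID is available): Latin -> ASCII,
    // which unfolds ligatures such as œ -> oe and ß -> ss.
    if (!CFStringTransform(ref, NULL, CFSTR("Latin-ASCII"), false)) {
        return nil; // transform ID not recognized on this system
    }
    // Step 3: the question's folding for case, diacritics and width.
    return [[s stringByFoldingWithOptions:NSCaseInsensitiveSearch
                                        | NSDiacriticInsensitiveSearch
                                        | NSWidthInsensitiveSearch
                                  locale:[NSLocale currentLocale]] lowercaseString];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", FoldForSearch(@"Sœur Fluß"));
    }
    return 0;
}
```

Checking the Boolean result of the second call matters because an unrecognized transform ID makes CFStringTransform fail rather than silently do nothing.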



Source: https://stackoverflow.com/questions/10080613/is-there-a-way-to-use-nsstring-stringbyfoldingwithoptions-to-unfold-the-single-f
