Easy way to remove accents from a Unicode string? [duplicate]

Submitted by 本秂侑毒 on 2019-11-26 07:27:00

Question


I want to change this sentence:

Et ça sera sa moitié.

To:

Et ca sera sa moitie.

Is there an easy way to do this in Java, like I would in Objective-C?

NSString *str = @"Et ça sera sa moitié.";
NSData *data = [str dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *newStr = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];

Answer 1:


In the end, I solved it using the java.text.Normalizer class:

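import java.text.Normalizer;

public static String stripAccents(String s)
{
    // Decompose each accented letter into a base letter plus combining marks (NFD).
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    // Remove the combining marks, leaving only the base letters.
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}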
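
As a rough usage sketch, the method above can be dropped into a small class and run against the sentence from the question. The class name StripAccentsDemo is only illustrative, and the accented letters are written as Unicode escapes so that the result does not depend on the source-file encoding.

```java
import java.text.Normalizer;

class StripAccentsDemo {

    static String stripAccents(String s) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop every character from the Combining Diacritical Marks block (U+0300 to U+036F).
        return decomposed.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    public static void main(String[] args) {
        // The question's sentence, with c-cedilla (U+00E7) and e-acute (U+00E9) escaped.
        String input = "Et \u00e7a sera sa moiti\u00e9.";
        System.out.println(stripAccents(input)); // prints: Et ca sera sa moitie.
    }
}
```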



Answer 2:


Maybe the easiest and safest way is to use StringUtils from Apache Commons Lang:

StringUtils.stripAccents(String input)

Removes diacritics (~= accents) from a string. The case will not be altered. For instance, 'à' will be replaced by 'a'. Note that ligatures will be left as is.
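
To illustrate the ligature caveat without assuming Commons Lang is on the classpath, here is a small JDK-only sketch. It uses the Normalizer approach rather than StringUtils itself, and the class name is hypothetical. Letters that have no canonical decomposition, such as the oe ligature (U+0153) or o with stroke (U+00F8), pass through NFD unchanged, so there is no combining mark to strip from them.

```java
import java.text.Normalizer;

class LigatureCaveatDemo {

    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        String accented = "\u00e0"; // a with grave: decomposes to 'a' + U+0300
        String ligature = "\u0153"; // oe ligature: no canonical decomposition
        String slashed = "\u00f8";  // o with stroke: no canonical decomposition

        System.out.println(stripAccents(accented).equals("a"));      // prints: true
        System.out.println(stripAccents(ligature).equals(ligature)); // prints: true
        System.out.println(stripAccents(slashed).equals(slashed));   // prints: true
    }
}
```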





Answer 3:


The only difference from the solution above is that I use a + quantifier rather than a [] character class. I think both work, but it is worth having this variant here as well.

String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
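
A quick way to check that the two patterns are interchangeable is to run both on a letter that carries more than one mark. In this sketch (class and method names are illustrative) the Vietnamese letter U+1EC7 decomposes into e plus two combining marks. The character-class form removes them one match at a time, the + form removes the whole run in a single match, and the resulting strings are equal.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class QuantifierComparison {

    // Matches one combining mark per match.
    static final Pattern SINGLE = Pattern.compile("[\\p{InCombiningDiacriticalMarks}]");
    // Matches a run of consecutive combining marks per match.
    static final Pattern RUN = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String strip(Pattern marks, String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return marks.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // "Viet Nam" spelled with U+1EC7 (e with circumflex and dot below).
        String input = "Vi\u1ec7t Nam";
        System.out.println(strip(SINGLE, input)); // prints: Viet Nam
        System.out.println(strip(RUN, input));    // prints: Viet Nam
    }
}
```

Precompiling the pattern, as done here, also avoids recompiling the regex on every call, which String.replaceAll would otherwise do.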



Answer 4:


Assuming you are using Java 6 or newer, you might want to take a look at java.text.Normalizer, which can decompose accented characters; you can then use a regex to strip the combining marks.

Otherwise, you should be able to achieve the same result using ICU4J.
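
The decomposition step is easier to see when the code points are printed. The following sketch sticks to the JDK, uses a made-up class name, and relies on Java 8 streams for brevity. It shows e-acute (U+00E9) as a single code point before NFD and as e followed by the combining acute accent U+0301 afterwards, so the regex only has to delete the combining character.

```java
import java.text.Normalizer;

class DecompositionDemo {

    // Render a string as space-separated code points, e.g. "U+0065 U+0301".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00e9"; // e with acute accent, one code point
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(codePoints(composed));   // prints: U+00E9
        System.out.println(codePoints(decomposed)); // prints: U+0065 U+0301

        String stripped = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        System.out.println(stripped);               // prints: e
    }
}
```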




Answer 5:


For Kotlin:

import java.text.Normalizer

fun stripAccents(s: String): String {
    // Decompose accented letters (NFD), then remove the combining marks.
    val decomposed = Normalizer.normalize(s, Normalizer.Form.NFD)
    return Regex("\\p{InCombiningDiacriticalMarks}+").replace(decomposed, "")
}



Answer 6:


Thank you. Here is a variant that also strips modifier letters (Lm) and modifier symbols (Sk):

public static final Pattern DIACRITICS_AND_FRIENDS = Pattern.compile(
        "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

private static String stripDiacritics(String str) {
    str = Normalizer.normalize(str, Normalizer.Form.NFD);
    str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
    return str;
}

stripDiacritics("Et Ça sera sa moitié.");  // => "Et Ca sera sa moitie."
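
For comparison, here is a small sketch of what the two extra categories in this pattern match. Lm (modifier letters) covers characters such as U+02B0, and Sk (modifier symbols) covers free-standing accents such as the acute accent U+00B4. Note that Sk also contains the ASCII characters ^ and `, so this wider pattern removes them too, which may or may not be wanted. The class name is illustrative.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class DiacriticsAndFriendsDemo {

    static final Pattern DIACRITICS_AND_FRIENDS = Pattern.compile(
            "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    static String stripDiacritics(String str) {
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        return DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
    }

    public static void main(String[] args) {
        // The answer's sample sentence, with capital C-cedilla (U+00C7) and e-acute (U+00E9).
        System.out.println(stripDiacritics("Et \u00c7a sera sa moiti\u00e9.")); // prints: Et Ca sera sa moitie.
        // A free-standing acute accent (U+00B4, category Sk) is removed.
        System.out.println(stripDiacritics("a\u00b4b")); // prints: ab
        // Modifier letter small h (U+02B0, category Lm) is removed.
        System.out.println(stripDiacritics("t\u02b0"));  // prints: t
        // ASCII ^ and ` are also in category Sk, so they disappear as well.
        System.out.println(stripDiacritics("x^2 `y`"));  // prints: x2 y
    }
}
```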



Source: https://stackoverflow.com/questions/15190656/easy-way-to-remove-accents-from-a-unicode-string
