Normalize a string except ñ

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-12 15:09:27

Question


I have the following example code:

String n = "Péña";
n = Normalizer.normalize(n, Normalizer.Form.NFC);

How do I normalize the string n while leaving the ñ intact?

This is not just about that one string: I'm building a form and want to keep only the ñ characters, with every other diacritic removed.


Answer 1:


Replace all occurrences of "ñ" with a non-printable character "\001", so "Péña" becomes "Pé\001a". Then call Normalizer.normalize() with Normalizer.Form.NFD to decompose the "é" into "e" and a separate diacritical mark. (Normalizer.Form.NFC, as used in the question, composes characters instead, so it would leave the "é" as it is.) Finally, remove the diacritical marks and convert the non-printable character back to "ñ".

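import java.text.Normalizer;

String partiallyNormalize(String string)
{
    // Hide every ñ behind a control character that NFD leaves untouched.
    string = string.replace('ñ', '\001');
    // Decompose accented letters, e.g. é becomes e followed by U+0301.
    string = Normalizer.normalize(string, Normalizer.Form.NFD);
    // Delete the combining marks that the decomposition exposed.
    string = string.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    // Put the ñ back.
    string = string.replace('\001', 'ñ');
    return string;
}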
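
The method above handles only a lowercase, precomposed ñ. If the form may also receive an uppercase Ñ, or an ñ typed as "n" followed by the combining tilde U+0303, a variant along the following lines might help. This is only a sketch: the class name PartialNormalizer, the method name stripAccentsKeepEnye and the private-use placeholders U+E000 and U+E001 are illustrative choices, not part of the original answer, and it assumes the input never contains those two code points.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class PartialNormalizer {
    // Hypothetical placeholders from the Unicode private-use area; assumed
    // never to occur in real input. NFD leaves them unchanged.
    private static final char LOWER_PLACEHOLDER = '\uE000';
    private static final char UPPER_PLACEHOLDER = '\uE001';

    // Matches runs of code points in the Combining Diacritical Marks block.
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String stripAccentsKeepEnye(String input) {
        // Compose first, so "n" + U+0303 becomes the single code point U+00F1.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        // Protect both cases of the letter before decomposing.
        s = s.replace('\u00F1', LOWER_PLACEHOLDER)
             .replace('\u00D1', UPPER_PLACEHOLDER);
        // Decompose the remaining accented letters and drop their marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        s = MARKS.matcher(s).replaceAll("");
        // Restore the protected letters.
        return s.replace(LOWER_PLACEHOLDER, '\u00F1')
                .replace(UPPER_PLACEHOLDER, '\u00D1');
    }
}
```

Calling PartialNormalizer.stripAccentsKeepEnye("Péña") should give "Peña", whether the ñ arrives precomposed or as "n" plus a combining tilde.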

You might also want to upvote the preferred answer to "Easy way to remove UTF-8 accents from a string?", where I learned how to remove the diacritical marks.



Source: https://stackoverflow.com/questions/36098063/normalize-a-string-except-%c3%b1
