How to check if a Unicode character has diacritics in .NET?

给你一囗甜甜゛ submitted on 2019-12-04 13:12:02

Question


I am developing a heuristic for automatic language detection and would like to find out whether a given letter has diacritics (as in "Ðàäèî Êóëüòóðà" -- all letters have diacritics). Ideally, I would also like to get the type of diacritic.

I browsed through the UnicodeCategory enum but didn't find anything that could help me here.


Answer 1:


One possible way is to normalize the character to a decomposed form, in which a letter and its diacritics are written as separate code points. Then check whether the result is a letter followed only by accents (combining marks).

Adapting from How do I remove diacritics (accents) from a string in .NET?, you can normalize with Normalize(NormalizationForm.FormD) and check for the diacritics with UnicodeCategory.NonSpacingMark.

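using System.Globalization; // CharUnicodeInfo, UnicodeCategory
using System.Linq;          // Skip, All
using System.Text;          // NormalizationForm

bool IsLetterWithDiacritics(char c)
{
    // Canonical decomposition (FormD) splits e.g. "é" into "e" + U+0301.
    var s = c.ToString().Normalize(NormalizationForm.FormD);
    // True only for a base letter followed solely by non-spacing (combining) marks.
    return (s.Length > 1) &&
           char.IsLetter(s[0]) &&
           s.Skip(1).All(c2 => CharUnicodeInfo.GetUnicodeCategory(c2) == UnicodeCategory.NonSpacingMark);
}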
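The question also asks for the type of diacritic. The same decomposition can answer that, since the code points after the base letter are the diacritics themselves. The following is only a sketch under that assumption: the class DiacriticInfo and the method GetCombiningMarks are hypothetical names, not part of the original answer or of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticInfo
{
    // Hypothetical helper: returns the combining marks that follow the base
    // letter in the canonical decomposition (FormD) of c, or an empty list
    // if c is not a letter with diacritics.
    public static IReadOnlyList<char> GetCombiningMarks(char c)
    {
        // A lone surrogate is not valid text on its own, so skip it
        // instead of passing it to Normalize.
        if (char.IsSurrogate(c))
            return Array.Empty<char>();

        string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
        if (decomposed.Length < 2 || !char.IsLetter(decomposed[0]))
            return Array.Empty<char>();

        // Keep only the non-spacing marks after the base letter.
        return decomposed
            .Skip(1)
            .Where(m => CharUnicodeInfo.GetUnicodeCategory(m) == UnicodeCategory.NonSpacingMark)
            .ToList();
    }
}
```

For 'é' (U+00E9) this should return the single mark U+0301 (combining acute accent), and for 'ü' (U+00FC) the mark U+0308 (combining diaeresis). Note that this approach, like the function above, only recognizes diacritics that Unicode encodes as a canonical decomposition. A letter such as 'Ð' (U+00D0) has no decomposition, so it is reported as having no diacritics. Mapping the returned code points to readable names would need your own lookup table.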


Source: https://stackoverflow.com/questions/9349608/how-to-check-if-unicode-character-has-diacritics-in-net
