Question
I am developing a heuristic for automatic language detection and would like to find out whether a given letter has diacritics (like "Ðàäèî Êóëüòóðà" -- all letters have diacritics). Ideally, I would also like to get the type of diacritic.
I browsed through the UnicodeCategory enum but didn't find anything that could help me here.
Answer 1:
One possible way is to normalize it to a form where letters and their diacritics are written as several codepoints. Then check if you have a letter followed by accents.
Adapting from "How do I remove diacritics (accents) from a string in .NET?", you can normalize with Normalize(NormalizationForm.FormD) and check for the diacritics with UnicodeCategory.NonSpacingMark.
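using System.Globalization;
using System.Linq;
using System.Text;

bool IsLetterWithDiacritics(char c)
{
    // A lone surrogate is not valid text on its own, and Normalize would throw on it.
    if (char.IsSurrogate(c)) return false;
    // Form D splits a precomposed letter into a base letter followed by combining marks.
    var s = c.ToString().Normalize(NormalizationForm.FormD);
    return (s.Length > 1) &&
        char.IsLetter(s[0]) &&
        s.Skip(1).All(c2 => CharUnicodeInfo.GetUnicodeCategory(c2) == UnicodeCategory.NonSpacingMark);
}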
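
The question also asks for the type of diacritic. The same decomposition can be used for that, since the characters after the base letter are the combining marks themselves. The following is only a sketch: the class DiacriticInspector and the methods GetCombiningMarks and Describe are hypothetical names, not part of .NET, and the name table is a small hand-picked subset of the combining marks, included for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of .NET: lists the combining marks of a character.
public static class DiacriticInspector
{
    // Hand-picked, incomplete table of common combining marks (illustrative only).
    private static readonly Dictionary<char, string> KnownMarks = new Dictionary<char, string>
    {
        { '\u0300', "grave" },
        { '\u0301', "acute" },
        { '\u0302', "circumflex" },
        { '\u0303', "tilde" },
        { '\u0308', "diaeresis" },
        { '\u030A', "ring above" },
        { '\u030C', "caron" },
        { '\u0327', "cedilla" },
    };

    // Returns the non-spacing marks that follow the base character after
    // canonical decomposition (Form D); empty if there are none.
    public static List<char> GetCombiningMarks(char c)
    {
        // A lone surrogate cannot be normalized on its own.
        if (char.IsSurrogate(c)) return new List<char>();

        string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
        return decomposed
            .Skip(1)
            .Where(m => CharUnicodeInfo.GetUnicodeCategory(m) == UnicodeCategory.NonSpacingMark)
            .ToList();
    }

    // Maps a mark to a readable name, falling back to its code point.
    public static string Describe(char mark)
    {
        return KnownMarks.TryGetValue(mark, out var name)
            ? name
            : "U+" + ((int)mark).ToString("X4");
    }
}
```

For example, GetCombiningMarks('\u00E9') (é) yields the single mark U+0301, which Describe reports as "acute". Note that characters such as Ð (U+00D0) or ø (U+00F8) have no canonical decomposition, so this approach, like IsLetterWithDiacritics above, reports no diacritics for them.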
Source: https://stackoverflow.com/questions/9349608/how-to-check-if-unicode-character-has-diacritics-in-net