How to know if a string contains accents

温柔的废话 2021-01-18 00:09

How to know if a string contains accents?

3 answers

长发绾君心 (OP)

2021-01-18 00:25

The right way to do this is to call Normalizer.normalize(str, Normalizer.Form.NFD) from java.text.Normalizer, and then delete the characters of general category Mark, \p{M}, or Non-Spacing Mark, \p{Mn}. If the result differs from the input, the string contained accents. Java does not support the standard Unicode property \p{Diacritic}, or you could use that instead. Note that not all Diacritics are Non-Spacing Marks, nor vice versa.
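As a minimal sketch of the normalize-then-match approach described above: the class name AccentCheck and the helper names containsAccents and stripAccents are made up for illustration, and \p{M} is used rather than \p{Mn} as one possible choice.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class AccentCheck {
    // \p{M} matches any character in the Unicode general category Mark.
    private static final Pattern MARKS = Pattern.compile("\\p{M}");

    // True if the NFD form of s contains at least one mark character.
    static boolean containsAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).find();
    }

    // Returns the NFD form of s with all mark characters removed.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source encoding.
        System.out.println(containsAccents("caf\u00e9"));       // true
        System.out.println(containsAccents("cafe"));            // false
        System.out.println(stripAccents("r\u00e9sum\u00e9"));   // resume
    }
}
```

Note that this only detects marks that appear after canonical decomposition. A letter such as ø (U+00F8), which has no canonical decomposition, is not reported, and combining marks in non-Latin scripts are reported even though they may not be what you would call accents.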

However, this is probably the wrong thing to do. If you are trying to do accent-insensitive string searches and comparisons, the right way to do that is to leave the strings as they are. You need to create a UCA collation object with the strength set to level 1 (that is, PRIMARY), then use that to compare your strings. At primary strength the comparison disregards things like accent marks, so strings that differ only in their accents compare equal.

Here are examples in Java of how to do that using ICU’s Collator class. If you’re using proper UCA collators, then you don’t have to normalize; they take care of this for you.
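The answer points to ICU's Collator; as a rough sketch that runs without extra dependencies, the example below uses the JDK's own java.text.Collator as a stand-in. That is an assumption on my part: the JDK collator is not a full UCA implementation, and ICU's com.ibm.icu.text.Collator, which has a similar getInstance / setStrength / compare shape, is the one the answer recommends. The class name CollatorDemo and the helper equalsIgnoringAccents are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; uses the JDK collator as a stand-in for ICU's.
class CollatorDemo {
    // Compares two strings at primary strength, so that accent (and case)
    // differences are ignored and only base letters count.
    static boolean equalsIgnoringAccents(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);
        // Ask the JDK collator to handle canonically equivalent forms,
        // so precomposed and decomposed input compare alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source encoding.
        System.out.println(equalsIgnoringAccents("resume", "r\u00e9sum\u00e9"));
        System.out.println(equalsIgnoringAccents("resume", "resumf"));
    }
}
```

The strings themselves are never modified here; only the comparison is made accent-insensitive. Raising the strength to Collator.SECONDARY would make the same collator distinguish accents again.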

This answer in Perl uses two UCA collator objects, one at the primary strength to completely ignore accents for string searches and comparisons, and another that allows diacritics to be distinguished at the secondary strength as is normal for Unicode.
