Java Collator with similar characteristics to MySQL's utf8_general_ci collation

Submitted by 断了今生、忘了曾经 on 2019-12-12 18:14:04

Question


Is there a Collator implementation with the same characteristics as MySQL's utf8_general_ci? I need a collator that is case-insensitive and does not distinguish German umlauts such as ä from the base vowel a.

Background: We recently encountered a bug caused by the wrong collation on one of our tables. The column used utf8_general_ci where utf8_bin would have been correct, and it had a unique index. Because utf8_general_ci does not distinguish between words like pöker and poker, the rows were merged, which was not desired. I now need to implement a module for our Java application that repairs the affected rows.


Answer 1:


You could use the following collator:

Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);

A collator with this strength treats only primary differences (different base letters) as significant during comparison; accent and case differences are ignored.
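Applied to the words from the question, this might look like the minimal sketch below. The class name PrimaryStrengthDemo, the helper equalIgnoringCaseAndAccents, and the choice of Locale.ROOT (instead of the default locale used in the answer) are assumptions made for illustration. The behaviour is close to utf8_general_ci for umlauts and case, but it is not guaranteed to match MySQL for every character.

```java
import java.text.Collator;
import java.util.Locale;

public class PrimaryStrengthDemo {

    // Hypothetical helper: true when the two strings differ at most by case or accents.
    static boolean equalIgnoringCaseAndAccents(String a, String b) {
        // Assumption: Locale.ROOT, so the result does not depend on the JVM's default locale.
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // \u00f6 is "ö" and \u00d6 is "Ö"; escapes avoid source-encoding problems.
        System.out.println(equalIgnoringCaseAndAccents("p\u00f6ker", "poker"));
        System.out.println(equalIgnoringCaseAndAccents("P\u00d6KER", "poker"));
        System.out.println(equalIgnoringCaseAndAccents("poker", "joker"));
    }
}
```

Pinning the locale explicitly is a design choice: some locales sort ä or ö as separate letters, so a default-locale collator may not treat them as equal to a or o.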

Consider an example:

import java.text.Collator;

public class CollatorStrengthDemo {
    public static void main(String[] args) {
        System.out.println(compare("abc", "ÀBC", Collator.PRIMARY));   // base char
        System.out.println(compare("abc", "ÀBC", Collator.SECONDARY)); // base char + accent
        System.out.println(compare("abc", "ÀBC", Collator.TERTIARY));  // base char + accent + case
        System.out.println(compare("abc", "ÀBC", Collator.IDENTICAL)); // base char + accent + case + bits
    }

    // Compares two strings with the default-locale collator at the given strength.
    private static int compare(String first, String second, int strength) {
        Collator collator = Collator.getInstance();
        collator.setStrength(strength);
        return collator.compare(first, second);
    }
}

The output is:

0
-1
-1
-1
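For the repair module mentioned in the question, one possible use of such a collator is to group values that collide under primary strength, so that each group can be reviewed. The sketch below is only an illustration: the class CollisionFinder, the method findCollisions, the in-memory list of values, and the use of Locale.ROOT are all assumptions, not part of the original answer.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CollisionFinder {

    // Hypothetical helper: returns every group of two or more values that a
    // PRIMARY-strength collator considers equal (for example "poker" and "pöker").
    static List<List<String>> findCollisions(List<String> values) {
        Collator collator = Collator.getInstance(Locale.ROOT); // assumption: locale-neutral rules
        collator.setStrength(Collator.PRIMARY);

        // Collator implements Comparator, so the TreeMap treats strings that
        // compare as equal under the collator as the same key.
        Map<String, List<String>> groups = new TreeMap<>(collator);
        for (String value : values) {
            groups.computeIfAbsent(value, k -> new ArrayList<>()).add(value);
        }

        List<List<String>> collisions = new ArrayList<>();
        for (List<String> group : groups.values()) {
            if (group.size() > 1) {
                collisions.add(group);
            }
        }
        return collisions;
    }
}
```

Each returned group lists the colliding values in the order they were supplied; values with no collision are left out.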

Have a look at these links for more information:

http://www.javapractices.com/topic/TopicAction.do?Id=207
https://docs.oracle.com/javase/7/docs/api/java/text/Collator.html#PRIMARY



Source: https://stackoverflow.com/questions/36151582/java-collator-with-similar-characteristic-as-mysqls-utf8-general-ci-collation
