Collation STRENGTH and local language relation

Submitted by 时光怂恿深爱的人放手 on 2019-12-23 07:28:25

Question


I have read the following from Collator's Javadoc.

"The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ê" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical."

Does this mean that I should set the STRENGTH based on the language I am using? If so, can someone suggest the defaults for these locales: us_en, us_es, ca_fr, spain_spanish, chile_spanish, portuguese?


Answer 1:


It really depends on what you're trying to do. The following is true for most (all?) languages that use the Latin alphabet:

  • Primary
    • Different: a, á, Á, b
    • Same: á, â
    • Same: a, A
  • Secondary
    • Different: a, á, Á, b
    • Different: á, â
    • Same: a, A
  • Tertiary
    • Different: a, á, Á, b
    • Different: á, â
    • Different: a, A
  • Identical
    • Also considers differences you can't see, for example between a precomposed accented letter (Á stored as a single code point) and the base letter followed by a combining accent (A + accent)
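
The levels above can be checked directly rather than taken on trust. The following is a minimal sketch, assuming `java.text.Collator` with the `Locale.US` rules; the class name `StrengthDemo` and the helper `equalAt` are made up for illustration. It prints whether each pair compares as equal at each strength, so you can see what your own JDK and locale actually do. Results for the accented pairs may depend on the locale and the decomposition mode.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {
    // Hypothetical helper: true when the two strings compare as equal
    // under the given locale's collation rules at the given strength.
    public static boolean equalAt(Locale locale, int strength, String left, String right) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        int[] strengths = {
            Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
        };
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY", "IDENTICAL"};
        // Pairs taken from the lists above; the last pair is a precomposed
        // accented A versus A followed by a combining acute accent.
        String[][] pairs = {
            {"a", "b"},
            {"a", "A"},
            {"a", "\u00E1"},
            {"\u00E1", "\u00E2"},
            {"\u00C1", "A\u0301"}
        };
        for (int i = 0; i < strengths.length; i++) {
            for (String[] pair : pairs) {
                boolean same = equalAt(Locale.US, strengths[i], pair[0], pair[1]);
                System.out.printf("%-9s %s vs %s: %s%n",
                        names[i], pair[0], pair[1], same ? "same" : "different");
            }
        }
    }
}
```

Swapping `Locale.US` for another locale is the quickest way to see how much the assignment of strengths varies for the languages you care about.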

There will be slight variations between languages, but in essence:

  • If you want case-sensitive comparison, use Tertiary.
  • For case-insensitive comparison, use either Primary or Secondary depending on whether you want á to be grouped with â.
  • Some of the collation rules are quite strange. a is different from á even in Primary, and á is different from Á even in Primary/Secondary. I don't know why; bug, maybe?
  • Who knows what happens in non-Latin languages.
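
As a rough illustration of the advice above, here is a sketch that picks the strength from what the comparison should do (case-sensitive or not) rather than from the language. The helper `forLocale` and the class name are hypothetical, and the language tags are only my guess at what the question's informal locale names mean. The loop prints the strength each locale's collator starts with, which is a way to look up the "defaults" the question asks about on your own JDK.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleCollators {
    // Hypothetical helper: TERTIARY keeps case differences,
    // SECONDARY ignores case but still distinguishes accents.
    public static Collator forLocale(Locale locale, boolean caseSensitive) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(caseSensitive ? Collator.TERTIARY : Collator.SECONDARY);
        return collator;
    }

    public static void main(String[] args) {
        // Assumed mapping of the question's locale names to BCP 47 tags.
        String[] tags = {"en-US", "es-US", "fr-CA", "es-ES", "es-CL", "pt"};
        for (String tag : tags) {
            Locale locale = Locale.forLanguageTag(tag);
            // getStrength() reports the strength a freshly created Collator starts with.
            int strength = Collator.getInstance(locale).getStrength();
            System.out.println(tag + " starting strength: " + strength
                    + " (Collator.TERTIARY is " + Collator.TERTIARY + ")");
        }

        // A Collator is a Comparator, so it can be passed straight to sort().
        List<String> words = new ArrayList<>(Arrays.asList("banana", "Apple", "apple", "cherry"));
        words.sort(forLocale(Locale.US, false));
        System.out.println(words);
    }
}
```

Use Primary instead of Secondary in the helper if accented and unaccented letters should also be grouped together.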


Source: https://stackoverflow.com/questions/3739989/collation-strength-and-local-language-relation
