Which of the utf8 collations is the best?

Posted by 十年热恋 on 2019-12-31 17:52:28

Question


I want a UTF-8 collation that supports:

  • English
  • Persian
  • Arabic
  • French
  • Japanese
  • Chinese

Does UTF8_GENERAL_CI support all of these languages?


Answer 1:


Yes. UTF-8 is an encoding of the Unicode character set, which covers practically every language in the world.
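To illustrate that one encoding covers all six languages from the question, here is a small Python sketch that encodes a sample word from each and reports how many UTF-8 bytes each character takes. The sample words are my own illustrative choices, not part of the original answer.

```python
# Sample words chosen for illustration only; any text in these scripts would do.
samples = {
    "English": "hello",
    "French": "été",
    "Persian": "سلام",
    "Arabic": "مرحبا",
    "Japanese": "こんにちは",
    "Chinese": "你好",
}

def utf8_widths(text: str) -> list[int]:
    """Return the number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]

for language, word in samples.items():
    # Every sample round-trips through UTF-8 without loss.
    assert word.encode("utf-8").decode("utf-8") == word
    print(language, word, utf8_widths(word))
```

Latin letters take 1 byte, accented Latin, Arabic and Persian letters take 2, and the Japanese and Chinese characters here take 3.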

I think the only difference lies in how your results are sorted: different languages may put letters in a different order (accents, umlauts, etc.). Also, comparing a to ä might behave differently under another collation.
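A rough illustration of why the collation matters for sorting: ordering by raw code points (Python's default string comparison) puts accented letters after z, while a key that strips accents sorts them next to their base letter. The strip_accents helper is a hypothetical, simplified stand-in chosen for illustration; it is not MySQL's collation algorithm, and real collations are language-aware and more elaborate.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks, e.g. 'ä' -> 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

words = ["zebra", "äpfel", "apple", "Éclair", "eclipse"]

# Default ordering compares code points: 'É' (U+00C9) and 'ä' (U+00E4) sort after 'z'.
print(sorted(words))  # ['apple', 'eclipse', 'zebra', 'Éclair', 'äpfel']

# Accent- and case-folded key: accented words sort next to their base letter.
print(sorted(words, key=lambda w: strip_accents(w).casefold()))
# ['äpfel', 'apple', 'Éclair', 'eclipse', 'zebra']
```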

The _ci suffix means that sorting and comparison are case-insensitive.
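A minimal sketch of what case-insensitive comparison means, using Python's str.casefold() as an approximation. The ci_equal helper is hypothetical and only illustrates the idea; it is not how MySQL implements _ci collations (for example, it does not model accent handling).

```python
def ci_equal(a: str, b: str) -> bool:
    """Case-insensitive equality, roughly approximating a _ci collation."""
    return a.casefold() == b.casefold()

print(ci_equal("Paris", "PARIS"))  # True
print("Paris" == "PARIS")          # False: plain comparison is case-sensitive
```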

http://www.collation-charts.org/ might be of interest to you.




Answer 2:


While UTF8_GENERAL_CI was a good choice some time ago, it has some drawbacks now.

MySQL's utf8 actually stores at most 3 bytes per character, not the 4 that full UTF-8 can require, and 4 bytes are needed for symbols such as emoji and some newer Asian characters.
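The 3-byte limit can be checked character by character: UTF-8 uses 4 bytes exactly for code points above U+FFFF, so a helper like the one below flags text that a 3-byte-per-character encoding such as MySQL's old utf8 could not store. The needs_utf8mb4 name and the sample strings are hypothetical, chosen for illustration.

```python
def needs_utf8mb4(text: str) -> list[str]:
    """Return the characters in text that take 4 bytes in UTF-8.

    These are code points above U+FFFF, which a 3-byte-per-character
    encoding such as MySQL's old utf8 cannot represent.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]

print(needs_utf8mb4("café 你好"))     # []
print(needs_utf8mb4("thumbs up 👍"))  # ['👍']
print(needs_utf8mb4("𠮷野家"))        # ['𠮷']
```

The last sample uses U+20BB7, a CJK character outside the Basic Multilingual Plane, to show that this affects some Asian text as well as emoji.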

That is why MySQL has a newer character set, utf8mb4, which actually complies with the UTF-8 definition.

To fully support Asian languages, you need to choose utf8mb4.

If you care about correct sorting across multiple languages, use a Unicode-based collation such as utf8mb4_unicode_ci instead of a general one.

You can find a more detailed answer in "What's the difference between utf8_general_ci and utf8_unicode_ci".



Source: https://stackoverflow.com/questions/2703578/which-of-utf8-collations-is-the-best
