Choosing table collation for universal characters

烈酒焚心 submitted on 2019-12-24 08:47:43

Question


I'm working on a backend that needs to store universal characters.

I've chosen utf8mb4 as the table encoding for that purpose. I also have to choose a table collation.

The most straightforward option is the utf8mb4_general_ci table collation. Besides the general one, there are about 20 other collations to choose from. What is the purpose of the more specific ones? Does utf8mb4_general_ci, or maybe utf8mb4_unicode_520_ci, cover all of them? Which one should I use if I want to store characters ranging from Chinese all the way to Arabic?


Answer 1:


  • ...general_ci is simple. It does not equate 2-character combinations (such as a base letter followed by a non-spacing mark) with the single-character equivalent.

  • ...unicode_520_ci comes from Unicode version 5.2.0, the latest version available when MySQL picked up on it. It handles things like having an ordering for Emoji, which previous versions did not have.

  • With MySQL 8.0, the preferred collation is utf8mb4_0900_ai_ci, based on Unicode 9.0.

  • ...<language>_ci handles variations found in the given language. For example, should ch and ll in Spanish be treated as separate "letters" that sort between cz and d, and between lz and m, respectively? A language-specific collation answers that kind of question.

  • For general use, do not use ...general_ci; use the collation derived from the latest Unicode version available to you. For language-specific situations, pick one of the other collations.

  • I do not know how (or even whether) Chinese and Arabic are sorted differently in the different collations. However, I see ...persian_ci, so I suspect there is an issue.

  • Do use utf8mb4, not utf8, especially since you need Chinese.
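
Two of the points above can be illustrated outside the database. The following is only a rough Python sketch, not how MySQL implements collations: it uses Unicode normalization as an analogy for "equating a 2-character combination with the single-character equivalent", and it counts UTF-8 bytes to show why MySQL's 3-byte utf8 cannot hold every Chinese character. The helper names same_after_normalization and utf8_length are made up for this example.

```python
import unicodedata

# "é" written two ways: one precomposed code point,
# or "e" followed by a non-spacing (combining) acute accent.
composed = "\u00e9"
decomposed = "e\u0301"


def same_after_normalization(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def utf8_length(text: str) -> int:
    """Hypothetical helper: number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


# A plain code-point comparison sees two different strings.
print(composed == decomposed)
# A Unicode-aware comparison treats them as the same letter.
print(same_after_normalization(composed, decomposed))
# U+4E2D is inside the Basic Multilingual Plane.
print(utf8_length("\u4e2d"))
# U+20000 is a CJK Extension B ideograph outside the Basic Multilingual Plane.
print(utf8_length("\U00020000"))
```

The last value is the reason for the utf8mb4 advice: MySQL's utf8 stores at most 3 bytes per character, so a 4-byte ideograph or Emoji does not fit in it.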



Source: https://stackoverflow.com/questions/50249261/choosing-table-collation-for-universal-characters
