UTF8 string comparisons in MySQL

大兔子大兔子 提交于 2019-12-19 19:13:48

问题


We have issues with utf8-string comparisons in MySQL 5, regarding case and accents :

from what I gathered, what MySQL implements collations by considering that "groups of characters should be considered equal".

For example, in the utf8_unicode_ci collation, all the letters "EÉÈÊeéèê" are in the same box (together with other variants of "e").

So if you have a table containing ["video", "vidéo", "vidÉo", "vidÊo", "vidêo", "vidÈo", "vidèo", "vidEo"] (in a varchar column declared with ut8_general_ci collation) :

  • when asking MySQL to sort the rows according to this column, the sorting is random (MySQL does not enforce a sorting rule between "é" and "É" for example),
  • when asking MySQL to add a Unique Key on this column, it raises an error because it considers all the values are equal.

What setting can we fiddle with to fix these two points ?

PS : on a related note, I do not see any case sensitive collation for the utf8 charset. Did I miss something ?


[edit] I think my initial question still holds some interest, and I will leave it as is (and maybe one day get a positive answer).

It turned out, however, that our problems with string comparisons regarding accents was not linked to the collation of our text columns. It was linked to a configuration problem with the character_set_client parameter when talking with MySQL - which defaulted to latin1.

Here is the article that explained it all to us, and allowed us to fix the problem :

Getting out of MySQL character set hell

It is lengthy, but trust me, you need this length to explain both the problem and the fix.


回答1:


Use collation that considers these characters to be distinct. Maybe utf8_bin (it's case sensitive, since it does binary comparison of characters)

http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html



来源:https://stackoverflow.com/questions/7023541/utf8-string-comparisons-in-mysql

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!