MySQL utf8mb4_unicode_ci causes unique key collision

Posted by ☆樱花仙子☆ on 2019-12-23 14:59:08

Question


I have a table like this

CREATE TABLE `mb1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And I insert two rows:

insert into mb1(name) values('K'),('K');

Note that the second K is a fullwidth Unicode character (U+FF2B), not the ASCII letter K.

+------+-----------+
| name | hex(name) |
+------+-----------+
| K    | 4B        |
| K   | EFBCAB    |
+------+-----------+

Why do they cause a unique key collision? Aren't they different characters in utf8mb4?

After removing COLLATE utf8mb4_unicode_ci, the problem disappears.


Answer 1:


Why do they cause a unique key collision? Aren't they different characters in utf8mb4?

You are missing the distinction between CHARACTER SET and COLLATION. A CHARACTER SET is a collection of different characters. A COLLATION decides which of those characters are treated as equal. Think of A and a: they are different characters, but many collations treat them as the same for ORDER BY, WHERE =, and so on.

mysql> SELECT 'K'='K' COLLATE utf8_unicode_ci;
+-----------------------------------+
| 'K'='K' COLLATE utf8_unicode_ci  |
+-----------------------------------+
|                                 1 |
+-----------------------------------+

So in utf8_unicode_ci (or utf8mb4_unicode_ci), those two characters are considered to be "equal".

"Equal" is the test for UNIQUE keys.
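To see that the result depends on the collation rather than on the stored bytes, the same comparison can be run under two collations side by side. This is a minimal sketch, assuming the connection character set is utf8mb4 (hence the SET NAMES); the column aliases eq_unicode_ci and eq_bin are made up for illustration.

```sql
-- Assumption: the client can send utf8mb4; SET NAMES makes the string
-- literals below utf8mb4 so that the utf8mb4_* collations are valid for them.
SET NAMES utf8mb4;

SELECT
  -- ASCII K vs fullwidth K under the collation used by the column in the
  -- question; per the answer above this is expected to be 1.
  'K' = 'K' COLLATE utf8mb4_unicode_ci AS eq_unicode_ci,
  -- Same pair under a binary collation: the encodings 4B and EFBCAB differ,
  -- so this is 0.
  'K' = 'K' COLLATE utf8mb4_bin AS eq_bin;
```

A UNIQUE index uses the collation of the indexed column, so whichever comparison applies to that column decides whether the second row is rejected.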

Set the COLLATION for the column to whatever makes sense for you.

  • utf8mb4_unicode_ci for good 'real life' comparisons, apparently including this one. K=k=Ķ=ķ
  • utf8mb4_general_ci for more simple-minded comparisons. In particular, no 2-character combinations match 1-character encodings. Case folding and accent stripping do occur. K=k=Ķ=ķ
  • utf8mb4_bin blindly checks the bits. No case folding, etc. K k Ķ ķ are all unequal.

utf8mb4_latvian_ci is a little different: K=k but not equal to Ķ=ķ . There are other specialized collations for other languages (mostly Western European).
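If the goal is for the ASCII and fullwidth letters to coexist in the unique key, one option is to switch the column to a binary collation. The following is a sketch under that assumption, reusing the table and column from the question; it is one possible fix, not the only one.

```sql
-- Change only the collation of `name`; MySQL rebuilds the u_name index
-- using the new comparison rules.
ALTER TABLE mb1
  MODIFY `name` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;

-- With utf8mb4_bin the two values differ (4B vs EFBCAB), so both rows
-- should now be accepted by the UNIQUE key.
INSERT INTO mb1 (name) VALUES ('K'), ('K');
```

The trade-off is that utf8mb4_bin also stops folding case and accents, so K and k become distinct keys and lookups such as WHERE name = 'k' no longer match a stored K.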

Your K is called "FULLWIDTH LATIN CAPITAL LETTER K", so it is quite reasonable that it compares equal to Latin K.



Source: https://stackoverflow.com/questions/31867895/mysql-utf8mb4-unicode-ci-cause-unique-key-collision
