MySQL WHERE `character` = 'a' is matching a, A, Ã, etc. Why?

Submitted by 天涯浪子 on 2019-12-04 17:54:45

As documented under Unicode Character Sets:

MySQL implements the xxx_unicode_ci collations according to the Unicode Collation Algorithm (UCA) described at http://www.unicode.org/reports/tr10/. The collation uses the version-4.0.0 UCA weight keys: http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt.

The full collation chart makes clear that, in this collation, most variations of a base letter are equivalent irrespective of their lettercase or accent/decoration.
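
The equivalence can be checked directly by comparing literals under an explicit collation. This is a sketch under a few assumptions: the server still provides the utf8 (utf8mb3) character set with the utf8_unicode_ci and utf8_bin collations, and the client and connection character sets are configured so that non-ASCII literals arrive intact. CONVERT ... USING puts the literals into utf8 whatever the connection character set is.

```sql
-- Compare 'a' with two variants under utf8_unicode_ci.
SELECT
  CONVERT('a' USING utf8) = CONVERT('A' USING utf8) COLLATE utf8_unicode_ci AS a_vs_upper,
  CONVERT('a' USING utf8) = CONVERT('Ã' USING utf8) COLLATE utf8_unicode_ci AS a_vs_tilde;

-- The same comparisons under a binary collation.
SELECT
  CONVERT('a' USING utf8) = CONVERT('A' USING utf8) COLLATE utf8_bin AS a_vs_upper,
  CONVERT('a' USING utf8) = CONVERT('Ã' USING utf8) COLLATE utf8_bin AS a_vs_tilde;
```

Given the behaviour described above, the first query should report both pairs as equal and the second should report them as different.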

To match only the exact letter, use a binary collation such as utf8_bin.
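
A binary collation can be applied to a single comparison or made the column's own collation. The following is only a sketch: the table and column names are the ones used in this thread, VARCHAR(10) is a hypothetical placeholder for the real column type, and utf8_bin assumes the column uses the utf8 character set (utf8mb4_bin would be the counterpart for utf8mb4).

```sql
-- Per query: compare this column with a binary collation for this predicate only.
SELECT id FROM unicode WHERE `character` COLLATE utf8_bin = 'a';

-- Permanent: change the collation of the column itself.
-- VARCHAR(10) is a placeholder; repeat the column's real type and attributes.
ALTER TABLE unicode
  MODIFY `character` VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin;
```

The per-query form leaves existing data and other queries untouched, while the ALTER TABLE form changes how every comparison and sort on that column behaves.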

The collation of the table is part of the issue; MySQL with a _ci collation is treating all of those 'a's as variants of the same character.
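
To confirm which collation is actually in effect, the column and table definitions can be inspected. The table name unicode is the one used in the query in this thread.

```sql
-- The Collation column of the output shows the collation of each column.
SHOW FULL COLUMNS FROM unicode;

-- The Collation column here shows the table default, which new columns inherit.
SHOW TABLE STATUS LIKE 'unicode';
```

A column's own collation takes precedence over the table default, so the first statement is the one that matters for an existing column.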

Switching to a _cs collation will force the engine to distinguish 'a' from 'A', and 'á' from 'Á', but it may still treat 'a' and 'á' as the same character.
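
Not every character set ships with a _cs collation, so it may be worth checking what the server offers before switching. A sketch using SHOW COLLATION (utf8mb4 is only an example character set):

```sql
-- List collations whose names end in _cs.
-- The underscore is escaped because _ is a wildcard in LIKE patterns.
SHOW COLLATION WHERE Collation LIKE '%\_cs';

-- List every collation available for one character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';
```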

If you need exact comparison semantics, completely disregarding the equivalence of similar characters, you can use the BINARY operator:

SELECT id FROM unicode WHERE BINARY `character` = 'a';

The ci in the collation means case-insensitive. Switch to a case-sensitive collation (cs) to get the results you're looking for.
