INSTR(str,substr) does not work when str contains 'é' or 'ë' and substr only 'e'

让人想犯罪 __ 提交于 2019-12-10 16:55:13

问题


In another post on stackoverflow, I read that INSTR could be used to order results by relevance.

My understanding of col LIKE '%str%' andINSTR(col, 'str')` is that they both behave the same. There seems to be a difference in how collations are handled.

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO users (name)
VALUES ('Joël'), ('René');

SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned

Although INSTR does some conversion, it finds ë in é.

SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0

Am I missing something?

http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)


回答1:


This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.

Though the INSTR() documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case and accent insensitive

This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.

The bug report states that although MySQL does this correctly it only does so when the number of bytes is also identical:

However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.

To pervert the reports example, if you create the following table:

create table t ( needle varchar(10), haystack varchar(10)
                  ) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");

then run this query, you can see the same behaviour demonstrated:

select needle
     , haystack
     , needle=haystack as `=`
     , haystack LIKE CONCAT('%',needle,'%') as `like`
     , instr(needle, haystack) as `instr`
  from t;

SQL Fiddle



来源:https://stackoverflow.com/questions/20923186/instrstr-substr-does-not-work-when-str-contains-%c3%a9-or-%c3%ab-and-substr-only-e

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!