I have a MySQL database with words containing accents in Spanish (áéíóú). I'd like to know if there's any way to do a diacritic-insensitive search. For instance, if I sear
Store a second version of the string that has been stripped of diacritics?
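A minimal Python sketch of that idea, assuming the stripping happens in application code before the row is inserted. Unicode NFD normalization splits 'á' into 'a' plus a combining accent, and dropping the combining marks leaves the base letter. The names strip_diacritics and search, and the in-memory rows list, are hypothetical stand-ins for a real table with an extra, indexed search column. Note that this also folds 'ñ' to 'n'.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return a case-folded copy of text with combining accent marks removed.

    NFD splits a character such as 'á' into 'a' + U+0301 (combining acute);
    dropping the combining marks leaves only the base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()


# Hypothetical rows: the original word plus the stripped copy you would
# store in a second column (e.g. a hypothetical "word_search" column).
words = ["lápiz", "niño", "canción", "mesa"]
rows = [(w, strip_diacritics(w)) for w in words]


def search(term: str) -> list[str]:
    """Match against the stripped copy, but return the original spelling."""
    needle = strip_diacritics(term)
    return [original for original, stripped in rows if needle in stripped]


if __name__ == "__main__":
    print(search("lapiz"))  # ['lápiz']
    print(search("NINO"))   # ['niño']
    print(search("cion"))   # ['canción']
```

The query term goes through the same function as the stored copy, so the comparison is accent- and case-insensitive without touching the charset or collation of the original column.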
You can force the column to be converted to UTF-8. I haven't tried it for Spanish, only for Romanian characters with diacritics, but I assume it works the same way.
The query I use is:
SELECT CONVERT('gîgă' USING utf8) LIKE '%giga%'
Or, in the more likely case of searching a column in a table, you can use:
SELECT column_name FROM table_name WHERE CONVERT(column_name USING utf8) LIKE '%giga%'
If you set the table's charset to UTF-8 and the collation to utf8_*_ci (_ci means "case insensitive"), MySQL will perform case- and accent-insensitive searches by default.
Read more about charsets and collations here:
http://dev.mysql.com/doc/refman/5.1/en/charset-charsets.html
I tested it:
"lapiz" matches: "lápiz," "lapíz," and "lapiz"
"nino" matches: "niño," "ninó," and "nino"
You can set up the collation of your table upon creation:
CREATE TABLE table_name ( ... )
CHARACTER SET utf8 COLLATE utf8_general_ci;
Or you can ALTER it if it already exists. For more info, read the manual (link above).
If you are using phpMyAdmin, you can select the collation when you create your table.
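MySQL collations compare strings with their own weight tables, not with Unicode normalization, so the following is only a rough Python approximation (an assumption, not MySQL's implementation) of what a LIKE comparison under an accent- and case-insensitive collation does. It reproduces the lapiz/nino matches listed above and the '%giga%' example from the CONVERT answer; fold and like are hypothetical helper names.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for a case- and accent-insensitive collation key."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def like(value: str, pattern: str) -> bool:
    """Approximate `value LIKE pattern` under a *_ci collation.

    '%' matches any run of characters and '_' matches exactly one;
    escape characters are not handled in this sketch.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in fold(pattern)
    )
    return re.fullmatch(regex, fold(value), flags=re.DOTALL) is not None


if __name__ == "__main__":
    for word in ["lápiz", "lapíz", "lapiz"]:
        print(word, like(word, "lapiz"))  # True for all three
    for word in ["niño", "ninó", "nino"]:
        print(word, like(word, "nino"))   # True for all three
    print(like("gîgă", "%giga%"))         # True
```

Both sides are folded before matching, which is the key property: the pattern's accents matter no more than the stored value's.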
Character sets & collations, not my favorites, but they DO work:
mysql> SET NAMES latin1;
mysql> SELECT 'lápiz' LIKE 'lapiz';
+-----------------------+
| 'lápiz' LIKE 'lapiz' |
+-----------------------+
| 0 |
+-----------------------+
1 row in set (0.01 sec)
mysql> SET NAMES utf8;
mysql> SELECT 'lápiz' LIKE 'lapiz';
+-----------------------+
| 'lápiz' LIKE 'lapiz' |
+-----------------------+
| 1 |
+-----------------------+
mysql> SET NAMES latin1;
mysql> SELECT _utf8'lápiz' LIKE _utf8'lapiz' ;
+---------------------------------+
| _utf8'lápiz' LIKE _utf8'lapiz' |
+---------------------------------+
| 1 |
+---------------------------------+
A nice chapter to read in the manual: Character Set Support
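A small Python illustration of why the first query returns 0, assuming the terminal running the mysql client sends UTF-8: 'á' travels as the two bytes C3 A1. Under SET NAMES latin1 the server reads those as two latin1 characters, so the literal becomes six characters and cannot match 'lapiz'. The _utf8 introducer in the last query tells the server to read the literal's bytes as UTF-8 regardless of SET NAMES, which is why it matches again.

```python
# Assumption: the terminal running the mysql client uses UTF-8, so the
# literal 'lápiz' is sent to the server as UTF-8 bytes.
sent_bytes = "l\u00e1piz".encode("utf-8")

# SET NAMES latin1: each byte is read as one latin1 character.
# (Python's latin-1 codec and MySQL's latin1 agree for bytes C3 and A1.)
as_latin1 = sent_bytes.decode("latin-1")

# SET NAMES utf8 (or the _utf8 introducer): the bytes are read as UTF-8.
as_utf8 = sent_bytes.decode("utf-8")

print(sent_bytes)                 # b'l\xc3\xa1piz'
print(as_latin1, len(as_latin1))  # lÃ¡piz 6
print(as_utf8, len(as_utf8))      # lápiz 5
```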
Just in case someone else stumbles upon this issue, I have found a way that solves the problem, at least for me, without messing with character sets and collations inside MySQL queries.
I am using PHP to insert and retrieve records from the database. Even though my database, tables, and columns are utf8, and the PHP files are UTF-8 encoded as well, the connection between PHP and MySQL was using latin1. I found this out with $mysqli->character_set_name(), where $mysqli is your mysqli object.
For searches to work as expected (accent- and case-insensitive, whether or not the search term contains accents), I had to set the character set of the connection explicitly.
To do this, call $mysqli->set_charset('utf8'); where $mysqli is your mysqli object. If you have a database management class that wraps your database functionality, this is easy to apply to the whole app; if not, you have to set it explicitly everywhere you open a connection.
I hope this helps someone out, as I was already freaking out about this!