MySQL - matching Latin (English) form input to UTF-8 (non-English) data

Posted by 孤者浪人 on 2019-12-01 12:54:09

One possible solution is to add a second column next to "artist", for example "artist_normalized". When you populate the table, store a normalized (diacritic-free) version of the string in it. Searches can then be run against the artist_normalized column, with the same normalization applied to the search input.
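
A minimal sketch of that idea, under stated assumptions: the table name tracks, the helper names normalize_artist() and find_artists(), and the sample artists are all hypothetical, and an in-memory SQLite database (via the pdo_sqlite extension) stands in for MySQL only so the snippet is self-contained. With MySQL you would open the PDO connection with a mysql: DSN instead. Case differences are not handled here and are left to the column's collation.

```php
<?php
// Hypothetical helper: strips diacritics so that stored values and search
// input are reduced to the same form. Requires the intl extension.
function normalize_artist($name)
{
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = Transliterator::createFromRules(
            ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            Transliterator::FORWARD
        );
    }
    return $transliterator->transliterate($name);
}

// Assumption: SQLite in memory replaces the real MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tracks (artist TEXT, artist_normalized TEXT)');

// Populate both columns at insert time.
$insert = $pdo->prepare(
    'INSERT INTO tracks (artist, artist_normalized) VALUES (?, ?)'
);
foreach (['Tiësto', 'Beyoncé', 'Mötley Crüe'] as $artist) {
    $insert->execute([$artist, normalize_artist($artist)]);
}

// Hypothetical search helper: normalizes the input, then compares it
// against the normalized column and returns the original spellings.
function find_artists($pdo, $input)
{
    $stmt = $pdo->prepare(
        'SELECT artist FROM tracks WHERE artist_normalized = ?'
    );
    $stmt->execute([normalize_artist($input)]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Plain Latin input finds the accented row.
echo implode(', ', find_artists($pdo, 'Tiesto')), "\n";
```

Because the input passes through the same function as the stored value, a user who does type the accented form still matches the same row.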

Test code:

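<?php
// Requires the intl extension.
// Rule: decompose (NFD), remove nonspacing marks (diacritics), recompose (NFC).
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach ($test as $e) {
    // Print each input next to its diacritic-free form.
    $normalized = $transliterator->transliterate($e);
    echo $e . ' --> ' . $normalized . "\n";
}
?>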

Result:

abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

The work is done by the Transliterator class. The rule above performs three steps: it decomposes the string (NFD), removes the nonspacing marks (the diacritics), and then recomposes the result into canonical form (NFC). Transliterator in PHP is built on top of ICU, so this approach relies on the ICU library's Unicode data tables rather than on a hand-maintained character map.
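
To make the three steps visible, here is a rough equivalent that spells them out with the intl Normalizer class and a PCRE character class. It is an illustration, not a replacement for the rule above: the function name strip_marks_stepwise() is hypothetical, and \p{Mn} is used as the PCRE counterpart of ICU's [:Nonspacing Mark:]. The inputs are written as byte escapes so that the precomposed and combining forms of "ë" cannot be confused.

```php
<?php
// Hypothetical helper mirroring the rule's three stages one by one.
function strip_marks_stepwise($s)
{
    // 1. Decompose: U+00EB becomes "e" followed by U+0308.
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    // 2. Remove every nonspacing mark (general category Mn).
    $stripped = preg_replace('/\p{Mn}/u', '', $decomposed);
    // 3. Recompose whatever is left into canonical form.
    return Normalizer::normalize($stripped, Normalizer::FORM_C);
}

$precomposed = "ti\xC3\xABsto";  // "tiësto", ë as the single code point U+00EB
$combining   = "tie\xCC\x88sto"; // "tiësto", e followed by combining U+0308

// Both spellings reduce to the same plain string.
echo strip_marks_stepwise($precomposed), "\n"; // tiesto
echo strip_marks_stepwise($combining), "\n";   // tiesto

// Characters without a canonical decomposition pass through unchanged,
// for example "ø" (U+00F8) and "€" (U+20AC).
echo strip_marks_stepwise("\xC3\xB8"), "\n";
echo strip_marks_stepwise("\xE2\x82\xAC"), "\n";
```

The last two lines show the limit of this technique: only characters that decompose into a base letter plus marks are simplified, which matches the "€ --> €" line in the result above.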

Note: this solution requires PHP 5.4 or greater with the intl extension.
