I am trying to replace accented characters with their unaccented equivalents. Below is what I am currently doing.
$string = "Éric Cantona";
$strict = st
Since PHP 5.4, the intl extension has provided a class named Transliterator.
I believe that's the best way to remove diacritics for two reasons:
Transliterator is based on ICU, so you're using the tables of the ICU library. ICU is a great project, developed over the years to provide comprehensive tables and functionality. Any table you write yourself will never be as complete as the one from ICU.
In UTF-8, the same character can be represented in more than one way. For example, the character ñ can be stored as a single (multi-byte) character, or as the letter n followed by a combining tilde (itself multi-byte). In addition, some Unicode characters are homoglyphs: they look the same while having different code points. For this reason it's also important to normalize the string.
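To make the second point concrete, here is a minimal sketch using the intl extension's Normalizer class. The byte escapes are my own illustration (they spell out the precomposed ñ and the n + combining tilde pair explicitly), and the variable names are arbitrary:

```php
<?php
// Requires the intl extension (Normalizer class).
$composed   = "\xC3\xB1";   // U+00F1: precomposed "ñ" (2 bytes)
$decomposed = "n\xCC\x83";  // U+006E "n" + U+0303 combining tilde (3 bytes)

// Both render as "ñ", but the byte sequences differ.
var_dump($composed === $decomposed); // bool(false)

// After normalizing both to the same form (NFC here), they compare equal.
$nfcA = Normalizer::normalize($composed, Normalizer::FORM_C);
$nfcB = Normalizer::normalize($decomposed, Normalizer::FORM_C);
var_dump($nfcA === $nfcB); // bool(true)
```

This is why a plain lookup table keyed on precomposed characters can silently miss input that arrives in decomposed form.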
Here's some sample code, taken from an old answer of mine:
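<?php // Rule string assumed: decompose (NFD), strip nonspacing marks, recompose (NFC)
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach ($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e. ' --> '.$normalized."\n";
}
?>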
Result:
abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto
The first argument passed when creating the Transliterator is the rule string; it performs both the removal of diacritics and the normalization of the string.
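Applied to the string from the question, the same idea could be wrapped in a small helper. This is only a sketch: remove_diacritics() is a hypothetical name, and the rule string (NFD, strip nonspacing marks, NFC) is an assumption on my part that matches the results listed above.

```php
<?php
// Requires the intl extension. remove_diacritics() is a hypothetical helper name.
function remove_diacritics($text)
{
    // Build the transliterator once and reuse it on later calls.
    static $transliterator = null;
    if ($transliterator === null) {
        // Assumed rules: decompose, drop combining marks, recompose.
        $transliterator = Transliterator::createFromRules(
            ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            Transliterator::FORWARD
        );
    }
    return $transliterator->transliterate($text);
}

$string = "Éric Cantona";
echo remove_diacritics($string), "\n"; // Eric Cantona
```

Characters that are not a base letter plus a combining mark (such as € in the results above) pass through unchanged with these rules.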