Issues replacing special characters in PHP string

青春壹個敷衍的年華 提交于 2019-12-06 08:37:28

I copied and pasted your code into my editor and something interesting happened. Instead of getting adios I was getting adjiós. Notice the j in the middle after the d. This was coming from the 'đ'=>'dj', in the first line of the table map. Apparently, my editor changed the đ to a regular d, and then it wouldn't convert the ó. I removed this key/value pair and suddenly it worked for me. Are you sure all of your keys are correct in your editor (Does you editor accept alternative character sets?) Here is my test file (with the đ removed:

<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</head>
<body>
<?php

function normalize ($string) {
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj', 'Ž'=>'Z', 'ž'=>'z', 'C'=>'C', 'c'=>'c', 'C'=>'C', 'c'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'R'=>'R', 'r'=>'r',
    );

    return strtr($string, $table);
}

$word = 'adiós';
$length = strlen($word);

echo 'original: '. $word;
echo '<br />';
echo 'normalized: '. normalize($word); 
echo '<br />';
echo 'loop: ';

for($i = 0; $i < $length; $i++) {
    echo normalize($word[$i]);
}

?>

</body>
</html>

When I loop through each character with the 'd' => 'dj' in the array map then I correctly get adjios

chichilatte

There's a great tip from this page: How to remove diacritics from text? Here's my version of it:

/** Normalize a string so that it can be compared with others without being too fussy.
*   e.g. "Ádrèñålînë" would return "adrenaline"
*   Note: Some letters are converted into more than one letter, 
*   e.g. "ß" becomes "sz", or "æ" becomes "ae"
*/
function normalize_string($string) {
    // remove whitespace, leaving only a single space between words. 
    $string = preg_replace('/\s+/', ' ', $string);
    // flick diacritics off of their letters
    $string = preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_COMPAT, 'UTF-8'));  
    // lower case
    $string = strtolower($string);
    return $string;
}

It's good because, unlike the iconv method mentioned above, there's no converting between character sets (they're a minefield).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!