PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string

Submitted by 筅森魡賤 on 2019-11-26 03:21:35

Question


What I want to do is remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". I tried to utf8_decode the string and then use strtr on it, but since my source file is saved as UTF-8, I can't enter the ISO-8859-15 characters for all umlauts: the editor inserts the UTF-8 characters.

Obviously a solution would be to have an include that's an ISO-8859-15 file, but there must be a better way than requiring another include?

echo strtr(utf8_decode($input),
           'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ',
           'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');

UPDATE: Maybe I was a bit inaccurate about what I am trying to do: I do not actually want to remove the umlauts, but to replace them with their closest "one character ASCII" equivalent.


Answer 1:


iconv("utf-8","ascii//TRANSLIT",$input);

Extended example
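What //TRANSLIT produces depends on the iconv implementation and, on glibc, on the current LC_CTYPE locale, so the same call can return "larm", "laerm", "l?rm" or false on different systems. The following is a minimal sketch under those assumptions, not a guaranteed recipe: the helper name to_ascii_iconv() and the locale names are hypothetical and may not exist on your machine.

```php
<?php
// Hypothetical helper; the name to_ascii_iconv() is not part of the answer above.
function to_ascii_iconv(string $input): ?string
{
    if (!function_exists('iconv')) {
        return null; // iconv extension not available
    }

    // Remember the current LC_CTYPE so it can be restored afterwards.
    $previous = setlocale(LC_CTYPE, '0');

    // Assumption: one of these UTF-8 locales is installed. setlocale() tries
    // them in order and returns false (changing nothing) if none is available.
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');

    // @ suppresses the notice/warning raised for unconvertible input
    // or for an iconv build that does not support //TRANSLIT.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);

    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $result === false ? null : $result;
}

var_dump(to_ascii_iconv('lärm andré'));
```

If the result matters for something like URL slugs, it is worth checking the output on the actual deployment platform rather than relying on what a development machine prints.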




Answer 2:


A little trick that doesn't require setting locales or having huge translation tables:

function Unaccent($string)
{
    if (strpos($string = htmlentities($string, ENT_QUOTES, 'UTF-8'), '&') !== false)
    {
        $string = html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1', $string), ENT_QUOTES, 'UTF-8');
    }

    return $string;
}

The only requirement for it to work properly is to save your files in UTF-8 (as you should already).
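To see why this works, it can help to run the three steps of Unaccent() separately. The walkthrough below is only an illustration of the function above, with a sample input chosen for this sketch; it assumes the file is saved as UTF-8.

```php
<?php
$input = 'lärm andré';

// Step 1: accented letters become named entities that start with the base letter.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');
echo $entities, "\n"; // l&auml;rm andr&eacute;

// Step 2: keep only the leading base letter(s) of each accent-style entity.
$stripped = preg_replace(
    '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i',
    '$1',
    $entities
);
echo $stripped, "\n"; // larm andre

// Step 3: decode any entities that were not touched (for example &amp; or &quot;).
$result = html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');
echo $result, "\n"; // larm andre
```

One consequence of this design: a character is only unaccented if it has a named entity whose suffix is in the regex list. For example, Š becomes &Scaron; in step 1, "caron" is not in the list, so step 3 decodes it back to Š unchanged.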




Answer 3:


You can also try this:

$string = "Fóø Bår";
$transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);

but you need to have the intl extension (http://php.net/manual/en/book.intl.php) available
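Since the intl extension is not installed everywhere, a guarded wrapper can make the dependency explicit. This is a sketch only: the function name to_ascii_intl() is hypothetical, and it uses the shorter compound ID 'Any-Latin; Latin-ASCII' instead of the rule string above, so unlike that rule string it does not lowercase the text.

```php
<?php
// Hypothetical wrapper; the name to_ascii_intl() is an assumption for illustration.
function to_ascii_intl(string $input): ?string
{
    if (!extension_loaded('intl')) {
        return null; // let the caller fall back to another approach
    }

    // Procedural counterpart of Transliterator::transliterate(); accepts a
    // transliterator ID string and returns false on failure.
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $input);

    return $result === false ? null : $result;
}

var_dump(to_ascii_intl('Fóø Bår'));
```

Returning null rather than the untouched input keeps "intl is missing" distinguishable from "nothing needed replacing".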




Answer 4:


Okay, found an obvious solution myself, but it's not the best concerning performance...

echo strtr(utf8_decode($input), 
           utf8_decode('ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ'),
           'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
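Two caveats apply to the snippet above: utf8_decode() converts to ISO-8859-1, not ISO-8859-15, so characters outside ISO-8859-1 such as Š, Œ, Ž and Ÿ come out as "?", and the function is deprecated as of PHP 8.2. A different technique that avoids the conversion altogether is the array form of strtr(), sketched below. The map here is deliberately abbreviated and is an assumption for illustration, not a complete table.

```php
<?php
// Abbreviated, illustrative map; extend it with the characters you need.
$map = [
    'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 's',
    'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
    'é' => 'e', 'è' => 'e', 'Š' => 'S', 'Œ' => 'O',
];

// With an array argument, strtr() replaces whole substrings (longest match
// first), so each multi-byte UTF-8 sequence is matched as a unit.
$output = strtr('lärm andré', $map);
echo $output, "\n"; // larm andre
```

Because the keys are matched as byte sequences, this works on UTF-8 input as long as the source file itself is saved as UTF-8.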



Answer 5:


For Arabic and Persian users, I recommend this way to remove diacritics:

    $diacritics = array('َ','ِ','ً','ٌ','ٍ','ّ','ْ','ـ');
    // The third argument must be the text to clean, not the diacritics array itself
    $search_txt = str_replace($diacritics, '', $search_txt);

To type diacritics on an Arabic keyboard in Windows editors, you can either type them directly or hold Alt and type the code of the diacritic character. These are Windows code-page (Alt) codes, not Unicode code points. The codes are:

ـَ(0243) ـِ(0246) ـُ(0245) ـً(0240) ـٍ(0242) ـٌ(0241) ـْ(0250) ـّ(0248) ـ ـ(0220)




Answer 6:


I found that this one gives the most consistent results in French and German. With the meta tag set to utf-8, I placed it in a function that returns a line from an array of words, and it works perfectly.

htmlentities($line, ENT_SUBSTITUTE, 'utf-8')



Answer 7:


If you are using WordPress, you can use the built-in function remove_accents( $string )

https://codex.wordpress.org/Function_Reference/remove_accents

However, I noticed a bug: it doesn't work on a string with a single character.



Source: https://stackoverflow.com/questions/158241/php-replace-umlauts-with-closest-7-bit-ascii-equivalent-in-an-utf-8-string
