php iconv translit for removing accents: not working as expected?

终归单人心 2020-12-10 06:09

Consider this simple code:

echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

It prints

`e

instead of just e.

7 answers
  •  温柔的废话
    2020-12-10 07:02

    See @tchrist. With the intl PHP extension:

    http://fr2.php.net/manual/en/book.intl.php

    preg_replace('/\pM+/u', '', normalizer_normalize($mystring, Normalizer::FORM_D));
    

    eéèêëiîïoöôuùûüaâäÅ Ἥ ŐǟǠ ǺƶƈƉųŪŧȬƀ␢ĦŁȽŦ ƀǖ becomes

    eeeeeiiiooouuuuaaaA Η OaA AƶƈƉuUŧOƀ␢ĦŁȽŦ ƀu
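A minimal sketch wrapping the one-liner above in a reusable function, assuming the intl extension is loaded (it provides the `Normalizer` class). The name `removeAccents` is hypothetical. The pattern uses `\pM+` (one or more marks) so each run of combining marks is removed in a single replacement. As in the example above, characters that have no decomposition (Đ, Ł, ƀ) pass through unchanged.

```php
<?php
// Sketch only: assumes the intl extension is installed.
// removeAccents is a hypothetical helper name, not a library API.

function removeAccents(string $s): string
{
    // NFD splits "è" (U+00E8) into "e" (U+0065) + COMBINING GRAVE ACCENT (U+0300).
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Normalizer::normalize() returns false on invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // \pM matches any Unicode mark (category M); the u flag enables UTF-8 mode.
    return preg_replace('/\pM+/u', '', $decomposed);
}

echo removeAccents('è'), "\n";     // e
echo removeAccents('Ő ǟ Ǡ'), "\n"; // O a A
echo removeAccents('Đ Ł ƀ'), "\n"; // Đ Ł ƀ (no decomposition, left as-is)
```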


    As tchrist emphasises, not all Unicode characters have a decomposition:

    Extract from the Unicode code charts:

    U0080.pdf

    00CF Ï LATIN CAPITAL LETTER I WITH DIAERESIS

    ≡ 0049 I 0308 ¨

    NB: the symbol « ≡ » indicates an available (canonical) decomposition; the « → » lines are only cross-references to related characters, not decompositions.

    00D0 Ð LATIN CAPITAL LETTER ETH

    → 00F0 ð latin small letter eth

    → 0110 Đ latin capital letter d with stroke

    → 0189 Ɖ latin capital letter african d

    No decomposition is available, which IMHO is strange (the ASCII letter D could be considered an acceptable equivalent).
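A hedged sketch for checking this programmatically, assuming the intl extension: for a single code point, NFD changes the string exactly when the character has a canonical decomposition, and `IntlChar::charName()` returns the name shown in the charts. `hasDecomposition` is a hypothetical helper name.

```php
<?php
// Sketch only: assumes the intl extension (Normalizer, IntlChar).
// hasDecomposition is a hypothetical helper, not a library API.

function hasDecomposition(string $char): bool
{
    // For one code point, NFD differs from the input iff a decomposition exists.
    return Normalizer::normalize($char, Normalizer::FORM_D) !== $char;
}

foreach (['Ï', 'Ð'] as $char) {
    // IntlChar::charName() accepts a UTF-8 character and returns its Unicode name.
    printf("%s %s: %s\n", $char, IntlChar::charName($char),
        hasDecomposition($char) ? 'decomposable' : 'no decomposition');
}
// Ï LATIN CAPITAL LETTER I WITH DIAERESIS: decomposable
// Ð LATIN CAPITAL LETTER ETH: no decomposition
```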

    U0100.pdf

    0110 Đ LATIN CAPITAL LETTER D WITH STROKE

    → 00D0 Ð latin capital letter eth

    → 0111 đ latin small letter d with stroke

    → 0189 Ɖ latin capital letter african d

    Even stranger: this one is named LATIN CAPITAL LETTER D WITH STROKE, yet it has no decomposition either! Perhaps a nicer solution would be to get the Unicode name of each character and compare it with the name of each ASCII letter (replacing accordingly). Anyone? ;-]
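A rough sketch of that name-based idea, assuming the intl extension. After the usual NFD + mark removal, each remaining non-ASCII character is looked up with `IntlChar::charName()`, and names of the form `LATIN CAPITAL|SMALL LETTER X WITH ...` are mapped to `X`. `asciiFold` is a hypothetical name and the regex is only a heuristic: Ð (LATIN CAPITAL LETTER ETH) and Ɖ (LATIN CAPITAL LETTER AFRICAN D) are left untouched because their names do not follow that pattern.

```php
<?php
// Hypothetical sketch of the name-based approach; assumes the intl extension.
// asciiFold is an illustrative name, not a library API.

function asciiFold(string $s): string
{
    // Step 1: canonical decomposition + mark removal handles é, Ő, ǖ, ...
    $s = preg_replace('/\pM+/u', '', Normalizer::normalize($s, Normalizer::FORM_D));

    // Step 2: for each remaining non-ASCII character, parse its Unicode name.
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $name = IntlChar::charName($m[0]) ?? '';
        // e.g. "LATIN CAPITAL LETTER D WITH STROKE" -> "D"
        if (preg_match('/^LATIN (CAPITAL|SMALL) LETTER ([A-Z]) WITH /', $name, $p)) {
            return $p[1] === 'CAPITAL' ? $p[2] : strtolower($p[2]);
        }
        return $m[0]; // no matching name: keep the character unchanged
    }, $s);
}

echo asciiFold('Đ ĦŁȽŦ ƀǖ ð'), "\n"; // D HLLT bu ð
```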

    cf http://unicode.org/Public/UNIDATA/UnicodeData.txt
