PHP Utf8 Decoding Issue

忘掉有多难 2020-12-18 02:58

I have the following address line: Praha 5, Staré Město,

I need to use the utf8_decode() function on this string before I can write it to a PDF file (us

4 Answers
  • 2020-12-18 03:34

    I wound up using a home-grown UTF-8 / UTF-16 decoding function that converts to &#number; representations. I haven't found any pattern to why UTF-8 isn't detected; I suspect it's because the "encoded-as" sequence isn't always in exactly the same position in the returned string. You might do some additional checking on that.

    Three-byte UTF-8 indicator (the byte order mark, bytes EF BB BF): $startutf8 = chr(0xEF).chr(0xBB).chr(0xBF); (if you see this ANYWHERE, not just in the first three bytes, the string is UTF-8 encoded)

    Decode according to UTF-8 rules; this replaced an earlier version which chugged through the string byte by byte:

    function charset_decode_utf_8($string) {
        /* Only do the slow convert if there are 8-bit characters. */
        /* ereg() and the /e modifier were removed in PHP 7, so this uses */
        /* preg_match() and preg_replace_callback() instead. */
        if (!preg_match("/[\200-\377]/", $string))
            return $string;

        // decode three byte unicode characters (must run before the two byte pass)
        $string = preg_replace_callback("/([\340-\357])([\200-\277])([\200-\277])/",
            function ($m) {
                return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
            },
            $string);

        // decode two byte unicode characters
        $string = preg_replace_callback("/([\300-\337])([\200-\277])/",
            function ($m) {
                return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
            },
            $string);

        return $string;
    }
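
    To make the arithmetic in the two-byte branch concrete, here is a minimal sketch that works through "ě" from the address line by hand. The helper name two_byte_codepoint() is made up for this illustration and is not part of the function above; it assumes it is given exactly one well-formed two-byte sequence.

    ```php
    <?php
    // Hypothetical helper for illustration only: computes the code point of a
    // single two-byte UTF-8 sequence with the same arithmetic as the two-byte
    // branch of charset_decode_utf_8().
    function two_byte_codepoint(string $bytes): int
    {
        // The lead byte is 110xxxxx (192..223): subtracting 192 leaves the top 5 bits.
        // The continuation byte is 10xxxxxx (128..191): subtracting 128 leaves the low 6 bits.
        return (ord($bytes[0]) - 192) * 64 + (ord($bytes[1]) - 128);
    }

    // "ě" is U+011B, stored as the bytes 0xC4 0x9B in UTF-8.
    // (196 - 192) * 64 + (155 - 128) = 256 + 27 = 283 = 0x11B
    echo '&#' . two_byte_codepoint("\xC4\x9B") . ';', "\n"; // prints &#283;
    ```

    The three-byte branch works the same way, with 224 subtracted from the lead byte and two continuation bytes contributing 6 bits each.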
    
  • 2020-12-18 03:37

    You don't need that. @Rajeev: this string is automatically detected as UTF-8 encoded:

    echo mb_detect_encoding('Praha 5, Staré Město,');

    This returns UTF-8 for the string as written here.

    You'd be better off looking at: https://code.google.com/p/dompdf/wiki/CPDFUnicode
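
    If you want a yes/no answer instead of a guess at the encoding, a validity check may be a better fit than detection. This is only a sketch: it assumes the mbstring extension is loaded, and is_valid_utf8() is a name made up for the example.

    ```php
    <?php
    // Assumes the mbstring extension is loaded.
    // is_valid_utf8() is a hypothetical helper name, not a PHP built-in.
    function is_valid_utf8(string $s): bool
    {
        // mb_check_encoding() reports whether the byte sequence is
        // well-formed in the given encoding.
        return mb_check_encoding($s, 'UTF-8');
    }

    // The address fragment written out as UTF-8 bytes.
    var_dump(is_valid_utf8("Star\xC3\xA9 M\xC4\x9Bsto"));

    // The same "é" as a single Latin-1 byte (0xE9), which is not valid UTF-8 here.
    var_dump(is_valid_utf8("Star\xE9 Mesto"));
    ```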

  • 2020-12-18 03:44

    The problem is the encoding of your PHP file. Save the file as UTF-8 and you won't even need utf8_decode. If the data 'Praha 5, Staré Město,' comes from a database, it's better to change its charset to UTF-8 as well.

  • 2020-12-18 03:51

    utf8_decode converts the string from a UTF-8 encoding to ISO-8859-1, a.k.a. "Latin-1".
    The Latin-1 encoding cannot represent the letter "ě". It's that simple.
    "Decode" is a total misnomer: for characters that Latin-1 can represent, it does the same as iconv('UTF-8', 'ISO-8859-1', $string).

    See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
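
    To see the loss for yourself, here is a rough sketch that converts the address line to Latin-1 and, for comparison, to ISO-8859-2 (Latin-2), which does cover Czech. It assumes the mbstring extension is loaded; convert_from_utf8() is a name made up for the example, and whether your PDF library accepts Latin-2 is a separate question this does not answer.

    ```php
    <?php
    // Assumes the mbstring extension is loaded.
    // convert_from_utf8() is a hypothetical helper name, not a PHP built-in.
    function convert_from_utf8(string $utf8, string $target): string
    {
        // Characters missing from $target are replaced by mbstring's
        // substitute character, which is "?" unless configured otherwise.
        return mb_convert_encoding($utf8, $target, 'UTF-8');
    }

    // The address line written out as UTF-8 bytes.
    $address = "Praha 5, Star\xC3\xA9 M\xC4\x9Bsto,";

    // Latin-1 has "é" (0xE9) but no "ě", so that letter is lost.
    $latin1 = convert_from_utf8($address, 'ISO-8859-1');

    // Latin-2 covers Czech: "é" is 0xE9 and "ě" is 0xEC.
    $latin2 = convert_from_utf8($address, 'ISO-8859-2');

    // Hex dumps make the single-byte results easy to compare.
    echo bin2hex($latin1), "\n";
    echo bin2hex($latin2), "\n";
    ```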
