How to detect a malformed UTF-8 string in PHP?

时光取名叫无心 2020-11-27 05:16

The iconv function sometimes gives me this notice:

Notice:
iconv() [function.iconv]:
Detected an incomplete multibyte character in input string in [...]
         


        
4 Answers
  •  旧巷少年郎
    2020-11-27 05:43

    First, note that it is not possible to detect whether text belongs to a specific undesired encoding. You can only check whether a string is valid in a given encoding.

    You can use the UTF-8 validity check built into preg_match [PHP Manual], available since PHP 4.3.5. With the u modifier it returns 1 for a valid string. For an invalid string it returns 0 or, in newer PHP versions, false, with no additional information in the return value:

    $isUTF8 = preg_match('//u', $string) === 1;
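
As a minimal sketch of the preg_match check, the snippet below wraps it in a helper and runs it on one valid and one invalid byte string. The function name isValidUtf8 and the sample strings are illustrative assumptions, not part of the original answer. Comparing against 1 treats both 0 and false as "invalid", so it should behave the same across PHP versions.

```php
<?php
// Hypothetical helper: true only when $string is well-formed UTF-8.
function isValidUtf8(string $string): bool
{
    // The empty pattern with the u modifier matches any valid UTF-8 subject,
    // so 1 means "valid"; 0 or false is treated as "invalid".
    return preg_match('//u', $string) === 1;
}

$valid   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$invalid = "caf\xE9";     // same word in ISO-8859-1; the lone 0xE9 byte is not valid UTF-8

var_dump(isValidUtf8($valid));   // bool(true)
var_dump(isValidUtf8($invalid)); // bool(false)
```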
    

    Another possibility is mb_check_encoding [PHP Manual]:

    $validUTF8 = mb_check_encoding($string, 'UTF-8');
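
To make the behavior of mb_check_encoding concrete, here is a small sketch that runs it over a few sample byte strings. The sample data and the labels are made up for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
// Illustrative samples (assumed data, not from the original question).
$samples = [
    'ascii'     => 'hello',
    'utf8'      => "gr\xC3\xBC\xC3\x9F", // "grüß" encoded as UTF-8
    'latin1'    => "gr\xFC\xDF",         // the same text in ISO-8859-1
    'truncated' => "gr\xC3",             // multibyte sequence cut short
];

$results = [];
foreach ($samples as $label => $bytes) {
    // true when the bytes form a valid UTF-8 string, false otherwise
    $results[$label] = mb_check_encoding($bytes, 'UTF-8');
    printf("%-10s %s\n", $label, $results[$label] ? 'valid' : 'invalid');
}
```

The truncated sample is the kind of input that makes iconv report an incomplete multibyte character.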
    

    Another function you can use is mb_detect_encoding [PHP Manual]:

    $validUTF8 = mb_detect_encoding($string, 'UTF-8', true) !== false;
    

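    It's important to set the strict parameter (the third argument) to true. In non-strict mode the function returns the closest match, which may be UTF-8 even for malformed input.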

    Additionally, iconv [PHP Manual] allows you to change or drop invalid sequences on the fly with the //TRANSLIT and //IGNORE suffixes. (However, if iconv encounters such a sequence, it raises a notice; iconv itself has no option to turn this off.)

    echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string), PHP_EOL;
    echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $string), PHP_EOL;
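
Although iconv offers no switch for the notice, it can be intercepted from the calling code. The following is a rough sketch, assuming PHP 7.1 or later. The wrapper name convertOrNull is hypothetical; it temporarily installs an error handler that turns the notice into an exception and returns null when the conversion fails. Exact iconv behavior can vary with the underlying iconv library.

```php
<?php
// Hypothetical wrapper: converts $string and returns null instead of
// letting iconv() emit a notice when it rejects the input.
function convertOrNull(string $string, string $from, string $to): ?string
{
    // Temporarily turn any error raised inside iconv() into an exception.
    set_error_handler(function (int $severity, string $message): bool {
        throw new ErrorException($message, 0, $severity);
    });
    try {
        $converted = iconv($from, $to, $string);
        return $converted === false ? null : $converted;
    } catch (ErrorException $e) {
        return null;
    } finally {
        // Always put the previous handler back, even on failure.
        restore_error_handler();
    }
}

var_dump(convertOrNull("caf\xC3\xA9", 'UTF-8', 'ISO-8859-1')); // "café" as a 4-byte ISO-8859-1 string
var_dump(convertOrNull("caf\xC3", 'UTF-8', 'ISO-8859-1'));     // NULL (incomplete multibyte character)
```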
    

    You can suppress the notice with the @ operator and compare the length of the returned string with the original:

    $validUTF8 = strlen($string) === strlen(@iconv('UTF-8', 'UTF-8//IGNORE', $string));
    

    Check the examples on the iconv manual page as well.

    You have not shared the source code that triggers the notice. Add it if you want a more concrete suggestion.
