Detect encoding and make everything UTF-8

暗喜 2020-11-22 03:03

I'm reading out lots of texts from various RSS feeds and inserting them into my database.

Of course, there are several different character encodings used in the fee

24 Answers
  •  生来不讨喜
    2020-11-22 03:07

    This cheatsheet lists some common caveats related to UTF-8 handling in PHP: http://developer.loftdigital.com/blog/php-utf-8-cheatsheet

    This function, which detects whether a string contains multibyte UTF-8 sequences, might also prove helpful (source):

    
    // Returns 1 if $string contains at least one well-formed multibyte UTF-8
    // sequence, 0 if it contains none (e.g. pure ASCII), or false on a regex error.
    function detectUTF8($string)
    {
        return preg_match('%(?:
            [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
            |\xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
            |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
            |\xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
            |\xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
            |[\xF1-\xF3][\x80-\xBF]{3}         # planes 4-15
            |\xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
            )+%xs', 
        $string);
    }
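
A detection function like the one above only reports whether UTF-8 sequences are present; it does not convert anything. As a minimal sketch of the conversion step, the hypothetical helper `toUtf8()` below (not part of the original answer) uses the mbstring functions `mb_check_encoding()` and `mb_convert_encoding()`. It assumes that any input which is not valid UTF-8 is ISO-8859-1. That is only a guess, so where a feed declares its charset, that value is probably a better choice for `$fallback`.

```php
<?php
// Hypothetical helper, not from the original answer: return $text as UTF-8.
// ASSUMPTION: input that is not valid UTF-8 is encoded as $fallback
// (ISO-8859-1 by default). Real feeds may use other encodings.
// Requires the mbstring extension.
function toUtf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // mb_check_encoding() validates the whole string, so pure ASCII passes too.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Re-encode from the assumed legacy encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$latin1 = "caf\xE9";      // "café" encoded as ISO-8859-1
$utf8   = "caf\xC3\xA9";  // "café" encoded as UTF-8

var_dump(toUtf8($latin1) === $utf8); // the legacy bytes are converted
var_dump(toUtf8($utf8) === $utf8);   // valid UTF-8 is returned unchanged
```

Checking validity first avoids double-encoding text that is already UTF-8, which is the usual failure mode when every feed is converted blindly.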
    
