How to handle user input of invalid UTF-8 characters?

小鲜肉 2020-11-29 17:26

I'm looking for a general strategy or advice on how to handle invalid UTF-8 input from users.

Even though my webapp uses UTF-8, somehow some users enter invalid chara

9 answers
  •  清酒与你
    2020-11-29 18:17

    I put together a fairly simple class to check whether input is valid UTF-8 and to run it through utf8_encode() as needed:

    class utf8
    {
    
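        /**
         * Recursively re-encode every value that is not valid UTF-8.
         * Note: utf8_encode() assumes its input is ISO-8859-1.
         *
         * @param array $data
         * @return array
         */
        public static function encode(array $data)
        {
            foreach ($data as $key=>$val) {
                if (is_array($val)) {
                    // Recurse into nested arrays (e.g. form fields named foo[])
                    $data[$key] = self::encode($val);
                } else {
                    // check() yields a falsy value when the string is not valid UTF-8
                    if (!self::check($val)) {
                        $data[$key] = utf8_encode($val);
                    }
                }
            }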
    
            return $data;
        }
    
        /**
         * Regular expression to test a string is UTF8 encoded
         * 
         * RFC3629
         * 
         * @param string $string The string to be tested
         * @return bool
         * 
         * @link http://www.w3.org/International/questions/qa-forms-utf-8.en.php
         */
        public static function check($string)
        {
            // preg_match() returns 1, 0, or false on error; normalize to a strict bool
            return 1 === preg_match('%^(?:
                [\x09\x0A\x0D\x20-\x7E]              # ASCII
                | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
                |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
                | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
                |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
                |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
                | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
                |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
                )*$%xs',
                $string);
        }
    }
    
    // For example
    $data = utf8::encode($_POST);
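
The same check-then-convert idea can also be sketched without a regular expression, assuming the mbstring extension is installed. The function name normalize_utf8() is hypothetical and not part of the class above; like utf8_encode(), it assumes that any string failing validation is really ISO-8859-1, which may not hold for your users. Using mb_convert_encoding() may also be worth considering because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not from the original answer): validate with
// mb_check_encoding() and convert only the strings that fail validation.
// Assumption: invalid input is ISO-8859-1, the same assumption utf8_encode() makes.
function normalize_utf8(array $data): array
{
    foreach ($data as $key => $val) {
        if (is_array($val)) {
            // Recurse into nested arrays, as the class above does
            $data[$key] = normalize_utf8($val);
        } elseif (is_string($val) && !mb_check_encoding($val, 'UTF-8')) {
            $data[$key] = mb_convert_encoding($val, 'UTF-8', 'ISO-8859-1');
        }
    }

    return $data;
}

// Example input: "\xE9" and "\xEF" are ISO-8859-1 bytes, invalid on their own in UTF-8
$input = ['name' => "caf\xE9", 'tags' => ["ok", "na\xEFve"]];
$clean = normalize_utf8($input);
```

Strings that are already valid UTF-8 pass through untouched, so running the helper twice should not double-encode anything.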
    
