Remove non-utf8 characters from string

心在旅途 2020-11-22 11:56

I'm having a problem removing non-UTF-8 characters from a string; they do not display properly. The characters look like this: 0x97 0x61 0x6C 0x6F (hex representation).

18 answers
  •  误落风尘
    2020-11-22 12:31

    I wrote a function that deletes invalid UTF-8 characters from a string. I use it to clean the descriptions of 27000 products before generating the XML export file.

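    public function stripInvalidXml($value) {
        $ret = "";
        // Return early for null or an empty string (empty() would also discard "0").
        if ($value === null || $value === "") {
            return $ret;
        }
        $length = strlen($value);
        for ($i = 0; $i < $length; $i++) {
            // ord() returns one byte (0-255), so the string is filtered byte by byte.
            $current = ord($value[$i]);
            // XML 1.0 character ranges. Since $current never exceeds 0xFF, only control
            // bytes other than tab, LF and CR are dropped; bytes 0x80-0xFF are all kept.
            if (($current == 0x9) || ($current == 0xA) || ($current == 0xD) || (($current >= 0x20) && ($current <= 0xD7FF)) || (($current >= 0xE000) && ($current <= 0xFFFD)) || (($current >= 0x10000) && ($current <= 0x10FFFF))) {
                $ret .= chr($current);
            }
        }
        return $ret;
    }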
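
    Because the function above looks at one byte at a time, a stray byte such as 0x97 passes through it. The following is a minimal sketch of a different technique, not the function above: it keeps only well-formed UTF-8 byte sequences by matching them with a byte-level regular expression. The name stripInvalidUtf8 is hypothetical, and the sketch assumes PHP 7 or later with the bundled PCRE extension; the byte ranges follow the well-formed sequence table in the Unicode Standard.

    ```php
    <?php
    // Hypothetical helper (not from the original answer): keep only well-formed
    // UTF-8 byte sequences and drop every byte that is not part of one.
    function stripInvalidUtf8(string $value): string
    {
        // Byte-level pattern (no u modifier), one alternative per sequence shape.
        $wellFormed = '/
              [\x00-\x7F]                          # 1 byte: ASCII
            | [\xC2-\xDF][\x80-\xBF]               # 2 bytes
            | \xE0[\xA0-\xBF][\x80-\xBF]           # 3 bytes, excluding overlong forms
            | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3 bytes
            | \xED[\x80-\x9F][\x80-\xBF]           # 3 bytes, excluding surrogates
            | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4 bytes, excluding overlong forms
            | [\xF1-\xF3][\x80-\xBF]{3}            # 4 bytes
            | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4 bytes, up to U+10FFFF
        /x';

        // Collect every well-formed sequence; anything between matches is skipped.
        preg_match_all($wellFormed, $value, $matches);

        return implode('', $matches[0]);
    }

    // The bytes from the question: 0x97 is a lone continuation byte, so it is dropped.
    echo stripInvalidUtf8("\x97\x61\x6C\x6F"), "\n"; // prints "alo"
    ```

    Valid multi-byte text is left untouched, so this can run before an XML-oriented filter. If the bytes actually come from a single-byte encoding, converting them to UTF-8 instead of deleting them would preserve the characters.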
    
