Im having a problem with removing non-utf8 characters from string, which are not displaying properly. Characters are like this 0x97 0x61 0x6C 0x6F (hex representation)
I have made a function that deletes invalid UTF-8 characters from a string. I'm using it to clear description of 27000 products before it generates the XML export file.
public function stripInvalidXml($value) {
$ret = "";
$current;
if (empty($value)) {
return $ret;
}
$length = strlen($value);
for ($i=0; $i < $length; $i++) {
$current = ord($value{$i});
if (($current == 0x9) || ($current == 0xA) || ($current == 0xD) || (($current >= 0x20) && ($current <= 0xD7FF)) || (($current >= 0xE000) && ($current <= 0xFFFD)) || (($current >= 0x10000) && ($current <= 0x10FFFF))) {
$ret .= chr($current);
}
else {
$ret .= "";
}
}
return $ret;
}