Is there a way to keep json_encode() from returning null for a string that contains an invalid (non-UTF-8) character?
It can be a pain in t
This function removes every byte that is not part of a structurally valid UTF-8 sequence from a string:
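function removeInvalidChars($text) {
    // Group 1 matches one 1- to 4-byte sequence with the right lead/continuation shape; the trailing "." catches any other byte.
    // The check is structural only: overlong forms and surrogates still pass.
    $regex = '/( [\x00-\x7F] | [\xC0-\xDF][\x80-\xBF] | [\xE0-\xEF][\x80-\xBF]{2} | [\xF0-\xF7][\x80-\xBF]{3} ) | ./x';
    // Keep the matched sequence ($1); for a stray byte $1 is empty, so the byte is dropped.
    return preg_replace($regex, '$1', $text);
}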
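As a rough illustration of how this fits in front of json_encode(), here is a minimal sketch. The sample string in `$dirty` is made up for the example (a lone Latin-1 byte 0xE9 in otherwise ASCII text), and the function is repeated only so the snippet runs on its own.

```php
<?php
// Same function as in the answer, repeated so this snippet is self-contained.
function removeInvalidChars($text) {
    $regex = '/( [\x00-\x7F] | [\xC0-\xDF][\x80-\xBF] | [\xE0-\xEF][\x80-\xBF]{2} | [\xF0-\xF7][\x80-\xBF]{3} ) | ./x';
    return preg_replace($regex, '$1', $text);
}

// Hypothetical input: "caf", then the single byte 0xE9 (Latin-1 "é"), then " ok".
// 0xE9 on its own is not a valid UTF-8 sequence.
$dirty = "caf\xE9 ok";

// Encoding the raw string fails; json_last_error() reports the UTF-8 problem.
json_encode($dirty);
$failedOnUtf8 = (json_last_error() === JSON_ERROR_UTF8);

// After cleaning, the stray byte is gone and encoding succeeds.
$clean = removeInvalidChars($dirty);
echo json_encode($clean), "\n"; // prints "caf ok" (with the quotes)
```

Note that the stray byte is dropped silently, so "café" exported as Latin-1 comes out as "caf". If the source encoding is known, converting it to UTF-8 first would preserve the character instead.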
I use it after converting an Excel document to JSON, as Excel docs aren't guaranteed to be in UTF-8.
I don't think there's a particularly sensible way of converting invalid chars to a visible but valid character. You could replace each invalid byte with U+FFFD, the Unicode replacement character, by turning the regex above around, but that doesn't really give a better user experience than just dropping the invalid chars.
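For anyone who does want the U+FFFD behaviour, one way to "turn the regex around" might look like the sketch below. The function name `replaceInvalidChars` is hypothetical, and using preg_replace_callback() is just one possible approach: valid sequences are passed through unchanged and every other byte becomes the three-byte UTF-8 encoding of U+FFFD.

```php
<?php
// Hypothetical variant of removeInvalidChars(): substitutes instead of dropping.
function replaceInvalidChars($text) {
    // Same pattern as before: group 1 is a structurally valid sequence,
    // the trailing "." matches any single byte that did not fit.
    $regex = '/( [\x00-\x7F] | [\xC0-\xDF][\x80-\xBF] | [\xE0-\xEF][\x80-\xBF]{2} | [\xF0-\xF7][\x80-\xBF]{3} ) | ./x';

    return preg_replace_callback($regex, function ($m) {
        // When only "." matched, group 1 is absent from $m, so check with isset().
        if (isset($m[1]) && $m[1] !== '') {
            return $m[1];          // valid sequence: keep as-is
        }
        return "\xEF\xBF\xBD";     // U+FFFD encoded as UTF-8
    }, $text);
}

// Made-up sample input with one stray Latin-1 byte.
$fixed = replaceInvalidChars("caf\xE9 ok");
$json  = json_encode($fixed);
$ok    = (json_last_error() === JSON_ERROR_NONE);
```

On PHP 7.2 and later, json_encode() also accepts the JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE flags, which drop or substitute invalid sequences during encoding and may be simpler than a hand-written regex.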