How to keep json_encode() from dropping strings with invalid characters

醉话见心 2020-11-29 06:24

Is there a way to keep json_encode() from returning null for a string that contains an invalid (non-UTF-8) character?

It can be a pain in t

6 answers
  •  渐次进展
    2020-11-29 06:59

    This function removes all invalid UTF-8 byte sequences from a string:

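    // Keep structurally well-formed UTF-8 sequences (captured as $1); drop any other byte.
    // Lead bytes C0, C1 and F5-F7 never occur in valid UTF-8, so they are excluded.
    function removeInvalidChars($text) {
        $regex = '/( [\x00-\x7F] | [\xC2-\xDF][\x80-\xBF] | [\xE0-\xEF][\x80-\xBF]{2} | [\xF0-\xF4][\x80-\xBF]{3} ) | ./x';
        return preg_replace($regex, '$1', $text);
    }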
    

    I use it after converting an Excel document to JSON, as Excel documents aren't guaranteed to be in UTF-8.
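
    For comparison, here is a minimal sketch of what json_encode() does with a bad byte, and of the built-in JSON_INVALID_UTF8_IGNORE flag (available since PHP 7.2). This is a different technique from the regex function, though the effect is similar: invalid bytes are dropped. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample: "caf" followed by a lone 0xE9 byte (Latin-1 "é"),
// which is not a valid UTF-8 sequence on its own.
$raw = "caf\xE9";

// Plain json_encode() gives up on the whole value (false on current PHP versions).
$plain = json_encode($raw);

// Read the error right away; the next json_encode() call resets it.
$reason = json_last_error_msg();

// JSON_INVALID_UTF8_IGNORE (PHP 7.2+) drops the offending bytes while encoding.
$ignored = json_encode($raw, JSON_INVALID_UTF8_IGNORE);
```

    With this input, $ignored should hold the JSON string "caf", while $plain is false and $reason describes the malformed UTF-8. On PHP versions older than 7.2 the flag does not exist, so a function like the one above is still needed there.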

    I don't think there's a particularly sensible way of converting invalid chars to a visible but valid character. You could replace invalid chars with U+FFFD, the Unicode replacement character, by turning the regex above around, but that doesn't really provide a better user experience than just dropping them.
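
    If substitution is wanted anyway, "turning the regex around" could look roughly like the sketch below. The name replaceInvalidChars() is hypothetical, and the pattern only checks byte structure (lead byte plus the right number of continuation bytes), so it is an approximation rather than a full UTF-8 validator.

```php
<?php
// Sketch: keep well-formed sequences, turn every other byte into U+FFFD.
// replaceInvalidChars() is a hypothetical helper name, not part of PHP.
function replaceInvalidChars(string $text): string {
    // Group 1 matches a structurally well-formed sequence; the bare "."
    // matches any single leftover byte. Lead bytes C0, C1 and F5-F7 can
    // never occur in valid UTF-8, so the ranges start at C2 and stop at F4.
    $regex = '/( [\x00-\x7F] | [\xC2-\xDF][\x80-\xBF] | [\xE0-\xEF][\x80-\xBF]{2} | [\xF0-\xF4][\x80-\xBF]{3} ) | ./x';

    return preg_replace_callback($regex, function (array $m): string {
        // Group 1 is missing or empty when only "." matched.
        return ($m[1] ?? '') !== '' ? $m[1] : "\u{FFFD}";
    }, $text);
}

// Made-up sample: a lone 0xE9 byte between valid ASCII text.
$fixed = replaceInvalidChars("caf\xE9 ok");
$json  = json_encode($fixed);
```

    On PHP 7.2 and later, json_encode($text, JSON_INVALID_UTF8_SUBSTITUTE) performs this substitution natively, which avoids maintaining the pattern by hand.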
