How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?

小蘑菇 2020-12-14 01:00

It seems like MySQL does not support characters with more than 3 bytes in its default UTF-8 charset.
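
To make the limit concrete: MySQL's `utf8` charset (an alias for `utf8mb3`) stores at most 3 bytes per character, while `utf8mb4` accepts 4-byte characters. The short sketch below is not part of the original question; it prints the encoded byte length of a few sample characters, chosen only for illustration. Only the last one would be rejected or truncated by a 3-byte column.

```php
<?php
// Illustrative samples: one character for each UTF-8 encoded length.
$samples = [
    'U+0041 (A)'         => "A",
    'U+00E9 (e-acute)'   => "\u{E9}",
    'U+20AC (euro sign)' => "\u{20AC}",
    'U+1F600 (emoji)'    => "\u{1F600}",
];

foreach ($samples as $label => $char) {
    // strlen() counts bytes, not characters.
    printf("%s: %d byte(s)\n", $label, strlen($char));
}
```

The `"\u{...}"` escape syntax assumes PHP 7.0 or later.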

So, in PHP, how can I get rid of all 4(-and-more)-byte characters from a UTF-8 string?

7 answers
  •  再見小時候
    2020-12-14 01:26

    Came across this question when trying to solve my own issue (Facebook spits out certain emoticons as 4-byte characters, Amazon Mechanical Turk does not accept 4-byte characters).

    I ended up using this; it doesn't require the mbstring extension:

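    function remove_4_byte($string) {
        // Split into individual UTF-8 characters (the /u modifier makes PCRE work on code points).
        $char_array = preg_split('/(?<!^)(?!$)/u', $string);
        for ($x = 0; $x < sizeof($char_array); $x++) {
            // strlen() counts bytes, so anything longer than 3 is a 4-byte character.
            if (strlen($char_array[$x]) > 3) {
                $char_array[$x] = "";
            }
        }
        // Separator first: the reversed argument order is no longer accepted as of PHP 8.0.
        return implode("", $char_array);
    }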
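
As an alternative to splitting the string and checking each piece, the same result can probably be had with a single regular expression. This is a minimal sketch, not the answerer's method: the function name `strip_supplementary_chars` and the optional `$replacement` parameter are hypothetical. It relies on the fact that UTF-8 uses 4 bytes exactly for code points U+10000 through U+10FFFF, and like the function above it needs only PCRE, not mbstring.

```php
<?php
// Hypothetical helper, shown for illustration only.
function strip_supplementary_chars(string $string, string $replacement = ''): string
{
    // With the /u modifier, this character class matches every code point
    // above the Basic Multilingual Plane, i.e. the 4-byte UTF-8 characters.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $string);

    // preg_replace() returns null on failure, e.g. for malformed UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $result;
}

// The 2-byte and 3-byte characters survive; the emoji is removed.
echo strip_supplementary_chars("caf\u{E9} \u{1F600} \u{20AC}5"), "\n";

// Replace instead of remove, using U+FFFD as a placeholder.
echo strip_supplementary_chars("hi \u{1F600}", "\u{FFFD}"), "\n";
```

Passing a replacement such as U+FFFD keeps a visible marker where a character was dropped, which covers the "replace" half of the question. Throwing on malformed input is a design choice of this sketch; returning the input unchanged would be an equally reasonable assumption.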
    
