How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?

小蘑菇 2020-12-14 01:00

It seems that MySQL's `utf8` charset (utf8mb3) cannot store characters that take more than 3 bytes in UTF-8.

So, in PHP, how can I get rid of all 4(-and-more)-byte characters in a UTF-8 string?

7 Answers
  •  死守一世寂寞
    2020-12-14 01:25

    Here is my implementation to filter out 4-byte characters:

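    $string = preg_replace_callback(
        '/./u', // with the u modifier, . matches one whole UTF-8 character
        function (array $match) {
            // strlen() counts bytes, so >= 4 catches every 4-byte character.
            // A null return value is used as an empty string, which drops the match.
            return strlen($match[0]) >= 4 ? null : $match[0];
        },
        $string
    );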
    

    You can tweak it by returning a substitute string instead of null (which removes the character). You can also replace >= 4 with a different byte-length check.
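
    As a possible alternative, here is a minimal sketch that matches the characters directly instead of checking byte lengths in a callback. It assumes the input is valid UTF-8 and relies on the fact that 4-byte UTF-8 sequences are exactly the code points U+10000 to U+10FFFF (valid UTF-8 never uses more than 4 bytes). The function name strip_four_byte_chars and its $replacement parameter are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not part of the answer above): replaces every
// character that needs 4 bytes in UTF-8 with $replacement.
function strip_four_byte_chars(string $text, string $replacement = ''): string
{
    // U+10000..U+10FFFF are the code points encoded as 4 bytes in UTF-8.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $text);

    // preg_replace() returns null on error, e.g. when $text is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $result;
}

// Usage: U+1F600 (4 bytes) is replaced; U+00E9 (2 bytes) and U+65E5 (3 bytes) are kept.
echo strip_four_byte_chars("caf\u{E9} \u{1F600} \u{65E5}", '?'), "\n";
```

    Throwing on a null result is a design choice for this sketch; the callback version above would instead leave $string set to null if the input were not valid UTF-8.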
