How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?

小蘑菇 2020-12-14 01:00

It seems like MySQL does not support characters with more than 3 bytes in its default UTF-8 charset.

So, in PHP, how can I get rid of all 4(-and-more)-byte characters?

7 Answers
  •  暗喜 (OP)
    2020-12-14 01:37

    NOTE: Don't just strip these characters. Replace each one with the replacement character U+FFFD instead, to avoid Unicode-based attacks, mostly XSS:

    http://unicode.org/reports/tr36/#Deletion_of_Noncharacters

    $value = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
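
    The one-liner above can be wrapped in a small helper. This is only a minimal sketch: the function name `replace_4byte_chars` is made up for illustration, and throwing on invalid input is an assumption about how you want to handle errors. It relies on the fact that, with the `/u` modifier, `preg_replace()` returns `null` when the subject is not valid UTF-8.

    ```php
    <?php
    // Hypothetical helper (the name is illustrative, not a PHP built-in).
    // Replaces every code point in U+10000..U+10FFFF, i.e. every character
    // that needs 4 bytes in UTF-8, with U+FFFD (bytes EF BF BD).
    function replace_4byte_chars(string $value): string
    {
        // /u makes PCRE treat both the pattern and the subject as UTF-8.
        $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);

        // With /u, preg_replace() returns null if $value is not valid UTF-8.
        // Assumption: surfacing this as an exception is acceptable here.
        if ($result === null) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }

        return $result;
    }

    // "\u{1F600}" is a 4-byte character (requires PHP 7.0+ for this escape).
    echo replace_4byte_chars("a\u{1F600}b"), "\n";
    ```

    Characters of up to 3 bytes, such as "é" or "€", pass through unchanged, so the result should be safe to store in a 3-byte UTF-8 column.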
    
