Removing invalid/incomplete multibyte characters

遥遥无期 2020-12-31 17:40

I'm having some issues using the following code on user input:

htmlentities($string, ENT_COMPAT, 'UTF-8');

When an invalid multibyte cha

2 answers
  • 2020-12-31 18:08

    How can I remove invalid multibyte characters, efficiently, securely, without notices/warnings/errors?

    As you already outlined (or at least linked) in your question, simply deleting the invalid byte sequence(s) is not an option.

    Instead, each invalid sequence should probably be replaced with the replacement character U+FFFD. As of PHP 5.4.0 you can use the ENT_SUBSTITUTE flag of htmlentities for this. That is probably the safest choice if you don't want to reject the string outright.

    In recent PHP versions, iconv will always raise a warning on invalid input, and may even discard the whole string. So it does not look like a good alternative for you.
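
    A minimal sketch of the substitution approach, assuming PHP 5.4.0 or later with the mbstring extension loaded. The helper name escape_user_input and the sample string are hypothetical and not part of the original question; mb_check_encoding is shown only as one possible way to detect bad input if you would rather reject it.

    ```php
    <?php
    // Hypothetical helper: escape user input for HTML output without
    // losing the whole string when it contains invalid UTF-8.
    function escape_user_input($string)
    {
        // ENT_SUBSTITUTE replaces each invalid byte sequence with U+FFFD.
        // Without it (and without ENT_IGNORE), htmlentities() returns an
        // empty string for input that is not valid UTF-8.
        return htmlentities($string, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');
    }

    // "\xC3" opens a two-byte sequence that is never completed.
    $bad = "abc\xC3def";

    // Optional validity check, for callers that prefer to reject bad input.
    $isValid = mb_check_encoding($bad, 'UTF-8');

    $escaped = escape_user_input($bad);
    ```

    Because the result is itself valid UTF-8, it can be stored or echoed without a further clean-up pass.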

  • 2020-12-31 18:22

    iconv('UTF-8', "ISO-8859-1//IGNORE", $string);

    worked extremely well for me. It doesn't seem to generate any notice.
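
    A hedged sketch of how that call might be wrapped, since behaviour with //IGNORE reportedly differs between PHP versions and iconv implementations. The function name and the null-on-failure convention are assumptions, not part of the answer above. Note that the result is ISO-8859-1 rather than UTF-8, so any later htmlentities call would need 'ISO-8859-1' as its charset.

    ```php
    <?php
    // Hypothetical wrapper around the iconv() call from this answer.
    function to_latin1_ignoring_invalid($string)
    {
        // @ silences the notice that some PHP/iconv builds raise when they
        // meet an illegal sequence (an assumption about your environment).
        $converted = @iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);

        // iconv() returns false on failure; report that as null so callers
        // can tell a failed conversion apart from an empty string.
        return $converted === false ? null : $converted;
    }

    // Valid input: "é" (0xC3 0xA9 in UTF-8) becomes the single byte 0xE9.
    $latin1 = to_latin1_ignoring_invalid("caf\xC3\xA9");

    // Invalid input: depending on the build this is either the string with
    // the bad bytes dropped, or null.
    $maybe = to_latin1_ignoring_invalid("abc\xC3def");
    ```

    Checking for null before using the result avoids silently passing an empty value downstream on builds where the conversion fails.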
