I'm having some issues using the following code on user input:
htmlentities($string, ENT_COMPAT, 'UTF-8');
When an invalid multibyte cha
How can I remove invalid multibyte characters, efficiently, securely, without notices/warnings/errors?
As you already outlined in your question (or at least linked to), deleting the invalid byte sequence(s) is not an option.
Instead, each invalid sequence should probably be replaced with the replacement character U+FFFD. As of PHP 5.4.0 you can use the ENT_SUBSTITUTE flag for htmlentities. That is probably the safest choice if you don't want to reject the string.
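A minimal sketch of that suggestion, assuming PHP 5.4.0 or later. The helper name escape_user_input and the sample input are made up for illustration and are not part of the question:

```php
<?php
// Hypothetical helper name, introduced only for this example.
function escape_user_input($string)
{
    // ENT_SUBSTITUTE replaces each invalid byte sequence with U+FFFD
    // instead of making htmlentities() return an empty string.
    return htmlentities($string, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');
}

// "caf\xE9" is "café" encoded as ISO-8859-1, so the byte 0xE9 is invalid UTF-8.
$input = "caf\xE9 <b>";

// Without ENT_SUBSTITUTE (or ENT_IGNORE) the whole result is an empty string.
var_dump(htmlentities($input, ENT_COMPAT, 'UTF-8') === '');

// With ENT_SUBSTITUTE the rest of the string survives and is still escaped.
echo escape_user_input($input), "\n";
```

The flags are combined with a bitwise OR, so the quoting behaviour of ENT_COMPAT from the question stays the same.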
iconv will always give you a warning in recent PHP versions, and may even drop the whole string. So it does not look like a good alternative for you.
iconv('UTF-8', "ISO-8859-1//IGNORE", $string);
worked extremely well for me. It doesn't seem to generate any notice.