In PHP, we can use mb_check_encoding() to determine if a string is valid UTF-8. But that's not a portable solution, as it requires the mbstring extension to be compiled in.
I put this in here for completeness:
Assuming PHP is compiled with PCRE, it is most often also built with UTF-8 support. So, as explicitly asked for in the question, this very simple regular expression can detect invalid UTF-8 strings, because those won't match:
preg_match('//u', $string);
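To make the return values explicit, here is a minimal sketch that wraps the one-liner above in a boolean helper. The name is_valid_utf8() is hypothetical, and the scalar type hints assume PHP 7 or later. With the u modifier, preg_match() returns 1 for valid UTF-8 (the empty pattern always matches) and false when the subject is not valid UTF-8, so comparing against 1 gives a clean true/false.

```php
<?php
// Hypothetical helper name; wraps preg_match('//u', $string) from the answer.
function is_valid_utf8(string $string): bool
{
    // With the u modifier PCRE validates the subject as UTF-8 first:
    // valid input -> 1 (the empty pattern matches), invalid input -> false.
    return preg_match('//u', $string) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // bool(true): "café" encoded as UTF-8
var_dump(is_valid_utf8("\xC3\x28"));    // bool(false): broken 2-byte sequence
```

For invalid input, preg_last_error() reports PREG_BAD_UTF8_ERROR, which can help tell "not UTF-8" apart from other failures.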
You could then argue that the u modifier (PCRE_UTF8) is not always available, and that's true; it can happen, as this question shows:
However, in my practical experience as a developer this has never been an issue. A more realistic problem is the PCRE extension not being available at all, which would render any PCRE-based answer useless (even mine here). But that has mostly been a problem of the past for some years now; as of PHP 5.3.0 the PCRE extension cannot be disabled.
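If you want to cover both cases anyway, one option is to prefer mbstring when it is loaded and fall back to the PCRE check otherwise. This is only a sketch under that assumption; utf8_check() is a hypothetical name, not a PHP built-in, and the exception for the "neither available" case is an illustrative choice.

```php
<?php
// Hypothetical helper: prefers mbstring, falls back to PCRE's u modifier.
function utf8_check(string $string): bool
{
    // Use mbstring if the extension is loaded.
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($string, 'UTF-8');
    }
    // Otherwise use PCRE: preg_match() returns false for invalid UTF-8.
    if (function_exists('preg_match')) {
        return preg_match('//u', $string) === 1;
    }
    // Neither extension is available (illustrative error handling).
    throw new RuntimeException('No UTF-8 validation method available');
}

var_dump(utf8_check("caf\xC3\xA9")); // bool(true)
var_dump(utf8_check("\xFF"));        // bool(false)
```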
A longer answer similar to this one was given in a somewhat duplicate question:
So I think this question should highlight more of the benefits that the suggested answer comes with.