Regex to detect Invalid UTF-8 String

挽巷 2020-11-29 02:13

In PHP, we can use mb_check_encoding() to determine if a string is valid UTF-8. But that's not a portable solution as it requires the mbstring extension to be compiled in a

4 answers
  •  醉梦人生
    2020-11-29 02:46

    I put this in here for completeness:

    Assuming PHP is compiled with PCRE, it is most often built with UTF-8 support as well. So, as explicitly asked for in the question, this very simple regular expression can detect invalid UTF-8 strings, because those will not match:

    preg_match('//u', $string);
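
    As a rough illustration, the one-liner above could be wrapped in a small helper. This is only a sketch: the function name is_valid_utf8 is made up for this example, and it assumes PCRE was built with UTF-8 support. If the u modifier is unavailable, the pattern fails to compile and every string would be reported as invalid.

    ```php
    <?php
    // Hypothetical helper name; not part of PHP or of the original answer.
    function is_valid_utf8(string $string): bool
    {
        // With the u modifier, PCRE checks that the subject is valid UTF-8
        // before matching. The empty pattern matches any valid subject, so
        // the result is 1; for an invalid subject preg_match() does not
        // return 1 (it reports a failure instead).
        return preg_match('//u', $string) === 1;
    }

    var_dump(is_valid_utf8("caf\xC3\xA9")); // "é" as a valid two-byte sequence
    var_dump(is_valid_utf8("caf\xE9"));     // lone Latin-1 byte, not valid UTF-8
    ```

    Comparing against 1 rather than testing for truthiness keeps "no match" and "failure" from being mistaken for a valid string. If the reason for a failure matters, preg_last_error() can be compared against PREG_BAD_UTF8_ERROR right after the call.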
    

    You can then argue that the u modifier (PCRE_UTF8) is not always available, and true, this can happen, as this question shows:

    • What is the preg_match_all u flag dependent on?

    However, in my practical life as a developer this was never an issue. The bigger issue is the PCRE extension not being available at all, which would render any answer relying on PCRE useless (even mine here). But for some years now that has mostly been a problem of the past.

    A lengthier answer similar to this one has been given in the somewhat duplicate question:

    • How to detect malformed utf-8 string in PHP?

    So I think this question should do more to highlight the benefits that the suggested answer comes with.
