Regex to detect Invalid UTF-8 String


挽巷 2020-11-29 02:13

In PHP, we can use mb_check_encoding() to determine if a string is valid UTF-8. But that's not a portable solution as it requires the mbstring extension to be compiled in a

4 answers

醉梦人生 (original poster)

2020-11-29 02:46
I'm adding this here for completeness:

If PHP is compiled with PCRE, UTF-8 support is usually enabled as well. So, as the question explicitly asks, this very simple regular expression can detect invalid UTF-8 strings, because matching against them fails:
```
// Returns 1 for valid UTF-8; returns false when $string is not valid UTF-8.
preg_match('//u', $string);
```
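As a minimal sketch of how the check above might be wrapped for reuse: the function name `is_valid_utf8` is hypothetical and not part of the original answer, and the sketch assumes a PCRE build with UTF-8 support.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer):
// wraps the '//u' check in a function that returns a plain boolean.
function is_valid_utf8(string $string): bool
{
    // With the u modifier, PCRE validates the subject as UTF-8 before matching.
    // The empty pattern matches any valid subject (result 1). For invalid
    // UTF-8, preg_match() returns false and preg_last_error() reports
    // PREG_BAD_UTF8_ERROR.
    return preg_match('//u', $string) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // well-formed two-byte sequence
var_dump(is_valid_utf8("caf\xE9"));     // lone Latin-1 byte, not valid UTF-8
```

The strict comparison with `1` matters here, because `preg_match()` signals invalid input with `false` rather than `0`.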
You could argue that the u modifier (PCRE_UTF8) is not always available, and that is true; it can happen, as this question shows:
- What is the preg_match_all u flag dependent on?
However, in my practical experience as a developer this has never been an issue. A more likely problem is that the PCRE extension is not available at all, which would render any PCRE-based answer useless (mine included). But that has mostly been a thing of the past for some years now.
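If availability of the u modifier is a concern, it can be probed at runtime. The following is only a sketch under that assumption; `pcre_has_utf8_support` is a hypothetical name, not an existing PHP function.

```php
<?php
// Hypothetical helper: reports whether this PCRE build accepts the u modifier.
function pcre_has_utf8_support(): bool
{
    // On a build without UTF-8 support, compiling a /u pattern fails:
    // preg_match() emits a warning (suppressed here with @) and returns false.
    // On a build with UTF-8 support, '/./u' matches the ASCII subject 'a'.
    return @preg_match('/./u', 'a') === 1;
}

var_dump(pcre_has_utf8_support());
```

A caller could run this probe once and fall back to another validation method, such as mb_check_encoding(), when it reports false.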

A longer answer similar to this one has been given in this near-duplicate question:
- How to detect malformed utf-8 string in PHP?
So I think this question should highlight more of the benefits that the suggested answer offers.