Can str_replace be safely used on a UTF-8 encoded string if it's only given valid UTF-8 encoded strings as arguments?

孤街浪徒 2020-12-11 00:40

PHP's str_replace() was intended only for ANSI strings and as such can mangle UTF-8 strings. However, given that it's binary-safe, would it work properly if it's only given valid UTF-8 encoded strings as arguments?

5 answers

再見小時候 (OP)

2020-12-11 01:01
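Yes, it's safe. In UTF-8, every byte of a multibyte character is non-ASCII (byte value 128 or above), and each such character begins with a lead byte whose high bits state how many bytes follow. The continuation bytes that follow all fall in their own range (0x80 to 0xBF), which neither lead bytes nor ASCII bytes ever use. A valid search string therefore can only start matching at the start of a character and only stop at the end of one, so you can't accidentally match part of one UTF-8 multibyte character against another.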
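The boundary argument can be checked by brute force. The sketch below is written in Python only because the thread contains no code. Python's byte search is a plain byte comparison, which is the same thing a binary-safe str_replace does. The helper names (`is_boundary`, `all_matches`, `check_no_partial_matches`) and the sample characters are illustrative assumptions, not part of any library. The check builds every 3-character haystack from characters of 1, 2, 3 and 4 bytes, then asserts that every place a valid needle occurs starts and ends on a character boundary.

```python
from itertools import product

def is_boundary(data: bytes, i: int) -> bool:
    # Offset i starts a character (or is the end of the data) unless data[i]
    # is a continuation byte, i.e. has the bit pattern 10xxxxxx.
    return i == len(data) or (data[i] & 0xC0) != 0x80

def all_matches(haystack: bytes, needle: bytes) -> list[int]:
    # Every byte offset where needle occurs, overlapping matches included.
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle]

# Hypothetical sample: a (1 byte), é and © (2 bytes), € and 中 (3 bytes), 😀 (4 bytes).
SAMPLE = ["a", "\u00e9", "\u00a9", "\u20ac", "\u4e2d", "\U0001F600"]

def check_no_partial_matches() -> int:
    checked = 0
    for chars in product(SAMPLE, repeat=3):          # every 3-character haystack
        hay = "".join(chars).encode("utf-8")
        for needle_char in SAMPLE:
            needle = needle_char.encode("utf-8")
            for i in all_matches(hay, needle):
                # A valid needle may only match from one boundary to another.
                assert is_boundary(hay, i) and is_boundary(hay, i + len(needle))
                checked += 1
    return checked

print(check_no_partial_matches(), "matches checked, all on character boundaries")
```

Overlapping matches are counted on purpose. This makes the check stricter than a left-to-right replacement, which only ever sees non-overlapping matches.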

To visualise (abstractly):
- a for an ASCII character
- 2x for a 2-byte character
- 3xx for a 3-byte character
- 4xxx for a 4-byte character
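To connect that abstract notation to real bytes, here is a small Python sketch. It assumes valid UTF-8 input and deliberately does not reject invalid bytes such as 0xC0, 0xC1 or 0xF5 and above. `utf8_pattern` is a hypothetical helper that prints each byte using the symbols from the list: `a` for an ASCII byte, the digit 2, 3 or 4 for a lead byte, and `x` for each continuation byte.

```python
def utf8_pattern(data: bytes) -> str:
    # Assumes data is valid UTF-8; invalid lead bytes are not detected.
    out = []
    for b in data:
        if b < 0x80:        # 0xxxxxxx: ASCII, a complete character
            out.append("a")
        elif b >= 0xF0:     # 11110xxx: lead byte of a 4-byte character
            out.append("4")
        elif b >= 0xE0:     # 1110xxxx: lead byte of a 3-byte character
            out.append("3")
        elif b >= 0xC0:     # 110xxxxx: lead byte of a 2-byte character
            out.append("2")
        else:               # 10xxxxxx: continuation byte
            out.append("x")
    return "".join(out)

# "a", "é" (C3 A9), "€" (E2 82 AC), "😀" (F0 9F 98 80)
print(utf8_pattern("a\u00e9\u20ac\U0001F600".encode("utf-8")))  # a2x3xx4xxx
```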
If you're matching, say, a2x3xx (where a is a byte in the ASCII range), then because a < x (every ASCII byte is below every byte of a multibyte character), and because 2x cannot occur inside 3xx or 4xxx, and so on, you can be sure your UTF-8 will match correctly. This holds only on the condition that all strings are definitely valid UTF-8.
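The practical consequence, and why the "valid UTF-8" condition matters, can be sketched in Python. `byte_replace` is a hypothetical stand-in for PHP's str_replace: it wraps Python's `bytes.replace`, which works on raw bytes in the same non-overlapping, left-to-right way and knows nothing about characters. The sample strings are illustrative only. With valid inputs, byte-level replacement gives the same result as character-level replacement. A needle that is not valid UTF-8 on its own can match inside a character and corrupt it.

```python
def byte_replace(subject: bytes, search: bytes, replace: bytes) -> bytes:
    # Byte-level, non-overlapping, left-to-right replacement (a stand-in for str_replace).
    return subject.replace(search, replace)

def utf8(s: str) -> bytes:
    return s.encode("utf-8")

# Valid UTF-8 everywhere: the byte-level result equals the character-level one.
subject = "na\u00efve caf\u00e9, caf\u00e9 cr\u00e8me"      # "naïve café, café crème"
by_bytes = byte_replace(utf8(subject), utf8("caf\u00e9"), utf8("th\u00e9"))
by_chars = utf8(subject.replace("caf\u00e9", "th\u00e9"))
assert by_bytes == by_chars
assert by_bytes.decode("utf-8") == "na\u00efve th\u00e9, th\u00e9 cr\u00e8me"

# "©" (C2 A9) shares its last byte with "é" (C3 A9), yet the valid needle cannot match inside "é".
assert byte_replace(utf8("\u00e9"), utf8("\u00a9"), b"") == utf8("\u00e9")

# Drop the prerequisite and it breaks: b"\xa9" alone is not valid UTF-8, and the
# byte search matches it in the middle of "é", leaving a dangling lead byte.
broken = byte_replace(utf8("\u00e9"), b"\xa9", b"")
assert broken == b"\xc3"
```

The last case shows that the "binary-safe" property alone is not enough. Safety comes from the combination of byte-level matching and UTF-8's self-synchronizing design, and it only holds when every argument is valid UTF-8.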

Edit: See bobince's answer for a less abstract explanation.